FluidStack

NVIDIA H100 & A100 GPUs available on demand at scale. Access thousands of GPUs for AI/LLM/ML, ready for deployment now.

Director, Infrastructure

Director, Full Time, Remote, Team 11-50, H1B No Sponsor

Location

United States

Posted

3 days ago

Salary

$250K - $350K / year

10 yrs exp, English, Kubernetes, Open Source

Job Description

  • Own the technical design, deployment, and operational reliability of Fluidstack's bare-metal clusters across all production sites, covering compute, storage, and networking infrastructure.
  • Lead the Infrastructure Engineering organization, comprising Networking Engineers, Compute Systems Engineers, and Storage Engineers, with high standards for technical depth, deployment velocity, and on-call reliability.
  • Drive cluster architecture decisions for current-generation GPU systems (NVIDIA, AMD, and other XPUs), including server configuration, frontend and backend fabric design, storage topology, and rack power and cooling envelope.
  • Coordinate with Supply Chain on OEM relationships, hardware specifications, and delivery timelines to ensure the physical infrastructure roadmap stays one step ahead of customer commitments.
  • Partner with Data Center Operations on new site bring-ups, ensuring smooth handoff from civil and MEP completion through network cabling, hardware racking, burn-in, and customer acceptance testing.
  • Work with Software Engineering and SRE to define infrastructure requirements for managed Kubernetes, SLURM, and inference serving, ensuring the physical layer meets the demands of the software stack.
  • Build and maintain deployment tooling, burn-in automation, and hardware lifecycle management systems that enable your team to operate at a pace and reliability level that sets Fluidstack apart.
  • Stay hands-on: participate in design reviews, be present for critical cluster bring-ups, and engage directly with complex infrastructure failures to maintain technical credibility with your team and across the organization.
  • Travel as needed to data centers, OEM facilities, customer sites, and industry events to stay close to the hardware, the partners, and the market.
  • Coordinate with Finance on infrastructure CapEx planning and cost modeling, with Security on hardening and compliance requirements, and with Sales on pre-sales technical diligence and capacity commitments to customers.

Job Requirements

  • 10+ years of infrastructure engineering experience, with at least 3 years in a technical leadership role managing a team of systems, networking, or storage engineers.
  • Demonstrated ownership of the design, deployment, and operation of a 10,000+ GPU cluster using a recent-generation accelerator (Blackwell, Hopper, or equivalent XPU), from physical hardware bring-up through production steady-state.
  • On-site, hands-on experience physically deploying hardware in data centers, with a clear sense of what it takes to execute a fast, reliable cluster bring-up.
  • Deep expertise in high-performance networking for AI workloads: InfiniBand (XDR/NDR) or RoCEv2 fabric design, large-scale BGP and ECMP architectures, and switch and cable plant management.
  • Strong working knowledge of GPU server hardware internals: NVLink and PCIe topology, NVMe configurations, BMC and firmware management.
  • Experience with high-performance parallel and distributed storage systems for AI training workloads, such as DDN/Lustre, WekaFS, VAST, and open source solutions.
  • Exceptional written and verbal communication skills, with the ability to translate between deep technical detail and high-level summaries for engineering, executive, and customer audiences.

Benefits

  • Competitive total compensation package (salary + equity).
  • Retirement or pension plan, in line with local norms.
  • Health, dental, and vision insurance.
  • Generous PTO policy, in line with local norms.
