Senior ML Platform Engineer

Platform Engineer · Full Time · Remote · Team 10,001+ · Since 1993 · H1B Sponsor

Location

California, Colorado, North Carolina, Massachusetts

Posted

19 days ago

Salary

$152K - $287.5K / year

Bachelor Degree · 5 yrs exp · English · Ansible · Cloud · Docker · Kubernetes · Linux · Python · Terraform · Go

Job Description

  • Design, build, and maintain our core ML platform infrastructure as code, primarily using Ansible and Terraform, ensuring reproducibility and scalability across large-scale, distributed GPU clusters.
  • Apply SRE principles to diagnose, troubleshoot, and resolve complex system issues across the entire stack, ensuring high availability and performance for critical AI workloads.
  • Develop robust internal automation and tooling for ML workflow orchestration, resource scheduling, and platform operations, with a strong focus on software engineering best practices.
  • Collaborate with ML researchers and applied scientists to understand infrastructure needs and build solutions that streamline their end-to-end experimentation.
  • Evolve and operate our multi-cloud and hybrid (on-prem + cloud) environments, implementing monitoring, alerting, and incident response protocols.
  • Participate in an on-call rotation to support platform services and infrastructure running critical ML jobs, driving root cause analysis and implementing preventative measures.
  • Write high-quality, maintainable code (Python, Go) to contribute to the core orchestration platform and automate manual processes.
  • Drive the adoption of modern GPU technologies and ensure smooth integration of next-generation hardware (e.g., GB200, NVLink) into ML pipelines.

Job Requirements

  • BS/MS in Computer Science, Engineering, or equivalent experience.
  • 5+ years in software/platform engineering or SRE roles, including 3+ years focused on ML infrastructure or distributed compute systems.
  • Strong proficiency in Infrastructure-as-Code (IaC) tools, specifically Ansible and Terraform, with a proven track record of building and managing production infrastructure.
  • SRE-oriented mindset with extensive experience in diagnosing system-level issues, performance tuning, and ensuring platform reliability.
  • Solid understanding of ML workflows and lifecycle—from data preprocessing to deployment.
  • Proficiency in operating containerized workloads with Kubernetes and Docker.
  • Strong software engineering skills in languages such as Python or Go, with a focus on automation, tooling, and writing production-grade code.
  • Experience with Linux systems internals, networking, and performance tuning at scale.

Benefits

  • equity
  • benefits
