Building foundational AI for speech transcription and understanding.
Site Reliability Engineer – AI & ML Infrastructure, Kubernetes, Terraform
Location
United States
Posted
25 days ago
Salary
$160K - $220K / year
Job Description
Job Requirements
- 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE)
- Proven, hands-on experience building and managing production infrastructure with Terraform
- Expert-level knowledge of Kubernetes architecture and operations in a large-scale environment
- Experience with high-performance computing (HPC) job schedulers, specifically Slurm, for managing GPU-intensive AI workloads
- Experience managing bare metal infrastructure, including server provisioning (e.g., PXE boot, MAAS), configuration, and lifecycle management
- Strong scripting and automation skills (e.g., Python, Go, Bash)
Benefits
- Medical, dental, vision benefits
- Annual wellness stipend
- Mental health support
- Life, short-term disability (STD), and long-term disability (LTD) income insurance plans
- Unlimited PTO
- Generous paid parental leave
- Flexible schedule
- 12 paid US company holidays
- Quarterly personal productivity stipend
- One-time stipend for home office upgrades
- 401(k) plan with company match
- Tax Savings Programs
- Learning / Education stipend
- Participation in talks and conferences
- Employee Resource Groups
- AI enablement workshops / sessions