Senior Site Reliability Engineer, AI Factory
Location
California
Posted
20 days ago
Salary
$176K - $333.5K / year
Job Description
Job Requirements
- BS or MS degree in Computer Engineering/Science, or related field (or equivalent experience) with 10+ overall years of meaningful work experience
- Experience managing GPU fleets
- 10+ years of expertise in improving data center operations or critical infrastructure
- Expertise in building management systems (BMS) and power management
- Background in working with provisioning, commissioning, and configuration management solutions
- Experience working with Packer and developing QCOW2 images
- Background in coordinating with remote hands
- Experience working with data center inventory management systems such as NetBox, Nautilus, or similar tools
- Proven track record of working with multiple teams to achieve operational excellence for an organization
- Experience driving reliability with robust processes, rapid field response, and recovery
Benefits
- Equity
- Benefits