Staff Software Reliability Engineer
Location
United States, Canada
Posted
45 days ago
Salary
$195K - $245K / year
Job Description
Role Description
We're looking for a Staff SRE who can own the reliability, scalability, and operational excellence of our platform. You'll work at the intersection of infrastructure and software engineering - building the systems, tooling, and practices that let our team ship confidently and operate at scale.
- Set technical direction for infrastructure and reliability - evaluate approaches, make architectural decisions, and establish standards.
- Own and evolve our Kubernetes-based infrastructure on GCP.
- Build and maintain CI/CD pipelines, deployment tooling, and release processes.
- Maintain and simplify our build system (Bazel) for faster, more reliable builds across the org.
- Define and instrument SLIs/SLOs; build dashboards and alerting that surface real problems.
- Drive incident response, post-mortems, and reliability improvements.
- Partner with product engineers to design systems that are reliable and operable from day one.
- Contribute to our engineering culture around AI-augmented development - sharing patterns, workflows, and lessons learned.
Qualifications
- Significant experience in SRE, platform engineering, or infrastructure roles at scale.
- Demonstrated technical leadership: you've driven significant infrastructure or reliability initiatives, not just executed on them.
- Deep hands-on expertise with Kubernetes (GKE preferred) and GCP services.
- Strong programming skills - Go preferred.
- Experience with build systems (Bazel strongly preferred) and CI/CD tooling.
- Practical experience with AI coding assistants as part of your regular workflow - not just experimentation, but daily use.
- Ability to critically evaluate AI-generated code and infrastructure configs: you know when to trust it, when to revise it, and when to write it yourself.
- Track record of improving reliability through automation, observability, and good engineering practices.
- Comfort with ambiguity and ownership; we're a small team where engineers drive decisions.
Nice to Have
- Background in security, malware analysis, or threat detection.
- Experience with large-scale data systems (BigTable, Spanner, BigQuery).
- Deep proficiency in Go.
Benefits
- Hard technical problems with real security impact.
- Small team, huge impact, high autonomy, low process overhead.
- Opportunity to collaborate with world-class experts in cybersecurity.
- Work remotely in the USA or Canada, or use our co-working space in Santa Clara to collaborate with teammates in person.