Staff Software Reliability Engineer

DevOps Engineer, Full Time, Remote

Location

United States, Canada

Posted

45 days ago

Salary

$195K - $245K / year

Job Description

This description summarizes our understanding of the original job posting.

Role Description

We're looking for a Staff SRE who can own the reliability, scalability, and operational excellence of our platform. You'll work at the intersection of infrastructure and software engineering - building the systems, tooling, and practices that let our team ship confidently and operate at scale.

  • Set technical direction for infrastructure and reliability - evaluate approaches, make architectural decisions, and establish standards.
  • Own and evolve our Kubernetes-based infrastructure on GCP.
  • Build and maintain CI/CD pipelines, deployment tooling, and release processes.
  • Maintain and simplify our build system (Bazel) for faster, more reliable builds across the org.
  • Define and instrument SLIs/SLOs; build dashboards and alerting that surface real problems.
  • Drive incident response, post-mortems, and reliability improvements.
  • Partner with product engineers to design systems that are reliable and operable from day one.
  • Contribute to our engineering culture around AI-augmented development - sharing patterns, workflows, and lessons learned.

Qualifications

  • Significant experience in SRE, platform engineering, or infrastructure roles at scale.
  • Demonstrated technical leadership: you've driven significant infrastructure or reliability initiatives, not just executed on them.
  • Deep hands-on expertise with Kubernetes (GKE preferred) and GCP services.
  • Strong programming skills - Go preferred.
  • Experience with build systems (Bazel strongly preferred) and CI/CD tooling.
  • Practical experience with AI coding assistants as part of your regular workflow - not just experimentation, but daily use.
  • Ability to critically evaluate AI-generated code and infrastructure configs: you know when to trust it, when to revise it, and when to write it yourself.
  • Track record of improving reliability through automation, observability, and good engineering practices.
  • Comfort with ambiguity and ownership; we're a small team where engineers drive decisions.

Nice to Have

  • Background in security, malware analysis, or threat detection.
  • Experience with large-scale data systems (BigTable, Spanner, BigQuery).
  • Deep proficiency in Go.

Benefits

  • Hard technical problems with real security impact.
  • Small team, huge impact, high autonomy, low process overhead.
  • Opportunity to collaborate with world-class experts in cybersecurity.
  • Work remotely in the USA or Canada, or use our co-working space in Santa Clara to collaborate with teammates in person.
