Staff Software Reliability Engineer

DevOps Engineer, Full Time, Remote

Location

United States + 1 more

Posted

45 days ago

Salary

$195K - $245K / year

Job Description

Role Description

We're looking for a Staff SRE who can own the reliability, scalability, and operational excellence of our platform. You'll work at the intersection of infrastructure and software engineering - building the systems, tooling, and practices that let our team ship confidently and operate at scale.

Set technical direction for infrastructure and reliability - evaluate approaches, make architectural decisions, and establish standards.
Own and evolve our Kubernetes-based infrastructure on GCP.
Build and maintain CI/CD pipelines, deployment tooling, and release processes.
Maintain and simplify our build system (Bazel) for faster, more reliable builds across the org.
Define and instrument SLIs/SLOs; build dashboards and alerting that surface real problems.
Drive incident response, post-mortems, and reliability improvements.
Partner with product engineers to design systems that are reliable and operable from day one.
Contribute to our engineering culture around AI-augmented development - sharing patterns, workflows, and lessons learned.

Qualifications

Significant experience in SRE, platform engineering, or infrastructure roles at scale.
Demonstrated technical leadership: you've driven significant infrastructure or reliability initiatives, not just executed on them.
Deep hands-on expertise with Kubernetes (GKE preferred) and GCP services.
Strong programming skills - Go preferred.
Experience with build systems (Bazel strongly preferred) and CI/CD tooling.
Practical experience with AI coding assistants as part of your regular workflow - not just experimentation, but daily use.
Ability to critically evaluate AI-generated code and infrastructure configs: you know when to trust it, when to revise it, and when to write it yourself.
Track record of improving reliability through automation, observability, and good engineering practices.
Comfort with ambiguity and ownership; we're a small team where engineers drive decisions.

Nice to Have

Background in security, malware analysis, or threat detection.
Experience with large-scale data systems (BigTable, Spanner, BigQuery).
Deep proficiency in Go.

Benefits

Hard technical problems with real security impact.
Small team, huge impact, high autonomy, low process overhead.
Opportunity to collaborate with world-class experts in cybersecurity.
Work remotely in the USA or Canada, or use our co-working space in Santa Clara to collaborate with teammates in person.
