SmarterDx

Improving clinical and financial outcomes with physician-validated AI for documentation and coding.

Staff Site Reliability Engineer

DevOps Engineer · Full Time · Remote · Team 11-50 · H1B No Sponsor

Location

United States

Posted

2 days ago

Salary

$230K - $250K / year

Bachelor Degree · 10 yrs exp · English · AWS · Cloud · Distributed Systems · Kubernetes · Terraform

Job Description

  • Define and evolve reliability standards for the SmarterDx platform, including SLIs, SLOs, and error budgets that align engineering work with customer impact.
  • Implement a “reliability” platform using Terraform and infrastructure-as-code best practices.
  • Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and mean time to resolve (MTTR).
  • Lead incident response, drive blameless postmortems, and implement systemic improvements to prevent recurrence.
  • Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
  • Provide production support for the SmarterDx platform, applying SRE principles to ensure availability, performance, and data durability.
  • Research, prototype, and advocate for new reliability practices, tooling, and architectural improvements across the engineering organization.

Job Requirements

  • 10+ years of software and software reliability engineering experience, with significant time spent operating and scaling distributed systems in production environments.
  • 3+ years of hands-on experience running cloud-native infrastructure in AWS, including deep familiarity with containers, Kubernetes, monitoring, and alerting in live production systems.
  • Proven experience defining and managing SLIs/SLOs, leading incident response, and driving postmortems and systemic reliability improvements.
  • Strong expertise with Terraform and infrastructure-as-code practices for managing production infrastructure safely and reproducibly.
  • Deep experience with Kubernetes architecture and operations, including workload reliability, cluster scaling, networking, and failure modes.
  • Experience working in security-conscious, compliance-oriented environments where reliability and data protection are first-class concerns.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field — or equivalent practical experience operating large-scale systems.

Benefits

  • Medical, Dental & Vision – Comprehensive plans with leading insurance providers, covering 75% of your premiums, depending on the plan.
  • Paid Parental Leave – Generous paid leave to support families through birth or adoption: Up to 12 weeks for parents.
  • Remote-First Team – Work from anywhere in the U.S.
  • Unlimited PTO & 10 Holidays – So you can relax and recharge.
  • 401(k) with Traditional & Roth Options – Tax-advantaged retirement savings through Fidelity with a 4% match.
  • Minimal Bureaucracy – A fast-moving, high-impact environment where you can focus on what matters.
  • Incredible Teammates! – Work alongside smart, supportive, and mission-driven colleagues.
