HavocAI

Autonomous Solutions for Maritime Operations

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Team 11-50 | Since 2024 | H1B No Sponsor

Location

United States

Posted

2 days ago

Salary

$150K - $185K / year

Bachelor Degree | 7 yrs exp | English | Cloud | Distributed Systems | Kubernetes | Linux | Python | Go

Job Description

  • Design and evolve reliability architecture for distributed and cloud-hosted systems.
  • Define and implement SRE best practices, including SLIs, SLOs, error budgets, and capacity planning.
  • Partner with platform and application teams to design systems for reliability, scalability, and operability.
  • Identify and mitigate systemic reliability risks across infrastructure and services.
  • Lead incident response processes including on-call rotations, escalation, and post-incident reviews.
  • Conduct root cause analysis for complex production incidents and drive long-term improvements.
  • Improve operational readiness through runbooks, automation, and resilience testing.
  • Reduce operational toil through tooling, automation, and process improvements.
  • Design and maintain observability systems for metrics, logging, tracing, and alerting.
  • Ensure services and data pipelines are observable, debuggable, and performant in production.
  • Drive performance analysis and tuning across infrastructure and service layers.
  • Build automation to improve system reliability, deployment safety, and recovery processes.
  • Partner with DevOps and Cloud Platform teams on CI/CD reliability, rollout strategies, and safe deployment patterns.
  • Support and improve Kubernetes-based environments and containerized workloads.
  • Collaborate with security teams to ensure secure and resilient system design.
  • Participate in disaster recovery planning and testing.
  • Maintain strong operational practices around access control, secrets management, and change management.

Job Requirements

  • 7+ years of experience in SRE, infrastructure, or systems engineering roles
  • Strong experience operating large-scale distributed production systems
  • Deep understanding of Linux systems, networking, and distributed systems fundamentals
  • Hands-on experience with Kubernetes and container orchestration
  • Programming or scripting experience in Go, Python, or similar languages
  • Experience designing and operating observability systems for production environments
  • Proven ability to lead incident response and reliability improvements
  • Strong communication skills and ability to collaborate across engineering teams
  • Must be a US citizen
  • Must be eligible to obtain a government clearance, if required

Benefits

  • 100% Employer-Paid Health, Dental and Vision Insurance for you and your family
  • Life Insurance (Employer Paid)
  • Ability to participate in the company's 401k program (Matching)
  • Unlimited PTO policy with an enforced 2-week minimum
  • Equity Package
  • Work / Home Office Stipend
  • Global Entry
  • 16-Week Paid Parental Leave
  • Monthly Health and Wellness Stipend

More DevOps Engineer Jobs

DevOps Engineer | 2 days ago
Full Time | Remote | Team 51-200

Collate is the creator of the fast-growing open-source OpenMetadata project, and we’re passionate about transforming the way data teams work together. Our mission is to help every company realize the fullest potential of data through AI Agents via open-source, and unified metadat...

Kubernetes | Docker | CI/CD | Infrastructure as Code | DevSecOps | AWS | ECS | Java | Python | TypeScript | Node.js | Load Balancers | Web Servers | Caching | Queuing Systems
United States

Staff Site Reliability Engineer

Dave

We started Dave for one reason: banks weren’t built for people like us, and we knew we deserved better.

DevOps Engineer | 2 days ago
Full Time | Remote | Team 201-500 | H1B Sponsor

Lead Site Reliability Engineering for GCP at a fintech company

Cloud | DNS | Google Cloud Platform | JavaScript | Kubernetes | MySQL | Python | Redis | SQL | Terraform | TypeScript | Go
United States
$208K - $330K / year

Senior Software Engineer II - CI Pipeline Engineer

Aledade

Aledade, a public benefit corporation, exists to empower the most transformational part of our health care landscape - independent primary care. We were founded in 2014, and since then, we've become the largest network of independent primary care in the country - helping practices, health centers and clinics deliver better care to their patients and thrive in value-based care. Additionally, by creating value-based contracts across a wide variety of health plans, we aim to flip the script on the traditional fee-for-service model. Our work strengthens continuity of care, aligns incentives and ensures primary care physicians are paid for what they do best - keeping patients healthy. If you want to help create a health care system that is good for patients, good for practices and good for society - and if you're eager to join a collaborative, inclusive and remote-first culture - you've come to the right place.

DevOps Engineer | 2 days ago
Full Time | Remote | Team 1,001-5,000

The engineer will architect the CI/CD vision, leading the evolution of a 'Universal Pipeline' to ensure HIPAA compliance by default through automation and guardrails. Responsibilities also include contributing to long-term strategy for developer experience, test tooling infrastructure, and self-service tooling.

Python | Go | Bash | Terraform | Pulumi | Kubernetes | Docker | GitHub Actions | AWS | Infrastructure as Code | CI/CD | HIPAA Compliance | Security as Code | Observability
United States

Senior DevOps – Site Reliability Engineer

nDeavour Consulting

We are a staffing and IT recruitment company based in Sofia, Bulgaria.

DevOps Engineer | 2 days ago
Full Time | Remote | Team 1-10 | Since 2019 | H1B No Sponsor

Senior DevOps Engineer managing AWS infrastructure for Mobile Wave Solutions

AWS | Cloud | Terraform
United States