Site Reliability Engineer
Location: United States
Posted: 1 day ago
Salary: $104.0K - $127.2K / year
Job Description
Role Description
The Site Reliability Engineer builds and operates the paved roads that service teams use every day. You take shared infrastructure from idea to module to production, then you keep it boring. This is not a research role and not a hero role. It is delivery with discipline. You build with intention. You do not just make things work, you make them make sense. You challenge assumptions, question defaults, and tighten bolts others ignore. You move fast, but not recklessly. You are becoming the engineer others trust to take ownership and deliver cleanly.
This is a hands-on engineering role for someone who can work independently on well-scoped problems with guidance, follow established patterns, and improve them when the evidence supports change. You partner closely with Security, Networking, and SRE because the platform is where constraints become real. As a Site Reliability Engineer, you help determine whether the platform feels chaotic or calm to everyone else. Your work directly affects developer velocity, operational safety, and trust in the system. When the platform is boring, predictable, and resilient, it is because engineers like you did the work carefully and well.
Core Responsibilities
Cloud Foundations
- Implement cloud infrastructure in AWS using approved patterns and guardrails.
- Support EKS-based runtime foundations, including cluster add-ons and shared services.
- Build environment parity across nonprod and prod and flag any required divergence early with evidence.
- Help make cloud primitives predictable, supportable, and easy to consume.
Infrastructure Patterns and Modules
- Develop and maintain reusable platform modules and templates using Terraform or CDKTF where applicable.
- Contribute to baseline building blocks: VPC patterns, IAM primitives, EKS base clusters, ingress patterns, secrets, and shared data stores as assigned.
- Keep modules consumable through sane defaults, versioning, changelogs, and upgrade guidance.
- Reduce drift by enforcing standards through code, not documentation alone.
Automation and Delivery Enablement
- Improve CI workflows for infrastructure changes: plan and apply safety, policy checks, drift detection, and promotion across environments.
- Remove manual steps from provisioning and onboarding by turning them into pipelines and documented runbooks.
- Support internal module consumption patterns, including examples and reference implementations.
- Favor repeatability and clarity over clever one-off solutions.
Operations and Reliability
- Operate platform-owned services with an ownership mindset. Ownership is not optional.
- Participate in the on-call rotation for platform services and follow incident procedures.
- Write and maintain runbooks, dashboards, and alerts for what you ship.
- Drive post-incident follow-ups that reduce repeat failures.
Security, Compliance, and Governance
- Implement least-privilege IAM patterns and secure-by-design defaults.
- Partner with Security to integrate controls into pipelines and platform defaults.
- Treat auditability as a feature: logs, approvals, traceability, and evidence.
- Follow established governance and exception processes and document deviations.
Qualifications
- 3+ years of experience in platform engineering, DevOps, SRE, or infrastructure engineering.
- Working experience with AWS and infrastructure as code (Terraform preferred, CDKTF acceptable).
- Practical Kubernetes experience, preferably EKS (deploying, operating, debugging).
- Comfort with networking fundamentals: DNS, TLS, routing, load balancers, and security groups.
- Ability to debug pipelines and distributed failures without guessing.
- Strong written communication: design notes, runbooks, and crisp status updates.
Benefits
- Flexible Personal Time Off (Vacation time)
- 401(k) match
- Competitive healthcare, dental and vision insurance plans
- Paid Parental Leave (Maternity and Paternity leave)
- Employee Stock Purchase Program
- Free access to Amwell’s Telehealth Services, SilverCloud and The Clinic by Cleveland Clinic’s second opinion program
- Free Subscription to the Calm App
- Tuition Assistance Program
- Pet Insurance