CentralReach

Elevating Autism & IDD Care through Technology

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Team 201-500 | Since 2010

Location

United States

Posted

1 day ago

Salary

$160K - $180K / year

AWS, Terraform, Docker, Kubernetes, Helm, CI/CD, Ansible, Chef, Splunk, New Relic, Prometheus, Grafana, Python, Go, Java, Linux, Windows, Networking, CloudFormation, SRE

Job Description

This description is a summary of our understanding of the job description.

Role Description

As a Sr. SRE, you will work closely with the key stakeholders in Software Engineering to drive adoption of modern reliability practices like SLOs, error budget policies, actionable alerts, incident retrospectives, chaos testing, and end-to-end ownership.

  • Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning; setting and maintaining SLOs, SLIs, and error budgets; and creating dashboards.
  • Analyze, troubleshoot, and resolve operational challenges that affect defined SLOs.
  • Manage site stability, performance, reliability, and maintain uptime for production environments.
  • Develop a fully automated, multi-environment observability stack based on the existing system, and extend it to predict capacity needs from usage patterns.
  • Strive for automation to reduce toil and increase development velocity.
  • Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
  • Identify product architecture changes from a reliability, performance, and availability perspective using a data-driven approach.
  • Document resolution runbooks and standard operating procedures.
  • Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
  • Collaborate with software development teams on release management, help shape the future roadmap, and establish strong operational readiness across teams.
  • Implement reliability and observability tools (e.g., New Relic, Prometheus, Grafana).
  • Collaborate with the Security team and other platform engineering teams to build reliable, maintainable, and scalable solutions that improve our security posture.

Qualifications

  • Strong background as an SRE supporting a 24x7, highly available production environment for a SaaS or cloud service provider.
  • Solid experience with monitoring/APM/observability tools (e.g., Splunk, New Relic).
  • Experience implementing observability plans around logs, metrics, and traces.
  • Experience developing software on an agile development team.
  • Experience with cloud infrastructure environments, preferably AWS, and Infrastructure as code (Terraform, CloudFormation).
  • Extensive experience with Docker, Kubernetes, Helm, CI/CD, and configuration management tools such as Ansible and Chef.
  • Strong experience with containerization technology and/or Kubernetes.
  • Experience with release automation, system administration, and configuration management.
  • Experience with programming languages (Java, Python, Go, etc.).
  • Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
  • Strong interpersonal and teamwork skills, including the ability to set and enforce processes and to influence engineers who are not direct reports.
  • Strong analytical and programming skills (Python, Go, Java etc.).
  • Deep understanding of best practices for modern cloud security.
  • Proven experience building observability for security concerns, such as privilege escalations and bot detection.

Requirements

  • Location: hybrid from Holmdel, New Jersey, or Fort Lauderdale, Florida; remote candidates located in other U.S. states may be considered for the right individual.
  • For fully remote roles, an in-person interview or face-to-face meeting is required before the first day of employment.

Benefits

  • Competitive compensation.
  • Comprehensive health benefits.
  • Generous PTO.
  • 401(k) matching.
  • Paid parental leave for full-time employees.
  • Hybrid work schedules.
  • Career development support.
  • Wellness programs.
  • Opportunities to give back through CR Cares™, our community engagement initiative.
