Barti

Empowering eye care providers with world-class software.

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Team 11-50

Location

United States

Posted

31 days ago

Salary

$150K - $200K / year

Bachelor Degree | 5 yrs exp | English | Cloud | Distributed Systems | Docker | Google Cloud Platform | Grafana | Kubernetes | Linux | Prometheus | Python | Terraform | Go

Job Description

  • Lead and participate in the design, implementation, and maintenance of highly available and scalable infrastructure.
  • Monitor system health, performance metrics, and capacity planning to ensure optimal performance.
  • Establish and track SLIs, SLOs, and error budgets to measure and improve system reliability.
  • Design and implement Infrastructure as Code (IaC) solutions using tools like Terraform, Pulumi, or CloudFormation.
  • Build and maintain CI/CD pipelines to enable rapid, safe deployments.
  • Automate operational tasks and eliminate toil through scripting and tooling.
  • Lead incident response efforts, including on-call rotation, post-mortem analysis, and remediation.
  • Debug and resolve complex production issues across the entire stack.
  • Implement monitoring, alerting, and observability solutions to detect and prevent issues proactively.
  • Provide technical leadership and mentorship to engineers on reliability and infrastructure best practices.
  • Collaborate with cross-functional teams, including Engineering and Product to ensure reliable product delivery.
  • Lead the technical design of infrastructure solutions, ensuring alignment with architectural principles and business goals.
  • Stay updated with emerging technologies and industry trends in SRE, DevOps, and cloud infrastructure.
  • Propose and drive the adoption of best practices, tools, and processes to enhance system reliability and developer productivity.
  • Conduct chaos engineering experiments and disaster recovery drills to validate system resilience.
  • Implement and maintain security best practices across infrastructure and applications.
  • Manage secrets, access controls, and security monitoring systems.
  • Foster a collaborative environment within the engineering team and across departments.
  • Clearly communicate technical concepts and system health to both technical and non-technical stakeholders.
  • Work closely with engineering teams to define reliability requirements and ensure operational excellence.

Job Requirements

  • 5+ years (ideally 7+) of relevant work experience in Site Reliability Engineering, DevOps, or Infrastructure roles
  • 1+ years of hands-on experience with Python, Go, or Bash scripting
  • Experience with cloud platforms (ideally GCP) and container orchestration (Kubernetes, Docker)
  • Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, or similar)
  • Strong understanding of Linux systems, networking, and distributed systems
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
  • Excellent problem-solving and communication skills
  • Able to work independently and as part of a team

Benefits

  • Be part of a mission-driven, rapidly scaling company changing the future of eye care
  • Work remotely from anywhere in the U.S.
  • Collaborate with a passionate, fun, and supportive team
  • Competitive salary: $150,000 - $200,000
  • Equity in a fast-growing startup
  • Health, vision, and dental benefits
  • Unlimited PTO
  • Annual professional development stipend
  • A high-impact role with plenty of room for growth, ownership, and creativity

More DevOps Engineer Jobs

SRE End User Services Product Owner

Leidos

Leidos is an innovation company rapidly addressing the world’s most vexing challenges in national security and health.

DevOps Engineer | 31 days ago
Full Time | Remote | Team 10,001+ | Since 1969 | H1B Sponsor

Site Reliability Engineer improving performance and reliability for Navy-Marine Corps Intranet

Python
District of Columbia, Hawaii, Virginia
$92.3K - $166.9K / year

Staff Security Engineer

OpenLoop Health

We have a relatively flat organizational structure here at OpenLoop. Everyone is encouraged to bring ideas to the table and make things happen. This fits in well with our core values of Autonomy, Competence and Belonging, as we want everyone to feel empowered and supported to do their best work. Sound like a good fit? We’d love to meet you.

DevOps Engineer | 32 days ago
Full Time | Remote | Team 201-500

OpenLoop was co-founded by CEO, Dr. Jon Lensing, and COO, Christian Williams, with the vision to bring care anywhere. Our telehealth support solutions are thoughtfully designed to streamline and simplify go-to-market care delivery for companies offering meaningful virtual support...

United States

Senior Site Reliability Engineer

CertifID

CertifID is the most secure way to send and receive wiring information.

DevOps Engineer | 32 days ago
Full Time | Remote | Team 11-50 | Since 2017 | H1B No Sponsor

Senior Site Reliability Engineer driving reliability improvements for production SaaS environment

AWS | Azure | Distributed Systems | Google Cloud Platform | Grafana | Kubernetes | Linux | Prometheus | Python | Terraform | Go | .NET
Texas

Senior Security Engineer – DevSecOps

PrizePicks

PrizePicks is the fastest-growing sports company in North America according to the 2023 Inc. 5000 rankings, two years running, and the largest independent skill-based fantasy sports operator in the country.

DevOps Engineer | 32 days ago
Full Time | Remote | Team 201-500 | H1B No Sponsor

Senior Security Engineer developing security practices for PrizePicks' infrastructure

AWS | Azure | Cloud | Google Cloud Platform | Kubernetes | Terraform
United States
$120K - $170K / year