Netflix

Where you come to do the best work of your life. Follow @WeAreNetflix on Twitter, IG, Facebook, and YouTube for more.

Site Reliability Engineer 5, Ads SRE

DevOps Engineer | Full Time | Remote | Team 10,001+ | Since 1997 | H1B Sponsor

Location

United States

Posted

2 days ago

Salary

$388K - $558K / year

5 yrs exp, English, AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Java, Kubernetes, Python, Terraform, Go

Job Description

  • Design, implement, and maintain scalable and reliable infrastructure to support Netflix Ads Suite.
  • Collaborate with engineering and product teams to integrate observability, reliability, and security considerations into the entire software development lifecycle.
  • Coordinate capacity planning as we scale up Dynamic Ad Insertion for global-scale Netflix Live streaming.
  • Develop and implement automation tools for monitoring, deployment, and incident response to ensure efficient and reliable operations.
  • Participate in on-call rotations to ensure the 24/7 health of the Netflix Ad Suite and contribute to incident response, diagnosis, and resolution.
  • Implement and maintain a robust incident response framework, including blame-aware incident reviews to learn from operational surprises.
  • Proactively identify sources of instability in distributed systems and analyze how complex systems fail from a reliability and resilience perspective.
  • Champion and embed a culture of reliability across the Ads organization.

Job Requirements

  • 5+ years of experience as a Site Reliability Engineer (SRE), Production Engineer, or similar role supporting business-critical, high-traffic services.
  • Write code to solve problems. You are proficient in one or more languages like Python, Go, or Java and believe in automating solutions over manual effort.
  • Are fluent in modern cloud infrastructure. You have hands-on experience with cloud providers such as AWS/Azure/GCP, Infrastructure as Code such as Terraform, and container orchestration systems like Kubernetes.
  • Understand large-scale distributed systems, their common failure modes and edge cases.
  • Thrive on collaboration and influence. You have excellent communication skills and a proven ability to build relationships with and educate engineering partners.
  • Are a natural troubleshooter. You can calmly navigate complex production issues, identify root causes, and implement effective, lasting solutions.
  • Possess a growth mindset. You are relentlessly curious, committed to continuous improvement, and passionate about scaling your expertise.

Benefits

  • Health Plans
  • Mental Health support
  • 401(k) Retirement Plan with employer match
  • Stock Option Program
  • Disability Programs
  • Health Savings and Flexible Spending Accounts
  • Family-forming benefits
  • Life and Serious Injury Benefits
  • Paid leave of absence programs
  • Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick leave
  • Full-time salaried employees are immediately entitled to flexible time off

More DevOps Engineer Jobs

Senior Site Reliability Engineer

Akamai

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away. Join us. Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!

DevOps Engineer | 2 days ago
Full Time | Remote | Team 5,001-10,000

The Senior Site Reliability Engineer will focus on improving the performance, availability, and scalability of large distributed content delivery systems using Internet technologies. Responsibilities include collaborating on defining SLIs/SLOs, providing technical expertise in design reviews, and developing automation solutions to enhance operational efficiency.

Linux, UNIX, Python, bash, JavaScript, Oracle SQL, Prometheus, Grafana, Datadog, TCP, TLS, HTTP, DNS
United States
$106K - $221K / year

Senior Site Reliability Engineer

Akamai

DevOps Engineer | 2 days ago
Full Time | Remote | Team 5,001-10,000

Do you enjoy collaborating with teams to solve complex challenges? Do you enjoy solving large scale distributed content delivery challenges? Join our critical Platform and Reliability Engineering Team! The Platform & Reliability Engineering team is responsible for defining, measu...

United States

DevOps Engineer | 2 days ago
Full Time | Remote | Team 501-1,000 | Since 1998

Position Summary: Telestream is seeking a DevOps Engineer to ensure seamless collaboration between our Software Development and IT Operations teams. Your extensive experience and technical expertise in CI/CD pipelines, infrastruct...

CI/CD, infrastructure automation, cloud platforms
United States

DevOps Engineer

Bugcrowd

See Security Differently™

DevOps Engineer | 3 days ago
Full Time | Remote | Team 201-500 | Since 2012 | H1B No Sponsor

We are seeking a DevOps Engineer to support and enhance our cloud infrastructure, CI/CD pipelines, and operational tooling. This role focuses on enabling engineering teams with reliable deployment pipelines and scalable infrastructure for our security platform. Essential Duties a...

AWS, Docker, Kubernetes, Terraform, Bash, Python, CI/CD, Infrastructure as Code, Microservices, Incident Response
United States