Twilio

Build the future of communications.

Software Architect, Reliability Engineering

DevOps Engineer | Full Time | Remote | Team size: 5,001-10,000 | H1B Sponsor

Location

All locations: California, Colorado, Illinois, New Jersey, New York, Maryland, Massachusetts, Minnesota, Vermont, Washington

Posted

11 days ago

Salary

$227.8K - $335K / year

Bachelor's Degree | 15 yrs exp | English | AWS | Cloud | Distributed Systems | Grafana | Java | Kubernetes | Microservices | Prometheus | Python | Terraform | Go

Job Description

  • Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes.
  • Influence company-wide architectural decisions, balancing long-term vision with near-term needs and compliance requirements.
  • Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services.
  • Shape company-wide architecture with a focus on availability, performance, resilience, and cost efficiency, using Kubernetes, AWS, Terraform, and modern observability tooling.
  • Ensure integrity and quality across the service lifecycle, covering fault-tolerant architecture design, incident response, disaster recovery, and capacity/cost management.
  • Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling.
  • Establish and champion reliability practices and drive systemic improvements.
  • Mentor and grow engineers and technical leaders.
  • Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.

Job Requirements

  • 15+ years of experience in Reliability Engineering, Software Engineering, or DevOps roles with a focus on infrastructure, backend systems, and reliability, including time as a principal engineer or architect.
  • Strong experience in driving strategic technical decisions and defining long-term technical vision.
  • In-depth understanding of the role of Reliability Engineering in a large and diverse SaaS organization.
  • Experience driving cross-org technical architecture outcomes.
  • Knowledge of cloud architecture, DevOps practices, and large-scale systems design with microservices.
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments.
  • Hands-on experience with Kubernetes (e.g., EKS), deploying and managing stateful services, and cloud platforms such as AWS.
  • Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation for automating infrastructure.
  • Expertise in observability tools (e.g., Prometheus, Grafana, Datadog) for monitoring distributed systems and setting up alerting.
  • Proficiency in at least one programming language (e.g., Go, Python, Java) for building automation and tooling.
  • Experience designing incident response processes, SLOs/SLIs, and runbooks, and participating in on-call rotations.
  • Experience running cross-functional post-incident reviews and driving improvements.
  • Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs.
  • Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams.
  • Excellent problem-solving, analytical, verbal, and written communication skills, with the ability to work in cross-functional and distributed environments.
  • Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term objectives with short-term needs.
  • Ability to influence and build effective working relationships with all levels of the organization.

Benefits

  • health care insurance
  • 401(k) retirement account
  • paid sick time
  • paid personal time off
  • paid parental leave
