GitLab

Build software faster. The One DevOps Platform enables your entire org to collaborate around your code. We're hiring.

Senior Site Reliability Engineer, Database Excellence

Full Time | Remote | Team 1,001-5,000 | Since 2014 | H1B No Sponsor | Company Site | LinkedIn

Location

California, New York

Posted

1 day ago

Salary

$124.3K - $266.4K / year

Bachelor Degree | English | Ansible | Chef | Kubernetes | PostgreSQL | Puppet | Ruby | SQL | Terraform | Go

Job Description

  • Automate operational tasks across all environments, from package updates and configuration changes to provisioning of user-facing services, so manual effort becomes the exception, not the rule.
  • Design and maintain PostgreSQL database infrastructure components that allow GitLab.com to scale reliably while supporting hundreds of thousands of concurrent users.
  • Respond to production incidents and platform emergencies, working with peer SREs to diagnose and resolve database-related issues quickly and thoroughly.
  • Build observability systems that monitor database health, predict capacity needs based on usage patterns, and alert on symptoms rather than outages.
  • Develop and ship database performance solutions in collaboration with product and engineering teams, including query optimization, migration reviews, and infrastructure recommendations.
  • Create self-service tools and automation, using Terraform, Ansible, Chef, and GitLab ChatOps, that empower engineering teams to manage their own database interactions safely.
  • Document decisions, learnings, and operational procedures so that knowledge becomes repeatable actions and eventually becomes automation.
  • Participate in regularly scheduled on-call rotations to ensure GitLab.com remains operational during off-hours and weekends when necessary.

Job Requirements

  • Hands-on experience running PostgreSQL in high-growth, large production environments, including both self-managed infrastructure and database-as-a-service platforms.
  • Expertise with infrastructure automation and configuration management tools such as Ansible, Terraform, Chef, or Puppet to automate operational tasks and drive system reliability.
  • Solid understanding of SQL, PL/pgSQL, data modeling, and data structure design; ability to analyze PostgreSQL internals to troubleshoot and optimize systems.
  • Experience working in large-scale, distributed SaaS production environments where you've managed reliability, performance, and scalability challenges.
  • Strong written communication skills and commitment to documentation; you thrive in remote, asynchronous environments and share knowledge effectively across your team.
  • Proactive, hands-on approach where you identify issues, take ownership of solutions, and contribute improvements to infrastructure and code.
  • Capability to mentor junior team members and develop deep expertise in your domain areas, then share that knowledge to help others grow.
  • Backend engineering experience with languages such as Ruby or Go, and/or familiarity with OLAP databases like ClickHouse.
  • Familiarity with Kubernetes and operators for managing database infrastructure and stateful services in containerized environments.

Benefits

  • Benefits to support your health, finances, and well-being
  • Flexible Paid Time Off
  • Team Member Resource Groups
  • Equity Compensation & Employee Stock Purchase Plan
  • Growth and Development Fund
  • Parental leave
  • Home office support
