Simplesense
We help those who help others.
Staff Cloud Operations Engineer
Location: United States
Posted: 40 days ago
Salary: $155K - $180K / year
Bachelor's Degree · 7 yrs exp · English · Ansible · Cloud · Terraform
Job Description
• End-to-End Operational Ownership: Act as the single technical owner responsible for the operational success of critical cloud systems, defining Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Work with other Staff and Principal Engineers to establish operational Infrastructure as Code (IaC) standards and best practices.
• Cross-Functional Collaboration: Coordinate with development teams to ensure repeatable, reliable feature deployments into cloud environments using CI/CD pipelines, and maintain infrastructure through IaC practices.
• Ambiguity Navigation: Tackle vague and complex operational challenges by defining technical strategies and leading the team toward holistic, sustainable solutions.
• Mentorship and Improvement: Elevate the operational maturity of the team through insightful reviews of operational runbooks, CI/CD pipelines, and automation scripts. Mentor operations engineers on troubleshooting, problem solving, and incident response.
• Operational Execution: Focus on the health of critical systems, conducting root cause analysis (RCA) for major incidents and resolving complex, intermittent issues that span on-prem/cloud boundaries.
• Active Operational Support: Participate in periodic help desk rotations and Tier 3 / Tier 4 on-call support: troubleshooting and resolving issues, fixing bugs, and implementing solutions that enhance system reliability and performance.
• CI/CD Development: Build automated delivery pipelines and develop internal self-service tools to enhance operational efficiency.
• Stakeholder Collaboration: Work with product and development teams to define operational requirements and communicate system trade-offs effectively.
• Technical Leadership: Demonstrated experience providing technical leadership, mentorship, and guidance to engineers, with the ability to influence team direction, operational practices, and outcomes. Prior experience with, or readiness to take on, people leadership responsibilities (such as onboarding, coaching, or performance feedback) is a plus.
Job Requirements
- Experience: 7+ years managing cloud environments, systems administration, or related fields, with a focus on cloud-native applications and services.
- Technical Expertise:
  - Proficient in Infrastructure as Code (IaC) tools such as CloudFormation or Terraform.
  - Proficient in configuration management tooling such as Ansible.
  - Strong understanding of CI/CD pipelines and tooling development.
  - Experience implementing and tuning observability stacks, including monitoring, logging, and tracing systems.
  - Experience with IP networking fundamentals.
  - Familiarity with the Git command line and IDE tooling.
- Proven track record in leading complex troubleshooting efforts and root cause analyses related to critical incidents.
- Experience in mentoring junior engineers and enhancing the team's operational readiness.
- Excellent interpersonal and communication skills for cross-team collaboration.
- Must be able to obtain DoD 8570/8140 IAT Level II certification (e.g., CompTIA Security+ CE) within 6 months of hire.
- Travel requirements: 10% travel for quarterly team planning.
- Must be a U.S. Citizen and able to obtain a DoD NIPR network account and Common Access Card (CAC).
- Must have, or be able to obtain, a Secret Clearance.
Benefits
- Equity
- Medical, Life, Short-Term Disability, and AD&D insurance
- Medical travel coverage
- Dental coverage
- Vision coverage
- 401k matching