Aspire Software

We never stop building. Aspire Software is a vertical software acquisition company that owns, operates, and manages a diverse portfolio.

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

Location

Maryland

Posted

33 days ago

Salary

Not specified

5 yrs exp | English | Azure | Cloud | Docker | Kubernetes | Linux | Terraform | Vault

Job Description

  • Own and operate a production cloud platform running on Microsoft Azure and Cloud Foundry (or comparable platforms)
  • Ensure availability, performance, and reliability across infrastructure and platform components
  • Serve as the primary escalation point for platform-level incidents
  • Lead incident response, root cause analysis, and post-incident remediation
  • Use modern monitoring, alerting, and AI-assisted observability tools to improve detection, diagnosis, and resolution of incidents
  • Drive continuous improvements to reduce operational risk, after-hours incidents, and manual intervention
  • Own certificate and secrets lifecycle management, including TLS automation and secure secrets handling (e.g., CredHub, Vault)
  • Ensure secure and compliant practices around identity, access, and credential management
  • Partner with engineering teams to embed security and reliability best practices into platform workflows
  • Automate common operational tasks using Bash and/or PowerShell
  • Support and extend infrastructure-as-code using Terraform and/or Bicep
  • Improve platform consistency and repeatability through Git-driven, automation-first workflows
  • Leverage AI-assisted tooling to support scripting, troubleshooting, and operational documentation
  • Support PCI and other compliance activities, including technical control implementation, audit support, and remediation tracking
  • Maintain clear runbooks, diagrams, and documentation to enable repeatable operations and knowledge transfer
  • Partner with internal teams and external auditors to support compliance requirements
  • Work closely with application engineers, junior SRE/support staff, and vendor partners
  • Provide technical guidance and mentorship to junior teammates
  • Act as a trusted partner to engineering teams on reliability, performance, and operational readiness

Job Requirements

  • 5+ years of experience in SRE, DevOps, or infrastructure engineering roles supporting production environments
  • Hands-on experience with Cloud Foundry, Kubernetes, or Docker in production (Cloud Foundry preferred)
  • Strong experience with Microsoft Azure, including networking, compute, IAM, and monitoring
  • Strong Linux systems administration experience (RHEL preferred); comfort with Windows Server environments
  • Proficiency in PowerShell and/or Bash scripting
  • Solid understanding of TLS/PKI workflows, including certificate management and rotation
  • Proven experience managing incidents end-to-end and performing root cause analysis
  • Strong written communication skills and a disciplined approach to documentation
  • Experience using modern automation, observability, or AI-enabled operational tools to improve reliability and efficiency
