Senior AWS Cloud Site Reliability Engineer
Location
United States
Posted
2 days ago
Salary
$104K - $166K / year
Job Description
Role Description
We are seeking an experienced and motivated Senior AWS Cloud Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will play a critical part in ensuring the reliability, scalability, and performance of our cloud infrastructure on Amazon Web Services (AWS). The ideal candidate will have a strong background in AWS services, a deep understanding of infrastructure as code, and extensive expertise with relational databases. The SRE will collaborate closely with cross-functional teams, including development, quality assurance, and operations, to ensure seamless software releases and continuous improvement of our release processes.
Infrastructure Automation:
- Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform, or Helm Charts to automate continuous database deployment and scaling processes.
- Collaborate with development teams to integrate continuous deployment practices and ensure the reliability of applications and databases.
Monitoring and Alerting:
- Implement robust monitoring and alerting systems to proactively identify and address potential issues before they impact system performance.
- Analyze system metrics, logs, and alerts to troubleshoot and resolve issues promptly.
Performance Optimization:
- Conduct performance analysis and optimization of AWS infrastructure components to enhance system efficiency and reduce latency.
- Identify and implement improvements to enhance system reliability and resilience.
Incident Response:
- Participate in on-call rotations to respond to and resolve incidents promptly.
- Conduct post-incident reviews to identify root causes and implement preventive measures.
Security and Compliance:
- Work closely with security teams to implement and enforce best practices for securing AWS environments.
- Ensure compliance with industry standards and regulations related to cloud infrastructure.
Communication:
- Facilitate clear communication across teams, providing updates on release status, known issues, and any potential impact on stakeholders.
- Coordinate communication of release schedules and changes to all relevant parties.
Release Planning and Coordination:
- Collaborate with development, QA, and operations teams to plan and coordinate database schema releases.
- Define release scope, schedule, and dependencies to ensure timely and smooth deployments.
- Create and submit change records as required for process and audit compliance.
- Participate in Technical Change Advisory and Review Boards as required.
Release Automation:
- Develop and maintain automated deployment pipelines using industry-standard tools such as GitLab CI/CD, Liquibase, or similar.
- Automate and streamline release processes to improve efficiency and reduce manual errors.
Continuous Improvement:
- Proactively identify areas for process improvement within the release management lifecycle.
- Implement feedback loops to capture lessons learned from each release and apply improvements iteratively.
- Stay up to date with industry best practices, emerging technologies, and trends related to database automation.
Quality Assurance:
- Collaborate with QA teams to establish and execute release validation procedures.
- Ensure releases are thoroughly tested and meet quality standards before deployment.
- Drive continuous improvement by analyzing release management trends, identifying recurring issues, and working with teams to implement solutions.
Qualifications
- Bachelor's degree and 8 years of experience, or a high school (HS) diploma and 12 years of experience.
- Proven experience as a Site Reliability Engineer or similar role with a strong emphasis on relational databases.
- In-depth knowledge of AWS services like RDS and DynamoDB and expertise in managing cloud infrastructure.
- Advanced-level programming and/or scripting in 3 or more of the following languages and tools: Python, Java, Chef, Helm, Playwright, Bash, JavaScript, Terraform.
- Strong understanding of DevOps principles and continuous integration/continuous deployment (CI/CD) pipelines.
- Proficiency in CI/CD and database change-management tools such as GitLab CI/CD, Liquibase, or others.
- Familiarity with infrastructure as code (IaC) tools like CloudFormation, Terraform, Helm Charts, or similar technologies.
- Hands-on experience with Git-based version control platforms (GitLab, GitHub, AWS CodeCommit) and branching strategies.
- Experience with containerization and orchestration tools (e.g., Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Docker, Kubernetes).
- Familiarity with monitoring tools (e.g., CloudWatch, Prometheus, Grafana, Datadog) and log analysis.
- Attention to detail, with a focus on maintaining high-quality software releases.
- Solid understanding of Agile methodologies and their application in release management.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- Must be a US Citizen.
- Must be able to obtain and maintain the required agency clearance (6C Public Trust).
Requirements
- Relevant certifications in DevOps or related fields are a plus.
- High Risk Public Trust or Secret Clearance preferred.
- 3 or more years in an SRE or Platform Engineering group supporting high-availability or critical platforms and applications.
- 2 or more years managing relational databases.
Company Description
Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world’s leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can’t be done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we’re keeping people around the world safe and secure.
Target Salary Range
$104,000 - $166,000. This represents the typical salary range for this position. Salary is determined by various factors, including, but not limited to, the scope and responsibilities of the position, the individual’s experience, education, knowledge, skills, and competencies, as well as geographic location and business and contract considerations. Depending on the position, employees may be eligible for overtime, shift differential, and a discretionary bonus in addition to base pay.
EEO
Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.