AgilityFeat
Nearshore Staff Augmentation & Software Development
Senior SRE DevOps Engineer
Location
Virginia
Posted
11 days ago
Salary
$5K - $7K / month
7 yrs exp, English, Android, AWS, Cloud, Distributed Systems, Docker, iOS, IoT, JavaScript, Kubernetes, Microservices, Node.js, PostgreSQL, Python, Redis, Terraform, TypeScript, Go
Job Description
• Implement SLI/SLO frameworks with error budgets, driving data-informed reliability decisions across the platform
• Design release strategies including blue/green deployments, canary releases, automatic rollback, and version tracking
• Lead incident response, author post-mortems, and build automated runbooks that reduce MTTR
• Develop internal tooling, automation frameworks, and self-service platforms in TypeScript/Python to improve developer productivity and operational efficiency
• Write reliability-focused services: health checkers, auto-remediation controllers, capacity managers, deployment orchestrators, and chaos testing frameworks
• Build and maintain production AWS infrastructure using IaC (Terraform/CloudFormation), with a focus on ECS, EKS/Kubernetes, and microservices orchestration
• Build and maintain end-to-end CI/CD pipelines for backend services, mobile apps (iOS/Android), and IoT firmware across on-prem and AWS cloud environments
• Define and enforce security policies: network segmentation, IAM, secrets management, encryption, compliance auditing, vulnerability management, and incident response
• Build comprehensive observability with OpenTelemetry, distributed tracing, custom metrics exporters, and alerting across WebSocket connections, message delivery pipelines, and real-time communication services
• Manage PostgreSQL (RDS), Redis/ElastiCache, SQS, S3, and NLB/ALB configurations including Elastic IPs for SIP/RTP traffic
Job Requirements
- 7+ years in SRE/DevOps/Platform Engineering with a strong software development background
- Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for building internal tools, CLIs, operators, and automation services
- Deep AWS expertise: ECS, EKS, RDS, ElastiCache, SQS, VPC networking, IAM, CloudWatch
- Strong IaC proficiency (Terraform, CloudFormation, or Pulumi) including module design, state management, and drift detection
- Proven CI/CD pipeline design for both on-prem and cloud environments (GitHub Actions, CodeBuild/CodePipeline, self-hosted runners)
- Container orchestration at scale: Docker, ECS task definitions, Kubernetes, Helm, with experience writing custom controllers or operators
- Solid security background: network security, secrets management, compliance, incident response
- Experience implementing SLI/SLO frameworks, error budgets, and toil reduction strategies
- Production PostgreSQL, Redis, and message queue operations (SQS, Redis Streams)
- Strong understanding of distributed systems patterns: circuit breakers, retries, backpressure, graceful degradation
Benefits
- A role where engineering and operations merge: you'll ship code that keeps the platform running
- Technically challenging environment spanning cloud, IoT, telecom, and satellite systems
- Full ownership of the infrastructure stack with direct impact on reliability and scale
- Competitive compensation, flexible remote work, and a great work environment