AgilityFeat

Nearshore Staff Augmentation & Software Development

Senior SRE DevOps Engineer

DevOps Engineer, Full Time, Remote, Team 11-50, Since 2010, No H1B Sponsorship

Location

Virginia

Posted

11 days ago

Salary

$5K - $7K / month

7 yrs exp, English, Android, AWS, Cloud, Distributed Systems, Docker, iOS, IoT, JavaScript, Kubernetes, Microservices, Node.js, PostgreSQL, Python, Redis, Terraform, TypeScript, Go

Job Description

  • Implement SLI/SLO frameworks with error budgets, driving data-informed reliability decisions across the platform
  • Design release strategies including blue/green deployments, canary releases, automatic rollback, and version tracking
  • Lead incident response, author post-mortems, and build automated runbooks that reduce MTTR
  • Develop internal tooling, automation frameworks, and self-service platforms in TypeScript/Python to improve developer productivity and operational efficiency
  • Write reliability-focused services: health checkers, auto-remediation controllers, capacity managers, deployment orchestrators, and chaos testing frameworks
  • Build and maintain production AWS infrastructure using IaC (Terraform/CloudFormation), with focus on ECS, EKS/Kubernetes, and microservices orchestration
  • Build and maintain end-to-end CI/CD pipelines for backend services, mobile apps (iOS/Android), and IoT firmware across on-prem and AWS cloud environments
  • Define and enforce security policies: network segmentation, IAM, secrets management, encryption, compliance auditing, vulnerability management, and incident response
  • Build comprehensive observability with OpenTelemetry, distributed tracing, custom metrics exporters, and alerting across WebSocket connections, message delivery pipelines, and real-time communication services
  • Manage PostgreSQL (RDS), Redis/ElastiCache, SQS, S3, and NLB/ALB configurations including Elastic IPs for SIP/RTP traffic

Job Requirements

  • 7+ years in SRE/DevOps/Platform Engineering with a strong software development background
  • Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for building internal tools, CLIs, operators, and automation services
  • Deep AWS expertise: ECS, EKS, RDS, ElastiCache, SQS, VPC networking, IAM, CloudWatch
  • Strong IaC proficiency (Terraform, CloudFormation, or Pulumi) including module design, state management, and drift detection
  • Proven CI/CD pipeline design on both on-prem and cloud (GitHub Actions, CodeBuild/CodePipeline, self-hosted runners)
  • Container orchestration at scale: Docker, ECS task definitions, Kubernetes, Helm, with experience writing custom controllers or operators
  • Solid security background: network security, secrets management, compliance, incident response
  • Experience implementing SLI/SLO frameworks, error budgets, and toil reduction strategies
  • Production PostgreSQL, Redis, and message queue operations (SQS, Redis Streams)
  • Strong understanding of distributed systems patterns: circuit breakers, retries, backpressure, graceful degradation

Benefits

  • A role where engineering and operations merge: you'll ship code that keeps the platform running
  • Technically challenging environment spanning cloud, IoT, telecom, and satellite systems
  • Full ownership of the infrastructure stack with direct impact on reliability and scale
  • Competitive compensation, flexible remote work, and a great work environment
