Shuru

Give wings to your ideas!

Senior DevOps Engineer

DevOps Engineer | Full Time | Remote | Team 51-200 | Since 2021 | H1B No Sponsor

Location

United States

Posted

40 days ago

Salary

Not specified

7 yrs exp, English, AWS, Cloud, Distributed Systems, EC2, Grafana, Jenkins, Kubernetes, Prometheus, Python, Spinnaker, Terraform, Go

Job Description

  • Kubernetes platform engineering (EKS-first): design, build, and operate production-grade Kubernetes clusters (multi-nodegroup, autoscaling, upgrades, cluster add-ons).
  • Implement intelligent autoscaling using real metrics (queue depth, consumer lag, service latency) via tools like KEDA/Karpenter.
  • Own AWS environments end-to-end (VPC, IAM, EKS/ECS/EC2, ALB/ELB, S3, Route53, CloudWatch, RDS, SQS, Lambda).
  • Build reproducible infrastructure using Terraform, with strong review and change management practices.
  • Implement backup/DR patterns (e.g., snapshots, retention, automation) and safe rollouts.
  • Design infrastructure for data-intensive workloads: high-throughput ingestion, batch processing, and real-time streaming.
  • Understand and operate distributed systems at scale: consensus, partitioning, replication, and failure modes.
  • Build and maintain infrastructure for data pipelines and vector databases.
  • Design for horizontal scalability, ensuring systems handle growing data volumes and user traffic gracefully.
  • Build and own monitoring and logging from scratch and make it actionable (Prometheus/Grafana, ELK/EFK, alerting).
  • Define or partner on SLIs/SLOs and incident response practices; improve reliability with data-driven changes.
  • Establish performance testing and production-like load testing environments.
  • Continuously reduce AWS spend via right-sizing, Spot strategies, reserved capacity planning, and architecture improvements.
  • Partner with engineering teams to diagnose bottlenecks (database queries, caching, queueing) and propose scalable solutions.
  • Optimize infrastructure costs for data-heavy workloads (storage tiering, compute scheduling, GPU utilization).
  • Improve cloud and cluster security posture (IAM, network policies, secrets management, least privilege).
  • Support SOC2 readiness and execution (controls, evidence automation, operational hardening).
  • Implement access management patterns.

Job Requirements

  • 7+ years in DevOps / SRE / Cloud Infra roles operating production systems.
  • Deep hands-on experience with Kubernetes in production.
  • Strong AWS fundamentals across compute/networking/storage/identity, including VPC, IAM, EC2/EKS, ALB, S3, Route53, CloudWatch, RDS, SQS.
  • Proven ability to build infra using Terraform (and strong IaC practices).
  • Production-grade observability experience: Prometheus + Grafana, and centralized logging (ELK/EFK or similar).
  • Experience scaling product infrastructure: you've grown systems from thousands to millions of requests and understand capacity planning, bottleneck identification, and scaling patterns.
  • Solid understanding of distributed systems concepts: CAP theorem, consistency models, partitioning strategies, distributed consensus, and failure handling.
  • Strong understanding of databases and performance fundamentals.
  • CI/CD experience building reliable pipelines (Jenkins, Spinnaker, GitHub Actions, or equivalents), with safe deployment strategies.
  • Scripting/automation ability in Python and/or Bash (Go is a plus).

Benefits

  • Competitive salary and benefits package.
  • Opportunity to work with a team of experienced product and tech leaders.
  • A flexible work environment with remote working options.
  • Continuous learning and development opportunities.
  • Chance to make a significant impact on diverse and innovative projects.
