The Leaflet

An independent platform for cutting-edge, progressive, legal, and political opinion.

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Team 11-50 | H1B No Sponsor

Location

Florida

Posted

28 days ago

Salary

Not specified

Bachelor Degree | 5 yrs exp | English | Ansible | AWS | Azure | Cloud | Google Cloud Platform | Grafana | Java | Kubernetes | Prometheus | Python | Terraform | Go

Job Description

  • Ensure the availability, reliability, and performance of a high-traffic Java-based application in a distributed environment.
  • Troubleshoot and resolve complex issues in production and non-production environments.
  • Participate in pre- and post-deployment performance testing and monitoring to improve application performance.
  • Optimize Java application performance, ensuring efficient resource utilization and scaling.
  • Deploy and manage the Grafana stack (Grafana, Prometheus, Loki) to provide real-time monitoring, logging, and alerting.
  • Implement and refine observability strategies to improve application and infrastructure visibility.
  • Create and maintain dashboards, alerts, and logs for comprehensive monitoring of system health and performance.
  • Support the operations team's incident response, conduct post-mortems, and identify root causes to prevent recurrence.
  • Document and share lessons learned from incidents, contributing to a culture of continuous improvement.
  • Work closely with developers, architects, and other engineers to design and implement solutions that improve application reliability.
  • Collaborate with DevOps and NOC teams to support the application platform.
  • Communicate SRE practices and principles to technical and non-technical stakeholders.
  • Provide feedback and insights on application performance, potential improvements, and observability metrics.

Job Requirements

  • Degree in computer science or a related field, or equivalent work experience
  • 5+ years in SRE, DevOps, or similar infrastructure roles
  • Experience managing large-scale, high-availability production systems
  • Track record of incident response and post-mortem processes
  • Experience with capacity planning and performance optimization
  • 3+ years hands-on experience managing production Kubernetes clusters
  • Deep understanding of k8s architecture, networking, storage, and security
  • Experience with cluster scaling (Karpenter), upgrades, and multi-cluster management
  • Proficiency with kubectl, Helm, and Kubernetes operators
  • Container orchestration and troubleshooting expertise
  • Advanced expertise with the Grafana stack for dashboards, alerting, and visualization
  • Hands-on experience with Grafana Alloy for telemetry data collection
  • Proficiency in PromQL
  • Experience with Loki for log aggregation and analysis
  • Experience building comprehensive monitoring and alerting strategies
  • Hands-on experience managing Java-based applications in large-scale, distributed environments, with a focus on JVM tuning and application optimization.
  • Cloud platform expertise (AWS, GCP, or Azure)
  • Familiarity with infrastructure as code (IaC) tools such as Terraform/Terragrunt or Ansible
  • ArgoCD proficiency for GitOps workflows and continuous deployment
  • Strong scripting abilities in Bash, Python, or Go
  • Experience with CI/CD pipelines and automation tools
  • Configuration management and deployment automation
  • Strong troubleshooting skills, with a proactive approach to diagnosing and resolving performance bottlenecks.
  • Proven experience managing on-call rotations, incident response, and root cause analysis.
  • Ability to mentor junior team members
  • Strong communication skills (both written and verbal), positive attitude, and ability to receive constructive feedback.

Benefits

  • Competitive pay and benefits
  • Flexible vacation allowance
  • A hybrid / remote working environment
  • Startup culture backed by a secure, global brand
