Site Reliability Engineer
Location
United States
Posted
8 days ago
Salary
$120,000–$190,000 base pay, plus bonus
Job Description
Role Description
You will collaborate with management to advance our data and analytics transformation, enhance productivity, and enable agile, data-driven decisions. By leveraging relationships with top tech startups and universities, you will help create competitive advantages and drive enterprise innovation.
We are seeking a Site Reliability Engineer (SRE) to ensure the scalability, stability, and performance of our data platforms and ML infrastructure. You’ll work closely with data scientists, ML engineers, and platform vendors to deploy and monitor production systems, automate workflows, and reduce operational overhead.
Responsibilities
- Build and maintain infrastructure to support real-time and batch ML workloads
- Implement observability tools (logging, monitoring, alerting) for model performance and system uptime
- Design and manage CI/CD pipelines for ML and data applications
- Ensure high availability, disaster recovery, and rollback capabilities for production environments
- Manage access controls, secrets, and security policies in collaboration with compliance and IT
- Troubleshoot incidents, lead postmortems, and drive root-cause resolution
- Work with U.S. and international teams to provide 24/7 coverage across time zones
Qualifications
- 3–6 years of experience in DevOps, SRE, or backend engineering roles
- Proficient with tools like Docker, Kubernetes, Terraform, GitLab CI/CD or GitHub Actions, and Airflow
- Strong scripting in Python or Bash and familiarity with Linux environments
- Experience deploying and monitoring ML models or data pipelines in production
- Knowledge of observability stacks (e.g., Prometheus, Grafana, ELK, Datadog)
- Familiarity with cloud platforms (e.g., AWS, GCP, or Azure)
- Strong documentation, problem-solving, and incident response skills
Requirements
- Experience supporting ML/AI workflows using Palantir Foundry
- Exposure to compliance frameworks like SOC 2, ISO 27001, or financial regulations
- Knowledge of MLOps frameworks (e.g., MLflow, Kubeflow, SageMaker Pipelines)
- Ability to automate deployments, testing, and monitoring at scale
Benefits
- Work on real-world AI applications with high-impact clients
- Collaborate with world-class data scientists, engineers, and product leaders
- Flat org structure, high trust, high autonomy
- Competitive salary + performance-based incentives
Position Location
This position is planned to be based in Jacksonville, FL. Remote candidates will be considered on a case-by-case basis.
Compensation
The base pay range for this position is $120,000–$190,000. The compensation package also includes a bonus and the full range of medical, financial, and/or other benefits.