C the Signs is a cancer prediction system that identifies patients at risk of cancer at the earliest, most curable stage.
Senior MLOps Engineer
Location: United States
Posted: 11 days ago
Salary: Not specified
Job Description
Position Summary
We’re hiring a Senior MLOps Engineer with deep machine learning engineering experience to build and operate the production platform powering ML/LLM-driven healthcare workflows. You’ll design reliable, secure, and compliant systems for model development, evaluation, deployment, monitoring, and continuous improvement—working closely with ML, data, security, and product teams.
This role is ideal for someone who has shipped ML systems in production and is excited about LLM orchestration, RAG, evaluations, guardrails, and observability in a regulated environment.
Key responsibilities
MLOps & ML Platform
- Design and operate ML platforms that support end-to-end workflows: data ingestion, feature engineering, training, evaluation, deployment, and monitoring.
- Build and maintain CI/CD for ML (testing, packaging, versioning, reproducibility, automated rollbacks, approvals).
- Implement MLOps best practices: model registry, experiment tracking, lineage, governance, and reproducible training environments.
- Develop scalable training infrastructure (distributed training, GPU scheduling, cost controls, auto-scaling).
- Create and maintain feature pipelines / feature stores, ensuring consistency between training and inference (training-serving skew prevention).
- Establish model monitoring and observability: performance, drift, bias/fairness signals (where relevant), latency, throughput, and data quality.
- Build and own end-to-end LLM delivery pipelines: prompt versioning, retrieval, orchestration, evaluation, deployment, monitoring, and iterative improvement.
- Create robust LLM evaluation harnesses (offline + online): golden datasets, automated regression testing, human-in-the-loop review workflows, and risk scoring.
Deployment, reliability, and operations
- Productionize ML models on GCP using containers and orchestration (e.g., GKE, Cloud Run), and build CI/CD for ML/LLM systems with automated tests and safe rollouts.
- Implement observability: tracing, metrics, logs, dashboards, alerting for model/system health (latency, token usage, error rates, retrieval quality, hallucination indicators, drift where relevant).
- Build cost controls: token/cost budgeting, caching strategies, autoscaling, and performance tuning.
Data, governance, and compliance (Healthcare)
- Design systems with security and privacy by default: IAM, least privilege, secrets management, audit logs, encryption, data retention, and PHI/PII handling.
- Implement governance: model/prompt lineage, dataset provenance, evaluation traceability, and approval workflows aligned with healthcare compliance expectations.
- Integrate guardrails: content filters, policy checks, prompt injection defenses, structured output validation, and fallback strategies.
Job Requirements
- 6+ years in software/platform engineering, including 4+ years operating ML systems in production (or equivalent depth).
- Strong experience in ML engineering: training pipelines, evaluation, deployment patterns, monitoring, and iteration loops.
- Strong engineering skills in Python, plus production-grade experience building APIs/services.
- Demonstrated hands-on experience running LLM systems in production.
- Strong experience with GCP services and cloud-native patterns.
- Experience with Vertex AI (pipelines, endpoints, feature store, model registry, evaluation) and/or managed vector search on GCP.
- Experience with containerization and orchestration (Docker, Kubernetes/GKE and/or Cloud Run).
Benefits
Why Join Us?
Joining C the Signs is not just about building AI; it’s about shaping the future of healthcare. If you are a technical leader with an unshakable belief in the power of AI to save lives and the ability to make it happen at scale, this is your opportunity to create a tangible, global impact.
Benefits:
- Competitive salary and benefits package.
- Flexible working arrangements (remote or hybrid options available).
- The opportunity to work on life-changing AI technology that directly impacts patient outcomes.
- Join a team that combines cutting-edge innovation with a mission to save lives and improve health equity.
- Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.