C the Signs

C the Signs is a cancer prediction system that identifies patients at risk of cancer at the earliest, most curable stage.

Senior MLOps Engineer

Machine Learning Engineer · Full Time · Remote · Team 51-200 · H1B No Sponsor

Location

United States

Posted

11 days ago

Salary

Not specified

English

Job Description

Position Summary

We’re hiring a Senior MLOps Engineer with deep machine learning engineering experience to build and operate the production platform powering ML/LLM-driven healthcare workflows. You’ll design reliable, secure, and compliant systems for model development, evaluation, deployment, monitoring, and continuous improvement—working closely with ML, data, security, and product teams.

This role is ideal for someone who has shipped ML systems in production and is excited about LLM orchestration, RAG, evaluations, guardrails, and observability in a regulated environment.

Key responsibilities

MLOps & ML Platform

  • Design and operate ML platforms that support end-to-end workflows: data ingestion, feature engineering, training, evaluation, deployment, and monitoring.
  • Build and maintain CI/CD for ML (testing, packaging, versioning, reproducibility, automated rollbacks, approvals).
  • Implement MLOps best practices: model registry, experiment tracking, lineage, governance, and reproducible training environments.
  • Develop scalable training infrastructure (distributed training, GPU scheduling, cost controls, auto-scaling).
  • Create and maintain feature pipelines / feature stores, ensuring consistency between training and inference (training-serving skew prevention).
  • Establish model monitoring and observability: performance, drift, bias/fairness signals (where relevant), latency, throughput, and data quality.
  • Build and own end-to-end LLM delivery pipelines: prompt versioning, retrieval, orchestration, evaluation, deployment, monitoring, and iterative improvement.
  • Create robust LLM evaluation harnesses (offline + online): golden datasets, automated regression testing, human-in-the-loop review workflows, and risk scoring.
  • Build cost controls: token/cost budgeting, caching strategies, autoscaling, and performance tuning.


Deployment, reliability, and operations

  • Productionize ML models on GCP using containers and orchestration (e.g., GKE, Cloud Run), and build CI/CD for ML/LLM systems with automated tests and safe rollouts.
  • Implement observability: tracing, metrics, logs, dashboards, alerting for model/system health (latency, token usage, error rates, retrieval quality, hallucination indicators, drift where relevant).

Data, governance, and compliance (Healthcare)

  • Design systems with security and privacy by default: IAM, least privilege, secrets management, audit logs, encryption, data retention, and PHI/PII handling.
  • Implement governance: model/prompt lineage, dataset provenance, evaluation traceability, and approval workflows aligned with healthcare compliance expectations.
  • Integrate guardrails: content filters, policy checks, prompt injection defenses, structured output validation, and fallback strategies.

Job Requirements

  • 6+ years in software/platform engineering, including 4+ years operating ML systems in production (or equivalent depth).
  • Strong experience in ML engineering: training pipelines, evaluation, deployment patterns, monitoring, and iteration loops.
  • Strong engineering skills in Python, plus production-grade experience building APIs/services.
  • Demonstrated hands-on experience with LLM systems in production.
  • Strong experience with GCP services and cloud-native patterns.
  • Experience with Vertex AI (pipelines, endpoints, feature store, model registry, evaluation) and/or managed vector search on GCP.
  • Experience with containerization and orchestration (Docker, Kubernetes/GKE and/or Cloud Run).

Benefits

Why Join Us?

Joining C the Signs is not just about building AI; it’s about shaping the future of healthcare. If you are a technical leader with an unshakable belief in the power of AI to save lives and the ability to make it happen at scale, this is your opportunity to create a tangible, global impact.

  • Competitive salary and benefits package.
  • Flexible working arrangements (remote or hybrid options available).
  • The opportunity to work on life-changing AI technology that directly impacts patient outcomes.
  • Join a team that combines cutting-edge innovation with a mission to save lives and improve health equity.
  • Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.
