ML Platform Engineer (Senior)
Location
United States
Posted
23 days ago
Salary
Not specified
Job Description
Job Requirements
- 3+ years of experience in ML engineering or a similar role building and deploying machine learning models in production.
- Strong experience with AWS ML services (SageMaker, Lambda, EMR, ECR) for training, serving, and orchestrating model workflows.
- Hands-on experience with Kubernetes (e.g., EKS) for container orchestration and job execution at scale.
- Strong proficiency in Python, with exposure to ML/DL libraries such as TensorFlow, PyTorch, and scikit-learn.
- Experience working with feature stores, data pipelines, and model versioning tools (e.g., SageMaker Feature Store, Feast, MLflow).
- Familiarity with infrastructure-as-code and deployment tools such as Terraform, GitHub Actions, or similar CI/CD systems.
- Experience with logging and monitoring stacks such as Prometheus, Grafana, CloudWatch, or similar.
- Experience working in cross-functional teams with data scientists and DevOps engineers to bring models from research to production.
- Strong communication skills and ability to operate effectively in a fast-paced, ambiguous environment with shifting priorities.