Principal Engineer
Location: United States
Posted: 2 days ago
Salary: Not specified
Job Description
We are looking for a Principal Engineer – AI/ML Platform who will own the architecture, productionization, and operational excellence of our machine learning and LLM infrastructure. This is not a research scientist role.
You will define how GenAI systems are evaluated, deployed, monitored, governed, and continuously improved at scale. You will shape standards across model integration, evaluation frameworks, inference systems, safety mechanisms, telemetry instrumentation, and AI/ML workflow automation.
You will operate at the intersection of AI engineering, distributed systems, and platform architecture—partnering closely with Product and Engineering leadership to ensure our AI systems are reliable, observable, safe, and economically scalable in enterprise production environments.
Required Experience
10+ years of software engineering experience with significant recent hands-on AI/ML work
Proven ownership of production AI/ML or LLM systems at scale (not research or prototypes)
Deep expertise in LLM productionization (RAG, fine-tuning, evaluation, guardrails, model monitoring)
Strong Python expertise
Experience with modern AI frameworks (PyTorch, TensorFlow, JAX, Scikit-learn)
Hands-on AI/MLOps experience (CI/CD for ML, deployment automation, experiment tracking, monitoring)
Experience with cloud platforms (AWS/GCP/Azure), Kubernetes, and distributed systems
Experience implementing evaluation pipelines and observability instrumentation
Demonstrated technical leadership influencing multi-team architectural direction
Experience with ML workflow orchestration platforms (Kubeflow, MLflow, Vertex AI, SageMaker)
Expertise in model governance, bias evaluation, compliance, and drift detection
Domain expertise in NLP, agentic systems, recommender systems, or similar applied AI areas
Open-source AI/ML contributions
Master’s or PhD in ML/AI-related field
Define and own architecture for scalable AI/ML and LLM systems, including:
Inference pipelines
Evaluation frameworks
Model lifecycle workflows
Monitoring and observability systems
Translate ambiguous business requirements into robust AI platform designs and staged delivery plans
Make strategic decisions on:
Model integrations and gateways
Retrieval-augmented generation (RAG) approaches
Evaluation methodologies
Safety and guardrail systems
Establish standards for model readiness, evaluation gates, rollout/rollback mechanisms, and drift detection
Build and deploy production-grade LLM capabilities integrated into distributed systems with clear SLOs and telemetry
Design scalable AI/MLOps and AIOps practices across training, testing, deployment, and monitoring
Improve data pipelines, feature workflows, and lineage processes supporting model evaluation and inference
Instrument tracing and model observability using OpenTelemetry and modern telemetry standards
Own evaluation pipelines tracking latency, cost, accuracy, hallucination rates, and prompt/version drift
Provide clear trade-off analyses balancing model performance, cost efficiency, safety, and maintainability
Write structured technical proposals that guide executive investment and roadmap decisions
Mentor engineers in AI productionization, experimentation discipline, and distributed systems design
Raise the engineering bar through principled reviews, documentation, and mechanism-driven standards
Shape the AI production architecture of a category-defining GenAI infrastructure company
Define how enterprise-grade AI systems are observed, evaluated, and remediated
Build mechanisms that scale beyond individual engineers
Influence roadmap and platform strategy at a formative stage
Fully remote
ESOP equity
Flexible hours
Generous PTO
Global offsites
Education support
Clear advancement opportunities