Red Hat

The leading provider of enterprise open source solutions.

Forward Deployed Engineer, AI Inference, vLLM, Kubernetes

Full Time · Remote · Team 10,001+ · Since 1993 · H1B Sponsor

Location

California, New York, Massachusetts, Washington

Posted

14 days ago

Salary

$184.9K - $305.1K / year

8 yrs exp · English · Cloud · Kubernetes · Python · Terraform · Go

Job Description

  • Orchestrate Distributed Inference: Deploy and configure LLM-D and vLLM on Kubernetes clusters.
  • Optimize for Production: Go beyond standard deployments by running performance benchmarks, tuning vLLM parameters, and configuring intelligent inference routing policies to meet SLOs for latency and throughput.
  • Code Side-by-Side: Work directly with customer engineers to write production-quality code (Python/Go/YAML) that integrates our inference engine into their existing Kubernetes ecosystem.
  • Solve the "Unsolvable": Debug complex interaction effects between specific model architectures (e.g., MoE, large context windows), hardware accelerators (NVIDIA GPUs, AMD GPUs, TPUs), and Kubernetes networking (Envoy/Istio).
  • Feedback Loop: Act as the "Customer Zero" for our core engineering teams. You will channel field learnings back to product development, influencing the roadmap for LLM-D and vLLM features.
  • Travel only as needed to customers to present, demo, or help execute proofs of concept.
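The SLO benchmarking described above ultimately reduces to comparing measured latency percentiles and throughput against targets. A minimal sketch in Python (the function names and thresholds here are hypothetical, not part of any Red Hat tooling):

```python
# Toy SLO check: given per-request latencies (seconds) and a wall-clock
# benchmark duration, compute p95 latency and throughput, then compare
# against illustrative SLO targets.

def p95(latencies):
    """95th-percentile latency via nearest-rank on the sorted sample."""
    s = sorted(latencies)
    idx = max(0, int(round(0.95 * len(s))) - 1)
    return s[idx]

def meets_slo(latencies, duration_s, p95_target_s, min_rps):
    """True if both the latency SLO and the throughput SLO are met."""
    throughput = len(latencies) / duration_s  # requests per second
    return p95(latencies) <= p95_target_s and throughput >= min_rps

latencies = [0.08, 0.09, 0.11, 0.10, 0.12, 0.25, 0.09, 0.10, 0.11, 0.10]
print(meets_slo(latencies, duration_s=1.0, p95_target_s=0.3, min_rps=5))
```

In practice the latency samples would come from a load generator driving the vLLM endpoint, but the pass/fail logic is the same shape.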

Job Requirements

  • 8+ Years of Engineering Experience: You have a long track record in Backend Systems, SRE, or Infrastructure Engineering.
  • Customer Fluency: You speak both "Systems Engineering" and "Business Value".
  • Bias for Action: You prefer rapid prototyping and iteration over theoretical perfection. You are comfortable operating in ambiguity and taking ownership of the outcome.
  • Deep Kubernetes Expertise: You are fluent in K8s primitives, from defining custom resources (CRDs, Operators, Controllers) to configuring modern ingress via the Gateway API.
  • AI Inference Proficiency: You understand how an LLM forward pass works. You know what KV Caching is, why prefill/decode disaggregation matters, why context length impacts performance, and how continuous batching works in vLLM.
  • Systems Programming: Proficiency in Python (for model interfaces) and Go (for Kubernetes controllers/scheduler logic).
  • Infrastructure as Code: Experience with Helm, Terraform, or similar tools for reproducible deployments.
  • Cloud & GPU Hardware Fluency: You are comfortable provisioning GPU capacity and deploying LLMs on both bare-metal and hyperscaler Kubernetes clusters.
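The continuous batching named in the requirements above can be illustrated with a toy scheduler: unlike static batching, a finished sequence frees its batch slot immediately, so waiting requests join the running batch between decode steps. This is a conceptual sketch only, not vLLM's actual scheduler:

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching scheduler.

    requests: list of (request_id, decode_steps_needed).
    Returns {request_id: step at which the request finished}.
    """
    waiting = deque(requests)
    running = {}    # request_id -> remaining decode steps
    finished = {}
    step = 0
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch:
            rid, steps = waiting.popleft()
            running[rid] = steps
        step += 1
        # One decode step for every running sequence.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]   # slot freed mid-batch
                finished[rid] = step
    return finished

# "a" finishes at step 1, freeing a slot so "c" starts decoding at
# step 2 instead of waiting for the whole batch to drain.
print(continuous_batching([("a", 1), ("b", 3), ("c", 2)]))
```

The payoff is the same as in vLLM: short requests do not sit behind long ones, which is what lifts throughput without hurting latency.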

Benefits

  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account - healthcare and dependent care
  • Health Savings Account - high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!
