Specialty Solutions Engineer, AI - Modern Data Center
Location: United States
Posted: 43 days ago
Salary: $250K - $300K / year
Job Description
The AHEAD Senior Modern Datacenter Specialist Solutions Engineer (SSE) is a presales technical leader focused on designing, positioning, and enabling enterprise-scale AI infrastructure solutions. This role owns the technical sales motion from discovery through architecture, demonstration, and business case, across the full AI stack: GPU platforms (NVIDIA & AMD), interconnects, orchestration/scheduling, and AI/ML frameworks. The SSE partners closely with Client Directors, Practice Leaders, and strategic vendors to develop market strategies, drive pipeline, and win campaigns in the US enterprise market.
- Lead technical discovery and solution positioning for enterprise AI infrastructure opportunities, translating business outcomes into reference architectures and value propositions.
- Own pre-sales deliverables, including architectures, diagrams, sizing, BOMs, and proposals.
- Deliver executive and technical presentations focused on NVIDIA AI Enterprise (NVAIE), LLM training/inference, and accelerated analytics.
- Guide clients through technology selection, roadmap development, and business case creation for large-scale AI initiatives.
- Architect end-to-end AI platforms using NVIDIA DGX/HGX, Blackwell (B100/B200), Hopper (H100/H200), Grace/Grace-Hopper (GH200), L40S, NVLink/NVSwitch, InfiniBand (NVIDIA Quantum), RoCE, and DPU offload patterns.
- Design solutions leveraging AMD Instinct (MI300/MI300X) as appropriate, articulating trade-offs in CPU/GPU/DPU, interconnect topology, and cluster scale-out.
- Integrate NVIDIA AI Enterprise components (CUDA, cuDNN, TensorRT, Triton Inference Server, RAPIDS) and common ML frameworks (PyTorch, TensorFlow) with orchestration platforms.
- Integrate on-prem GPU clusters with cloud AI services (AWS SageMaker, Azure ML, GCP Vertex AI) for hybrid bursting and workload mobility.
- Advise on MLOps platforms (MLflow, Kubeflow, Weights & Biases), CI/CD, and governance for multi-tenant AI environments.
- Build and maintain relationships with NVIDIA, AMD, Run:AI, OEMs, and networking vendors, aligning campaigns with partner programs and incentives.
- Contribute feedback to vendor engineering and product teams, coordinating joint enablement and reference designs.
- Create repeatable assets such as validated designs, sizing calculators, POV guides, deployment runbooks, and competitive playbooks.
- Mentor SEs and delivery consultants, leading internal training on AI scheduling, performance tuning, and operational best practices.
- Lead proof-of-value (POV) and proof-of-concept (POC) engagements, including success criteria, benchmarking, and recommendations.
Qualifications
- Proven experience architecting and deploying NVIDIA GPU-based AI platforms (NVAIE, DGX/HGX, Blackwell, Hopper, Grace, L40S, H100/H200, B100/B200, GH200) and/or AMD Instinct MI300/MI300X.
- Experience with Run:AI, NVIDIA Base Command, Kubernetes (GPU Operator), Slurm, and/or vSphere with Tanzu for AI/ML workloads.
- Advanced knowledge of AI/ML frameworks and libraries (PyTorch, TensorFlow, RAPIDS, Triton, CUDA, cuDNN, TensorRT).
- Strong understanding of high-speed networking for AI (InfiniBand, RoCE, DPU integration, NVLink, NVSwitch).
- Experience integrating on-prem AI infrastructure with public cloud AI services (AWS SageMaker, Azure ML, GCP Vertex AI) and hybrid architectures.
- Experience leading pre-sales campaigns, POV/POC management, and executive presentations.
- Ability to identify and leverage emerging datacenter and AI technologies to drive innovative solutions.
- Strong analytical skills for troubleshooting complex environments, including storage, compute, networking, and AI workloads.
- Skilled at guiding clients through decision-making with clear, strategic recommendations.
- Proven track record of working effectively across sales, engineering, and vendor teams.
- Knowledge of datacenter security best practices and regulatory compliance.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field; advanced degree a plus.
- 7+ years in datacenter or cloud infrastructure, with 3+ years directly on AI/GPU platforms and orchestration.
Certifications (preferred)
- NVIDIA Certified Systems Engineer / AI Practitioner; NVAIE enablement badges
- Kubernetes CKA/CKS; GPU Operator experience
- AWS/Azure/GCP Architect; relevant AI/ML specialty
Benefits
- Medical, Dental, and Vision Insurance
- 401(k)
- Paid company holidays
- Paid time off
- Paid parental and caregiver leave
- Plus more!