Senior DGX Cloud Performance Engineer

Engineer | Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

Location

California, Texas, Washington

Posted

47 days ago

Salary

$152K - $287.5K / year

Bachelor's Degree | 5 yrs exp | English | AWS | Azure | Cloud | Google Cloud Platform | Python | PyTorch | TensorFlow

Job Description

  • Develop benchmarks and end-to-end customer applications that run at scale and are instrumented for performance measurement, tracking, and sampling, in order to measure and optimize the performance of important applications and services
  • Construct carefully designed experiments to analyze and study performance bottlenecks and dependencies, and to develop critical insights into them, from an end-to-end perspective
  • Develop ideas for improving end-to-end system performance and usability by driving changes in the hardware, the software, or both
  • Collaborate with AI researchers, developers, and application service providers to understand internal developer and external customer pain points and requirements, project future needs, and share best practices
  • Develop the modeling framework and TCO (total cost of ownership) analysis needed to enable efficient exploration and sweeps of the architecture and design space
  • Develop the methodology needed to drive the engineering analysis that informs the architecture, design, and roadmap of DGX Cloud

Job Requirements

  • Expertise in working with large-scale parallel and distributed accelerator-based systems
  • Expertise in optimizing performance and AI workloads on large-scale systems
  • Experience with performance modeling and benchmarking at scale
  • Strong background in Computer Architecture, Networking, Storage systems, Accelerators
  • Familiarity with popular AI frameworks (PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM, among others)
  • Experience with AI/ML models and workloads, in particular LLMs, as well as an understanding of DNNs and their use in emerging AI/ML applications and services
  • Bachelor's or Master's degree in Engineering, or equivalent experience (preferably Electrical Engineering, Computer Engineering, or Computer Science)
  • 5+ years of experience in the above areas
  • Proficiency in Python, C/C++
  • Expertise with at least one public CSP infrastructure (GCP, AWS, Azure, OCI, …)

Benefits

  • equity
  • benefits
