NVIDIA

NVIDIA is widely considered to be one of the world’s most desirable employers! NVIDIA has some of the most forward-thinking and hardworking people in the world working together to advance accelerated computing and AI. If you are a creative and autonomous technologist with a passion for working with developers and shaping the future of AI in the public sector, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD. Applications for this job will be accepted at least until March 21, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Senior System Software Engineer, NCCL – Partner Enablement

Full-stack Engineer | Software Engineer | Full Time | Remote | Senior | Team 10,001+ | Since 1993 | H1B Sponsor

Location

California, Texas

Posted

62 days ago

Salary

$152K - $218.5K / year

Seniority

Senior

Bachelor Degree | 5 yrs exp | English | Ansible | AWS | Azure | Cloud | Docker | Google Cloud Platform | Kubernetes | Linux | Node.js | Python

Job Description

  • Engage with our partners and customers to root cause functional and performance issues reported with NCCL
  • Conduct performance characterization and analysis of NCCL and DL applications on groundbreaking GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
  • Guide our customers and support teams on HPC knowledge and standard methodologies for running applications on multi-node clusters
  • Document and conduct trainings/webinars for NCCL
  • Engage with internal teams in different time zones on networking, GPUs, storage, infrastructure and support

Job Requirements

  • B.S./M.S. degree in CS/CE, or equivalent experience, with 5+ years of relevant experience
  • Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
  • Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
  • Experience working with the engineering or academic research community supporting HPC or AI
  • Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
  • Expertise in Linux fundamentals and a scripting language, preferably Python
  • Familiarity with containers and with cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
  • Adaptability and passion to learn new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Benefits

  • Equity
  • Benefits
