The Future of High Performance Fabrics

Principal Software Engineer – AI/HPC Middleware

Full Time, Remote, Team 51-200, H1B Sponsor

Location

Texas

Posted

7 days ago

Salary

Not specified

12 yrs exp, English, Linux

Job Description

• Lead design and implementation work to enable and optimize HPC middleware (MPI and SHMEM) and AI middleware CCL stacks (e.g., NCCL/RCCL and related collective communication libraries)
• Deliver performance-critical communication paths including low-latency small and medium message transfers, bulk SDMA data movement, GPU-Direct and IPC communication, and collective acceleration
• Design and tune collective communication algorithms (latency-optimized and bandwidth-optimized), including GPU-aware collectives
• Integrate middleware with underlying transports and provider layers such as libfabric/OFI, UCX, and verbs-style interfaces to achieve performance, portability, and maintainability
• Implement and optimize memory registration strategies, progress and execution models, completion semantics, multi-rail communication behavior, and GPU memory handling
• Drive upstream contributions across MPI/SHMEM projects, CCL ecosystems, and related components with a focus on upstreamable design and long-term maintainability
• Represent Cornelis Networks in open-source communities through technical reviews, design discussions, and sustained technical leadership
• Implement and prototype Ultra Ethernet capabilities supporting MPI/SHMEM and AI collective communication use cases
• Collaborate with ecosystem partners to validate deployment models and performance scaling on customer-relevant configurations
• Work closely with kernel, driver, and switch teams to deliver end-to-end performance aligned with the Cornelis product roadmap
• Participate in architecture reviews, performance tuning, scaling validation, and multi-layer root-cause investigations
• Analyze performance traces and triage advanced customer issues, translating findings into robust fixes and upstream improvements
• Publish internal and external best practices, including tuning guidance, reference configurations, and debugging methodologies
• Mentor senior engineers and promote best practices for design, testing, documentation, and code quality
• Help define the long-term middleware technical roadmap aligned with product evolution and customer needs

Job Requirements

12+ years of experience in high-performance systems programming in C/C++ on Linux
Hands-on experience with MPI internals (Open MPI, MPICH, MVAPICH) and/or SHMEM implementations
Experience implementing or optimizing collective communications for HPC and/or AI workloads, including NCCL/RCCL (CUDA/ROCm) or related CCL stacks
Demonstrated ability to design low-latency/high-throughput communication paths and diagnose performance issues using profiling and tracing tools
Working knowledge of transport and integration layers such as OFI/libfabric, UCX, and verbs-style networking concepts
Strong understanding of RDMA and performance tuning
Proven open-source contribution track record
Demonstrated technical leadership in complex HPC or AI system software

Benefits

Health and retirement benefits
Generous paid holidays
401(k) with company match
Open Time Off (OTO) for regular full-time exempt employees
Paid time off benefits including sick time, bonding leave, and pregnancy disability leave
