Cornelis Networks
The Future of High Performance Fabrics
Principal Software Engineer – AI/HPC Middleware
Location: Texas
Posted: 7 days ago
Salary: Not specified
12 yrs exp, English, Linux
Job Description
• Lead the design and implementation work to enable and optimize HPC middleware (MPI and SHMEM) and AI collective communication library (CCL) stacks (e.g., NCCL/RCCL and related libraries)
• Deliver performance-critical communication paths including low-latency small and medium message transfers, bulk SDMA data movement, GPU-Direct and IPC communication, and collective acceleration
• Design and tune collective communication algorithms (latency-optimized and bandwidth-optimized), including GPU-aware collectives
• Integrate middleware with underlying transports and provider layers such as libfabric/OFI, UCX, and verbs-style interfaces to achieve performance, portability, and maintainability
• Implement and optimize memory registration strategies, progress and execution models, completion semantics, multi-rail communication behavior, and GPU memory handling
• Drive upstream contributions across MPI/SHMEM projects, CCL ecosystems, and related components with a focus on upstreamable design and long-term maintainability
• Represent Cornelis Networks in open-source communities through technical reviews, design discussions, and sustained technical leadership
• Prototype and implement Ultra Ethernet capabilities supporting MPI/SHMEM and AI collective communication use cases
• Collaborate with ecosystem partners to validate deployment models and performance scaling on customer-relevant configurations
• Work closely with kernel, driver, and switch teams to deliver end-to-end performance aligned with the Cornelis product roadmap
• Participate in architecture reviews, performance tuning, scaling validation, and multi-layer root-cause investigations
• Analyze performance traces and triage advanced customer issues, translating findings into robust fixes and upstream improvements
• Publish internal and external best practices, including tuning guidance, reference configurations, and debugging methodologies
• Mentor senior engineers and promote best practices for design, testing, documentation, and code quality
• Help define the long-term middleware technical roadmap aligned with product evolution and customer needs
Job Requirements
- 12+ years of experience in high-performance systems programming in C/C++ on Linux
- Hands-on experience with MPI internals (Open MPI, MPICH, MVAPICH) and/or SHMEM implementations
- Experience implementing or optimizing collective communications for HPC and/or AI workloads, including NCCL/RCCL (CUDA/ROCm) or related CCL stacks
- Demonstrated ability to design low-latency/high-throughput communication paths and diagnose performance issues using profiling and tracing tools
- Working knowledge of transport and integration layers such as OFI/libfabric, UCX, and verbs-style networking concepts
- Strong understanding of RDMA and performance tuning
- Proven open-source contribution track record
- Demonstrated technical leadership in complex HPC or AI system software
Benefits
- Health and retirement benefits
- Generous paid holidays
- 401(k) with company match
- Open Time Off (OTO) for regular full-time exempt employees
- Paid time off benefits including sick time, bonding leave, and pregnancy disability leave