Senior Software Engineer – AI Middleware

AI Engineer, Machine Learning Engineer, Full Time, Remote, Team 51-200

Location

United States

Posted

10 hours ago

Salary

Not specified

C, C++, Linux, CUDA, NCCL, RCCL, GPUDirect RDMA, Libfabric, PyTorch Distributed, TensorFlow, JAX, DeepSpeed, Megatron-LM, Profiling, Tracing, Collective Communication, NCCL Upstreaming

Job Description

Cornelis Networks delivers the world’s highest-performance scale-out networking solutions for AI and HPC datacenters. Our differentiated architecture seamlessly integrates hardware, software, and system-level technologies to maximize the efficiency of GPU, CPU, and accelerator-based compute clusters at any scale. Our solutions drive breakthroughs in AI and HPC workloads, empowering our customers to push the boundaries of innovation. Backed by top-tier venture capital and strategic investors, we are committed to innovation, performance, and scalability, solving the world’s most demanding computational challenges with our next-generation networking solutions.


We are a fast-growing, forward-thinking team of architects, engineers, and business professionals with a proven track record of building successful products and companies. As a global organization, our team spans multiple U.S. states and six countries, and we continue to expand with exceptional talent in onsite, hybrid, and fully remote roles.


We are seeking a highly experienced Senior Software Engineer to design and develop Cornelis Networks’ AI communication middleware and to enable it in upstream projects. This role focuses on distributed AI workloads and on enabling and optimizing collective communication libraries (e.g., NCCL/RCCL) over Cornelis Networks’ interconnects.


Key Responsibilities

  • Design and implement performance-critical features for CCL enablement on Cornelis Networks’ fabrics.
  • Optimize distributed training performance across multi-node, multi-GPU configurations.
  • Improve GPU communication paths including GPU-direct transfers, IPC, and CPU/GPU synchronization.
  • Profile distributed AI workloads and identify bottlenecks across the software and hardware stack.
  • Tune AI frameworks such as PyTorch Distributed, TensorFlow/XLA, JAX, DeepSpeed, and Megatron-LM.
  • Develop benchmarks and microbenchmarks aligned with real model performance.
  • Contribute upstream to AI communication and distributed training projects.
  • Participate in design reviews, code reviews, CI, and long-term maintenance.
  • Prototype and validate Ultra Ethernet capabilities for AI collective communication.
  • Provide technical input for deployment considerations and performance validation.
  • Collaborate with kernel/driver, switch, performance, and systems teams.
  • Support advanced escalations by analyzing traces and providing robust fixes.


Minimum Qualifications

  • 8+ years of experience in high-performance systems programming in C/C++ on Linux.
  • Strong experience with GPU communication stacks including CUDA/ROCm and NCCL/RCCL.
  • Ability to optimize distributed training performance using profiling and tracing.
  • Understanding of collective communication concepts and topology awareness.
  • Experience delivering production-quality code.
  • Open-source contributions in relevant areas.


Preferred Qualifications

  • Experience with AI frameworks such as PyTorch Distributed, DeepSpeed, and Megatron-LM.
  • Familiarity with libfabric/OFI, UCX, and RDMA concepts.
  • Experience with RoCEv2 and Ultra Ethernet.
  • Experience building cluster-scale performance test infrastructure.


Location: This is a remote position for employees residing within the United States.


We offer a competitive compensation package that includes equity, cash, and incentives, along with health and retirement benefits. Our dynamic, flexible work environment provides the opportunity to collaborate with some of the most influential names in the semiconductor industry.


At Cornelis Networks your base salary is only one component of your comprehensive total rewards package. Your base pay will be determined by factors such as your skills, qualifications, experience, and location relative to the hiring range for the position. Depending on your role, you may also be eligible for performance-based incentives, including an annual bonus or sales incentives. 

In addition to your base pay, you’ll have access to a broad range of benefits, including medical, dental, and vision coverage, as well as disability and life insurance, a dependent care flexible spending account, accidental injury insurance, and pet insurance. We also offer generous paid holidays, 401(k) with company match, and Open Time Off (OTO) for regular full-time exempt employees. Other paid time off benefits include sick time, bonding leave, and pregnancy disability leave.


Cornelis Networks does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. Cornelis Networks is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process. 




Location

Austin, Texas (Remote)


Department

Software Engineering


Employment Type

Full-Time


