AI Inference Engineer QVAC

AI Engineer, Machine Learning Engineer, Full Time, Remote, Team 201-500

Location

United States + 144 more

All locations: United States, Canada, Brazil, Colombia, Argentina, Chile, Venezuela (Bolivarian Republic of), Bolivia (Plurinational State of), Ecuador, French Guiana, Guyana, Paraguay, Peru, Suriname, Uruguay, Mexico, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Dominican Republic, Puerto Rico, Bahamas, Guadeloupe, Haiti, Jamaica, Martinique, Montserrat, United Kingdom, Germany, France, Estonia, Portugal, Hungary, Poland, Ukraine, Romania, Bulgaria, Czech Republic, Slovakia, Belarus, Moldova (Republic of), Sweden, Greece, Belgium, Italy, Ireland, Switzerland, Netherlands, Finland, Malta, Denmark, Lithuania, Croatia, Spain, Austria, Bosnia and Herzegovina, Iceland, Luxembourg, Macedonia (The Former Yugoslav Republic of), Montenegro, Norway, Serbia, Slovenia, Albania, Cyprus, Latvia, Monaco, South Africa, Egypt, Algeria, Angola, Benin, Botswana, Burkina Faso, Burundi, Cameroon, Cape Verde, Central African Republic, Chad, Congo, Côte d'Ivoire, Congo (The Democratic Republic of the), Equatorial Guinea, Eritrea, Ethiopia, Gabon, Gambia, Ghana, Guinea, Guinea-Bissau, Kenya, Lesotho, Liberia, Libyan Arab Jamahiriya, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mayotte, Morocco, Mozambique, Namibia, Niger, Nigeria, Réunion, Rwanda, Senegal, Seychelles, Sierra Leone, Somalia, Sudan, Swaziland, Tanzania (United Republic of), Togo, Tunisia, Uganda, Zambia, Zimbabwe, Georgia, Turkey, Israel, United Arab Emirates, Armenia, Azerbaijan, Bahrain, Iraq, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Palestinian Territory (Occupied), Yemen

Posted

3 days ago

Salary

Not specified

C++, JavaScript, llama.cpp, ggml, ONNX, Deep Learning, Transformers, LLMs, Diffusion Models

Job Description

Role Description

You will own the inference backbone behind QVAC's local AI stack: the C++ systems layer that makes models run fast, reliably, and predictably on real user hardware. The role centers on engineering quality at the runtime level, including:

  • Startup behavior
  • Memory pressure
  • Throughput/latency balance
  • Long-session stability

You will define and evolve the core abstractions that inference features depend on, enabling new capabilities to be added without sacrificing performance or maintainability. This role is for someone who enjoys low-level problem solving, clear technical ownership, and building infrastructure that other teams trust in production. Your work directly enables private, on-device AI experiences and helps set the technical foundation for QVAC's next generation of peer-to-peer AI products.

Responsibilities

  • Deploy machine learning models to edge devices using llama.cpp, ggml, and ONNX
  • Collaborate closely with researchers to assist in coding, training and transitioning models from research to production environments
  • Integrate AI features into existing products, enriching them with the latest advancements in machine learning

Qualifications

  • Excellent programming skills in C++; experience with JavaScript is a bonus
  • Strong experience with the llama.cpp and ggml inference engines, including deploying models to specific GPU architectures
  • Good understanding of deep learning concepts and model architectures
  • Experience with transformers, LLMs, and diffusion models
  • Demonstrated ability to rapidly assimilate new technologies and techniques
  • A degree in Computer Science, AI, Machine Learning, or a related field, complemented by a solid track record in AI R&D

Important information for candidates

Recruitment scams have become increasingly common. To protect yourself, please keep the following in mind when applying for roles:

  • Apply only through our official channels.
  • We do not use third-party platforms or agencies for recruitment unless clearly stated. All open roles are listed on our official careers page: https://tether.recruitee.com/
  • Verify the recruiter’s identity. All our recruiters have verified LinkedIn profiles.
  • Be cautious of unusual communication methods. We do not conduct interviews over WhatsApp, Telegram, or SMS.
  • Double-check email addresses. All communication from us will come from emails ending in @tether.to or @tether.io.
  • We will never request payment or financial details. If someone asks for personal financial information or payment at any point during the hiring process, it is a scam. Please report it immediately.
