Senior Platform Telemetry Engineer

Engineer · Full Time · Remote · Team 10,001+ · Since 1993 · H1B Sponsor

Location

California

Posted

175 days ago

Salary

$148K - $287.5K / year

Bachelor Degree · 5 yrs exp · Experience accepted · English · Grafana · Prometheus · Python

Job Description

  • Drive next generation fleet management solutions for scaling AI infrastructure using GPUs and Grace solution from NVIDIA
  • Work with customers, product management and other architects to narrow down on requirements for implementation
  • Design architecture for fleet health monitoring and fault-remediation solution at scale
  • Work with customers and other architects to understand health monitoring requirements and leverage in-band and out-of-band capabilities
  • Create detailed architecture and perform POCs to validate architecture
  • Educate customers about product architecture and incorporate feedback
  • Write architecture specs and design documents; own end-to-end delivery across teams
  • Perform code reviews for code produced from architecture specs
  • Ensure product is properly tested; enhance unit testing and establish proper test plans
  • Drive product life cycles with QA teams to productize code and act as product owner
  • Articulate requirements in Jira and bug management tools and coordinate execution plans with managers
  • Contribute to all phases of product development: definition, architecture, design, implementation, debugging, testing, and early customer support

Job Requirements

  • BS, MS, or PhD in EE/CS or related field of education (or equivalent experience)
  • 5+ years hands-on coding experience
  • Strong knowledge of time series databases like InfluxDB & Prometheus
  • Strong knowledge of building and consuming REST APIs (Redfish is big plus)
  • Strong knowledge of telemetry visualization solutions like Grafana & Influx
  • Strong knowledge of firmware architecture and of optimizing firmware for low-latency APIs
  • Strong knowledge of analyzing algorithms for time & space complexity and projecting system resource requirements
  • Proven record of solutions for scalability
  • Strong and demonstrable skill in C/C++ and Python
  • Programming and debugging experience on server platforms
  • Experience in SCM (e.g., Git, Perforce) and project management tools like Jira
  • Excellent written and oral communication skills
  • Excellent work ethic, teamwork, and commitment to finishing tasks
  • Self-starter with hands-on coding ability
  • Ways to stand out: Experience building telemetry collection & analysis engines; Experience with Redfish; Experience with notification systems like PagerDuty; Active OCP and DMTF contribution; Hands-on experience with x86 or ARM system architecture; Familiarity with Confidential Compute; Experience with ML and multi-variable optimization techniques

Benefits

  • Eligible for equity
  • Benefits (unspecified)
