Principal / Senior Data Engineer

Data Engineer | Full Time | Remote

Location: United States

Posted: 3 days ago

Salary: Not specified


Job Description

Troveo is the largest licensable video library for AI model training. We partner with thousands of content licensors—ranging from top-tier studios and production houses to leading YouTube creators—to supply video content to the world’s foremost research labs. Our mission is to rapidly deliver massive volumes of video content, to exact specifications, fueling next-generation generative and world-understanding AI models.

Data Engineering is central to our success. Each week, we process petabytes of video data—quickly, cost-effectively, and with uncompromising quality. As a data engineer at Troveo, you’ll focus on:

  • Lowering costs and reducing turnaround times for processing content.

  • Enhancing and transforming video data so that it is easier for our customers to discover and more valuable to them.

We are seeking a Principal or Senior Data Engineer with demonstrated expertise in Python and large-scale data management. Practical experience with AWS services (S3, EC2, etc.), search, and large databases is essential. Familiarity with video data is a plus, but not required.

Responsibilities

  • Data Pipeline Development: Design, build, and maintain scalable, efficient data pipelines in Python.

  • AWS Ecosystem: Leverage S3 for data storage (including multiple storage tiers) and EC2 for compute (currently clusters of 50k G instances) to retrieve and process data in production environments.

  • Big Data Handling: Develop and optimize systems to handle petabyte-scale datasets with a focus on performance, reliability, and cost-effectiveness.

  • Metadata Generation: Use self-hosted open-source LLMs and managed APIs to generate reliable metadata that powers discovery and increases the value of the content we deliver.

  • Discovery: Build search capabilities from the ground up, leveraging visual, semantic, and taxonomic data to deliver the right content to our customers.

  • Monitoring & Reliability: Implement robust monitoring, alerting, and logging to ensure smooth data flow and quickly troubleshoot issues.

  • Collaboration: Work cross-functionally with data scientists, software engineers, and product teams to understand data needs and deliver optimized solutions.

  • Video Processing (Preferred): If applicable, process and manage video data for analytics, quality control, and other use cases.

Required Qualifications

  • Python Proficiency: Strong coding skills in Python (including familiarity with libraries for data manipulation and analysis).

  • AWS Expertise: Hands-on experience with core AWS services (S3 and EC2, and ideally Lambda, EMR, or ECS).

  • Big Data Skills: Demonstrated ability to work with large-scale datasets (petabyte-level), ensuring high performance and scalability.

  • Database & Storage: Familiarity with large Postgres databases.

  • Automation & Scripting: Comfortable building CI/CD pipelines and automating repetitive tasks.

Nice to Have

  • Video Processing: Experience handling or transforming video data (e.g., transcoding, extracting metadata, compiling FFmpeg).

  • Machine Learning Pipelines: Familiarity with ML and Computer Vision workflows or frameworks (OpenCV, TensorFlow, PyTorch, etc.).

  • Security Best Practices: Understanding of AWS IAM, encryption, and SOC 2 compliance standards.

What We Offer

  • An opportunity to work with massive datasets and cutting-edge cloud technologies, serving the biggest companies in tech as they build the next generation of AI models.

  • A collaborative environment with a talented, diverse team of engineers and data experts.

  • Competitive compensation and benefits with room for career growth and professional development.

  • This is a remote (work-from-home) role, with the option to meet up in person from time to time if you are located in the SF Bay Area.
