BLACKBIRD.AI

Deception Detection for the Information Age.

Staff Data Engineer

Data Engineer, Full Time, Remote, Team 11-50, H1B No Sponsor

Location

New York, Texas, Washington

Posted

38 days ago

Salary

$160K - $190K / year

8 yrs exp, English, Apache, AWS, Azure, Cloud, Elasticsearch, Python, Spark, SQL

Job Description

  • Design and implement scalable data platform architecture on Databricks, supporting both batch and streaming ingestion
  • Build robust, fault-tolerant data ingestion pipelines that integrate with multiple third-party APIs and data providers
  • Design and implement AI-powered enrichment stages within pipelines, applying ML clustering, generative AI summarization, classification, and entity extraction to transform raw data into actionable intelligence
  • Build analytical systems with full-text search capabilities using Elasticsearch for rapid querying and analysis of enriched data
  • Work with AI/ML researchers to implement, integrate, and scale AI processing
  • Expose data platform capabilities as APIs and other interfaces for downstream consumption by applications and services
  • Optimize data lake and lakehouse architecture for performance, cost-efficiency, and scalability
  • Design and implement data quality frameworks, monitoring, and alerting systems
  • Design efficient architectures for calling external AI APIs and managing rate limits, costs, and reliability
  • Architect solutions with cost-efficiency as a first-class concern, implementing monitoring and optimization strategies for compute and storage
  • Make critical build-vs-buy decisions and establish architectural standards for the data organization
  • Mentor engineers and elevate the team's technical capabilities through code reviews, design discussions, and knowledge sharing

Job Requirements

  • 8+ years of software engineering experience with 5+ years focused on data platforms or data engineering
  • Deep expertise with Databricks, Apache Spark, and data lakehouse architectures
  • Strong experience building and operating data pipelines at scale (handling TBs+ of data)
  • Experience integrating AI/ML capabilities into data pipelines (clustering, LLM APIs, classification, summarization)
  • Proficiency in Python, DBT, and SQL for data processing and pipeline development
  • Experience with both batch and streaming patterns for large-scale data processing
  • Strong understanding of cloud platforms (AWS, Azure)
  • Excellent communication skills and ability to mentor engineers

Preferred Qualifications

  • Experience designing both batch and streaming/near real-time data architectures
  • Proficiency with Elasticsearch for building analytical systems with full-text search capabilities
  • Hands-on experience with LLM APIs and understanding of rate limiting and cost optimization
  • Experience with Agentic AI, context engineering, and evaluation
  • Background in trust & safety, security, or content moderation domains
  • Experience with data observability tools and building comprehensive monitoring systems
  • Prior experience at a startup or fast-paced environment
  • Experience applying agentic coding tools in day-to-day development
  • Familiarity with Databricks' Lakeflow, Agent Bricks, and vector databases

Benefits

  • Competitive compensation package, 401(k), and equity - everyone has a stake in our growth!
  • Comprehensive health benefits for you and your loved ones, including wellness days and monthly wellness reimbursements - an apple a day doesn't always keep the doctor away!
  • Generous vacation policy, encouraging you to take the time you need - we trust you to strike the right work/life balance!
  • A flexible work environment with opportunities to collaborate with your team in person - you can have it all!
  • Inclusion and Impact - soar to new heights!
  • Professional development stipend - never stop learning!
