BLACKBIRD.AI
Deception Detection for the Information Age.
Staff Data Engineer
Location
New York, Texas, Washington
Posted
38 days ago
Salary
$160K - $190K / year
8 yrs exp, English, Apache, AWS, Azure, Cloud, Elasticsearch, Python, Spark, SQL
Job Description
• Design and implement scalable data platform architecture on Databricks, supporting both batch and streaming ingestion
• Build robust, fault-tolerant data ingestion pipelines that integrate with multiple third-party APIs and data providers
• Design and implement AI-powered enrichment stages within pipelines—applying ML clustering, generative AI summarization, classification, and entity extraction to transform raw data into actionable intelligence
• Build analytical systems with full-text search capabilities using Elasticsearch for rapid querying and analysis of enriched data
• Work with AI/ML researchers to implement, integrate, and scale AI processing
• Expose data platform capabilities as APIs and other interfaces for downstream consumption by applications and services
• Optimize data lake and lakehouse architecture for performance, cost-efficiency, and scalability
• Design and implement data quality frameworks, monitoring, and alerting systems
• Design efficient architectures for calling external AI APIs and managing rate limits, costs, and reliability
• Architect solutions with cost-efficiency as a first-class concern, implementing monitoring and optimization strategies for compute and storage
• Make critical build-vs-buy decisions and establish architectural standards for the data organization
• Mentor engineers and elevate the team's technical capabilities through code reviews, design discussions, and knowledge sharing
Job Requirements
- 8+ years of software engineering experience with 5+ years focused on data platforms or data engineering
- Deep expertise with Databricks, Apache Spark, and data lakehouse architectures
- Strong experience building and operating data pipelines at scale (handling TBs+ of data)
- Experience integrating AI/ML capabilities into data pipelines (clustering, LLM APIs, classification, summarization)
- Proficiency in Python, dbt, and SQL for data processing and pipeline development
- Experience with both batch and streaming large-scale data processing patterns
- Strong understanding of cloud platforms (AWS, Azure)
- Excellent communication skills and ability to mentor engineers
Preferred Qualifications
- Experience designing both batch and streaming/near real-time data architectures
- Proficiency with Elasticsearch for building analytical systems with full-text search capabilities
- Hands-on experience with LLM APIs and understanding of rate limiting and cost optimization
- Experience with Agentic AI, context engineering, and evaluation
- Background in trust & safety, security, or content moderation domains
- Experience with data observability tools and building comprehensive monitoring systems
- Prior experience at a startup or fast-paced environment
- Experience applying agentic coding tools in day-to-day development
- Familiarity with Databricks' Lakeflow, Agent Bricks, and vector databases
Benefits
- Competitive compensation package, 401(k), and equity - everyone has a stake in our growth!
- Comprehensive health benefits for you and your loved ones, including wellness days and monthly wellness reimbursements - an apple a day doesn't always keep the doctor away!
- Generous vacation policy, encouraging you to take the time you need - we trust you to strike the right work/life balance!
- A flexible work environment with opportunities to collaborate with your team in person - you can have it all!
- Inclusion and impact - soar to new heights!
- Professional development stipend - never stop learning!