Data Engineer
Listed on 2026-01-13
Software Development
Data Engineer, AI Engineer, Machine Learning/ ML Engineer
Location: New York
Blackbird.AI helps organizations discover emergent threats and stay one step ahead of real-world harm through our AI-powered Narrative and Risk Intelligence Platform. Our commitment is to prioritize safety and security, providing the tools to identify potential risks and proactively ensure a safer environment. No matter the job or where it's located, we're all connected by a shared vision:
To lead and enhance the landscape of risk intelligence.
As a Staff Data Engineer, you will play a critical role in architecting and scaling our data platform and AI/ML processing infrastructure. As a technical leader, you'll be responsible for our entire data ecosystem, from the ingestion pipelines that process diverse data sources to the lakehouse architecture that powers our narrative analysis capabilities. You'll architect systems that support both batch and streaming data patterns and build real-time alerting on the insights they generate.
You’ll work at the intersection of data engineering, AI-powered data transformation, and platform engineering, making architectural decisions that will shape our ability to detect misinformation, disinformation, and narrative attacks at scale while managing costs effectively. A key aspect of this role involves building intelligent pipelines that use traditional AI and generative AI to cluster, enrich, classify, and extract insights from data as it flows through our system.
As a Staff Data Engineer, you will:
- Design and implement scalable data platform architecture on Databricks, supporting both batch and streaming ingestion
- Build robust, fault‑tolerant data ingestion pipelines that integrate with multiple third‑party APIs and data providers
- Design and implement AI‑powered enrichment stages within pipelines—applying ML clustering, generative AI summarization, classification, and entity extraction to transform raw data into actionable intelligence
- Build analytical systems with full‑text search capabilities using Elasticsearch for rapid querying and analysis of enriched data
- Work with AI/ML researchers to implement, integrate, and scale AI processing
- Expose data platform capabilities as APIs and other interfaces for downstream consumption by applications and services
- Optimize data lake and lakehouse architecture for performance, cost‑efficiency, and scalability
- Design and implement data quality frameworks, monitoring, and alerting systems
- Design efficient architectures for calling external AI APIs and managing rate limits, costs, and reliability
- Architect solutions with cost‑efficiency as a first‑class concern, implementing monitoring and optimization strategies for compute and storage
- Make critical build‑vs‑buy decisions and establish architectural standards for the data organization
- Mentor engineers and elevate the team's technical capabilities through code reviews, design discussions, and knowledge sharing
Qualifications:
- 8+ years of software engineering experience, with 5+ years focused on data platforms or data engineering
- Deep expertise with Databricks, Apache Spark, and data lakehouse architectures
- Strong experience building and operating data pipelines at scale (terabytes of data or more)
- Experience integrating AI/ML capabilities into data pipelines (clustering, LLM APIs, classification, summarization)
- Proficiency in Python, dbt, and SQL for data processing and pipeline development
- Experience with both batch and streaming large‑scale data processing patterns
- Strong understanding of cloud platforms (AWS, Azure)
- Excellent communication skills and ability to mentor engineers
- Experience designing both batch and streaming/near real‑time data architectures
- Proficiency with Elasticsearch for building analytical systems with full‑text search capabilities
- Hands‑on experience with LLM APIs and an understanding of rate limiting and cost optimization
- Experience with agentic AI, context engineering, and evaluation
- Background in trust & safety, security, or content moderation domains
- Experience with data observability tools and building comprehensive monitoring systems
- Prior experience at a startup or fast‑paced environment
- Experience applying agentic coding tools in day‑to‑day development
- Familiarity with…