Databricks Engineer
Listed on 2026-01-19
IT/Tech
Data Engineer, Data Science Manager
Job Title: Databricks Engineer
Location: Washington, District of Columbia
Type: Contract
Contractor Work Model: Onsite
The Enterprise Data Platform (EDP) empowers the Board to confidently use trusted, standardized, and well‑governed data to drive insight and innovation.
Background
The Data Engineer designs, builds, and operates batch and streaming data pipelines and curated data products on the Enterprise Data Platform (EDP) using Databricks and Apache Spark. The role is hands‑on in Python and R, enabling scalable engineering workflows while supporting analytics and research use cases. The engineer partners with product, architecture, governance, and mission teams to deliver secure, performant, observable pipelines and trusted datasets.
Requirements
The candidate shall possess the knowledge and skills set forth in the Technical Services BOA, Section 3.6.4.2 for labor category Information Data Engineer.
The candidate shall also demonstrate the below knowledge and experience:
- Strong proficiency in Python and R for data engineering and analytical workflows.
- Hands‑on experience with Databricks and Apache Spark, including Structured Streaming (watermarking, stateful processing concepts, checkpointing, exactly‑once/at‑least‑once tradeoffs); see the streaming sketch after this list.
- Strong SQL skills for transformation and validation.
- Experience building production‑grade pipelines: idempotency, incremental loads, backfills, schema evolution, and error handling; see the idempotent MERGE sketch after this list.
- Experience implementing data quality checks and validation for both batch and event streams (late arrivals, deduplication, event‑time vs processing‑time).
- Observability skills: logging/metrics/alerting, troubleshooting, and performance tuning (partitions, joins/shuffles, caching, file sizing).
- Proficiency with Git, CI/CD concepts for data pipelines, Databricks Asset Bundles, Databricks application deployments, and the Databricks CLI.
- Experience with lakehouse table formats and patterns (e.g., Delta tables) including compaction/optimization and lifecycle management.
- Familiarity with orchestration patterns (Databricks Workflows/Jobs) and dependency management.
- Experience with governance controls (catalog permissions, secure data access patterns, metadata/lineage expectations).
- Knowledge of message/event platforms and streaming ingestion patterns (e.g., Kafka, Kinesis, or equivalents) and sink patterns for serving layers.
- Experience collaborating with research/analytics stakeholders and translating analytical needs into engineered data products.
- Strong problem‑solving and debugging across ingestion → transformation → serving.
- Clear technical communication and documentation discipline.
- Ability to work across product/architecture/governance teams in a regulated environment.
- Deep Delta Lake expertise including time travel, Change Data Feed (CDF), MERGE operations, CLONE, table constraints, and optimization techniques; understanding of liquid clustering and table maintenance best practices (see the table‑maintenance sketch after this list).
- Experience with Lakeflow/Delta Live Tables (DLT) including the expectations framework, materialized vs. streaming table patterns, and declarative pipeline design (see the DLT sketch after this list).
- Proficiency with testing frameworks (pytest, Great Expectations, deequ) and test‑driven development practices for production data pipelines (see the pytest sketch after this list).
- Data modeling skills including dimensional modeling (star/snowflake schemas), medallion architecture implementation, and slowly changing dimension (SCD) pattern implementation (see the SCD Type 2 sketch after this list).
- AWS data services experience including S3 optimization, IAM role configuration for data access, and CloudWatch integration; understanding of cost optimization patterns.
- Bachelor’s degree in a related field or equivalent experience.
- 10+ years of data engineering experience, including production Spark‑based batch pipelines and streaming implementations.
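The sketches below illustrate several of the patterns named in the requirements above. They are illustrative only; every table name, path, and schema is a hypothetical placeholder, not a reference to an actual EDP asset.

A minimal Structured Streaming sketch in PySpark: event‑time aggregation with a watermark and a checkpoint location. The source table events_bronze, the sink event_counts_silver, and the checkpoint path are assumed names.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical streaming source table.
events = spark.readStream.table("events_bronze")

windowed = (
    events
    # Tolerate events up to 10 minutes late; bounds the aggregation state.
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "event_type")
    .count()
)

query = (
    windowed.writeStream
    .outputMode("append")  # a window is emitted once the watermark closes it
    # Checkpointing enables restart recovery (exactly-once with idempotent
    # sinks such as Delta; at-least-once otherwise).
    .option("checkpointLocation", "/tmp/checkpoints/event_counts")
    .toTable("event_counts_silver")  # hypothetical sink table
)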
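A sketch of an idempotent incremental load via Delta MERGE, so re-running the same batch leaves the target unchanged; staging_orders and orders_silver are hypothetical tables keyed on order_id.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates = spark.read.table("staging_orders")         # hypothetical incremental batch
target = DeltaTable.forName(spark, "orders_silver")  # hypothetical curated table

(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()     # refresh rows that already exist
    .whenNotMatchedInsertAll()  # insert new rows; a rerun matches and becomes a no-op
    .execute()
)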
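A table‑maintenance sketch for a Delta table, assuming a hypothetical orders_silver table; the VACUUM retention shown must respect the workspace's configured minimum.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files (on a liquid-clustered table, OPTIMIZE also reclusters).
spark.sql("OPTIMIZE orders_silver")

# Remove data files no longer referenced by the table and older than 7 days.
spark.sql("VACUUM orders_silver RETAIN 168 HOURS")

# Time travel: inspect an earlier version for debugging or backfills.
previous = spark.sql("SELECT * FROM orders_silver VERSION AS OF 5")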
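A declarative pipeline sketch using the DLT expectations framework; the upstream dataset orders_raw and both constraints are assumed names.

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Orders with basic quality gates")
@dlt.expect("valid_amount", "amount >= 0")                     # violation is logged, row kept
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # violating rows are dropped
def orders_clean():
    # Hypothetical upstream dataset read as a stream.
    return dlt.read_stream("orders_raw").withColumn("ingested_at", F.current_timestamp())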
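A pytest sketch for unit-testing a pipeline transformation on a local SparkSession; deduplicate_orders is a hypothetical transform defined inline for the example.

import pytest
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F
from pyspark.sql.window import Window

def deduplicate_orders(df: DataFrame) -> DataFrame:
    # Hypothetical transform under test: keep the highest-version row per order_id.
    w = Window.partitionBy("order_id").orderBy(F.col("version").desc())
    return df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("pipeline-tests").getOrCreate()

def test_deduplicate_keeps_latest(spark):
    df = spark.createDataFrame(
        [("o1", 1, "old"), ("o1", 2, "new"), ("o2", 1, "only")],
        ["order_id", "version", "status"],
    )
    result = {r["order_id"]: r["status"] for r in deduplicate_orders(df).collect()}
    assert result == {"o1": "new", "o2": "only"}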
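An SCD Type 2 sketch via Delta MERGE, assuming staging_customers holds only changed records whose columns align with the dimension, and dim_customer carries is_current/start_date/end_date tracking columns (all names hypothetical).

from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.read.table("staging_customers")  # hypothetical batch of changed records
dim = DeltaTable.forName(spark, "dim_customer")  # hypothetical dimension table

# Step 1: expire the current version of each changed customer.
(
    dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false", "end_date": "current_date()"})
    .execute()
)

# Step 2: append the new current versions.
new_rows = (
    updates
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
)
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")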
Desirable Certifications:
- Databricks Certified Associate Developer for Apache Spark
- Databricks Certified Data Engineer Associate or Professional
- AWS Certified Developer - Associate
- AWS Certified Data Engineer - Associate
- AWS Certified Solutions Architect - Associate
- Build and maintain end‑to‑end pipelines in Databricks…