We are seeking a skilled and motivated Data Engineer with hands-on experience in Databricks and Airflow to join our Engineering team. The ideal candidate will be responsible for designing, building, and maintaining scalable data pipelines and infrastructure that enable efficient data collection, processing, and storage using modern cloud-based technologies. The Data Engineer will work closely with Engineering, Data Scientists, Analysts, and Business Stakeholders to enable data-driven decision-making and deliver actionable insights.
Key Responsibilities:
Design, develop, and maintain efficient, reliable, and scalable data pipelines using Databricks, Apache Spark, Apache Airflow, and related technologies to process large volumes of structured and unstructured data.
Build and manage ETL workflows or jobs to transform raw data into optimized, analytics-ready datasets.
Collaborate with Data Scientists, Analysts, and Business Teams to gather data requirements and translate them into robust technical solutions.
Develop and maintain data lakes and data warehouses using Databricks, cloud storage (S3, ADLS), and warehouse and query technologies (Delta Lake, Redshift, BigQuery).
Implement automated data quality checks and monitoring mechanisms to ensure data accuracy and reliability.
Optimize large-scale distributed data processing jobs for performance, scalability, and cost efficiency.
Implement secure, governed, and auditable data access and sharing practices in line with company policies and regulations (GDPR, HIPAA).
Monitor and troubleshoot Databricks clusters, jobs, and performance issues.
Document data models, pipeline architecture, and processes for clarity, maintainability, and handover.
Stay updated on new features and best practices for Databricks and related data engineering technologies.
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or a related field.
Proven hands-on experience with the Databricks platform and Apache Airflow for big data processing.
Strong proficiency in Apache Spark (PySpark, Scala, or Spark SQL).
Strong programming skills in Python or Java.
Experience in developing ETL pipelines using Databricks notebooks, Jobs, and Delta Lake.
Strong SQL skills for data manipulation, transformation, and querying.
Experience working with cloud platforms such as AWS (S3, Redshift, Glue), Azure (ADLS, Synapse), or GCP (BigQuery, Dataflow).
Knowledge of Data Lake and Data Warehouse architectures.
Experience implementing Delta Lake for data versioning and efficient incremental data processing.
Familiarity with version control tools (e.g., Git) and CI/CD for data workflows.
Understanding of data modeling, data partitioning, and schema design best practices.
Preferred Qualifications:
Experience with containerization (Docker) and orchestration (Kubernetes).
Knowledge of data governance frameworks and metadata management tools.
Experience with data security and compliance practices.
Experience setting up automated monitoring, alerts, and logging for Databricks jobs.
Experience in using APIs for data ingestion and extraction.
Strong analytical, communication, and problem-solving skills.
Position Requirements:
10+ years of work experience