Senior Data Engineer - Spark, Airflow
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2025-12-05
Listing for: Sigmaways Inc
Full Time position
Job specializations:
- IT/Tech: Data Engineer, Big Data
Job Description
We are seeking an experienced Data Engineer to design and optimize scalable data pipelines that drive our global data and analytics initiatives.
In this role, you will leverage technologies such as Apache Spark, Airflow, and Python to build high-performance data processing systems and ensure data quality, reliability, and lineage across Mastercard’s data ecosystem.
The ideal candidate combines strong technical expertise with hands‑on experience in distributed data systems, workflow automation, and performance tuning to deliver impactful, data‑driven solutions at enterprise scale.
Responsibilities:
- Design and optimize Spark-based ETL pipelines for large‑scale data processing.
- Build and manage Airflow DAGs for scheduling, orchestration, and checkpointing (a minimal DAG sketch follows this list).
- Implement partitioning and shuffling strategies to improve Spark performance.
- Ensure data lineage, quality, and traceability across systems.
- Develop Python scripts for data transformation, aggregation, and validation.
- Execute and tune Spark jobs using spark-submit.
- Perform DataFrame joins and aggregations for analytical insights (see the PySpark sketch after this list).
- Automate multi‑step processes through shell scripting and variable management.
- Collaborate with data, DevOps, and analytics teams to deliver scalable data solutions.
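For illustration only, here is a minimal sketch of the kind of Airflow DAG this role involves; the DAG id, schedule, retry policy, and task callables are all hypothetical, not taken from the listing.

```python
# Minimal illustrative Airflow DAG; all names below are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder extract step; a real task would pull a batch from a source system.
    print(f"extracting batch for {context['ds']}")


def transform(**context):
    # Placeholder transform step; a real task might trigger a Spark job.
    print(f"transforming batch for {context['ds']}")


with DAG(
    dag_id="daily_etl_example",  # hypothetical
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # transform runs only after extract succeeds
```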
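Similarly, a short PySpark sketch of the DataFrame joins, aggregations, and partitioning mentioned above; the paths, column names, and partition count are invented for illustration.

```python
# Illustrative PySpark join + aggregation; paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl_example").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")        # hypothetical path
customers = spark.read.parquet("s3://example-bucket/customers/")  # hypothetical path

# Repartition on the join key so the shuffle distributes evenly before the join.
orders = orders.repartition(200, "customer_id")

daily_revenue = (
    orders.join(customers, on="customer_id", how="inner")
    .filter(F.col("status") == "COMPLETE")
    .groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("n_orders"))
)

# Partition output by date so downstream reads can prune partitions.
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/daily_revenue/"
)
```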
Qualifications:
- Bachelor’s degree in Computer Science, Data Engineering, or related field (or equivalent experience).
- At least 7 years of experience in data engineering or big data development.
- Strong expertise in Apache Spark architecture, optimization, and job configuration.
- Proven experience with Airflow DAGs, including authoring, scheduling, checkpointing, and monitoring.
- Skilled in data shuffling, partitioning strategies, and performance tuning in distributed systems.
- Expertise in Python programming including data structures and algorithmic problem‑solving.
- Hands‑on experience with Spark DataFrames and PySpark transformations, including joins, aggregations, and filters.
- Proficient in shell scripting, including managing and passing variables between scripts.
- Experienced with spark-submit for deployment and tuning (an illustrative configuration follows this list).
- Solid understanding of ETL design, workflow automation, and distributed data systems.
- Excellent debugging and problem‑solving skills in large‑scale environments.
- Experience with AWS Glue, EMR, Databricks, or similar Spark platforms.
- Knowledge of data lineage and data quality frameworks like Apache Atlas.
- Familiarity with CI/CD pipelines, Docker/Kubernetes, and data governance tools.
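On the spark-submit point: tuning values such as executor memory and shuffle partitions are typically passed as spark-submit flags; the sketch below shows equivalent session-level configuration in Python, with purely illustrative values.

```python
# Illustrative Spark tuning configuration; each value is an example only.
# Comments note the roughly equivalent spark-submit flag.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("tuned_job")       # ~ --name tuned_job
    .config("spark.executor.memory", "8g")          # ~ --executor-memory 8g
    .config("spark.executor.cores", "4")            # ~ --executor-cores 4
    .config("spark.sql.shuffle.partitions", "400")  # ~ --conf spark.sql.shuffle.partitions=400
    .getOrCreate()
)
```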
- Seniority level: Mid‑Senior
- Employment type: Contract
- Job function: Information Technology
- Industries: Banking
Position Requirements: 10+ years work experience