About the Role
We are seeking a Data Architect with deep Big Data Engineering expertise to design and modernize large-scale, cloud-native data platforms. This role emphasizes distributed data processing, real-time pipelines, data platform automation, and GenAI enablement on top of strong Big Data foundations.
Key Responsibilities
Architect and govern enterprise Big Data platforms (data lake, lakehouse, warehouse, real-time).
Design high-volume, high-velocity data pipelines using batch and streaming frameworks.
Lead implementation of distributed processing architectures (Spark, PySpark, EMR).
Build event-driven and real-time streaming solutions (Kafka, Kinesis, Flink).
Define ETL/ELT patterns, metadata-driven pipelines, and reusable ingestion frameworks.
Drive data platform automation (Airflow/Step Functions, CI/CD, data quality, observability).
Optimize performance, scalability, fault tolerance, and cost across Big Data workloads.
Integrate GenAI architectures (LLMs, embeddings, vector databases, RAG) with enterprise data lakes.
Ensure security, governance, lineage, and compliance across data platforms.
Provide hands-on leadership and technical mentoring to data engineering teams.
Required Technical Skills & Experience
12+ years in Big Data Engineering / Data Architecture roles.
Expert-level experience with Spark, PySpark, SQL, and distributed compute engines.
Strong knowledge of AWS Big Data stack: S3, EMR, Glue, Athena, Redshift, Lambda, Step Functions.
Hands-on experience with Snowflake (performance tuning, data sharing, optimization).
Expertise in streaming platforms: Kafka, Kinesis, Flink, or Spark Streaming.
Strong experience with data modeling (dimensional, Data Vault 2.0).
Proficiency in Python, schema evolution, partitioning, and data versioning.
Experience with orchestration and automation tools (Airflow, Dagster, CI/CD).
Working knowledge of GenAI data integration (feature stores, vector DBs, RAG pipelines).
Experience with Agile delivery and leading globally distributed engineering teams.