About the Role
We are seeking a Data Architect with deep Big Data Engineering expertise to design and modernize large-scale, cloud-native data platforms. This role emphasizes distributed data processing, real-time pipelines, data platform automation, and GenAI enablement on top of strong Big Data foundations.
Key Responsibilities
Architect and govern enterprise Big Data platforms (data lake, lakehouse, warehouse, real-time).
Design high-volume, high-velocity data pipelines using batch and streaming frameworks.
Lead implementation of distributed processing architectures (Spark, PySpark, EMR).
Build event-driven and real-time streaming solutions (Kafka, Kinesis, Flink).
Define ETL/ELT patterns, metadata-driven pipelines, and reusable ingestion frameworks.
Drive data platform automation (Airflow/Step Functions, CI/CD, data quality, observability).
Optimize performance, scalability, fault tolerance, and cost across Big Data workloads.
Integrate GenAI architectures (LLMs, embeddings, vector databases, RAG) with enterprise data lakes.
Ensure security, governance, lineage, and compliance across data platforms.
Provide hands-on leadership and technical mentoring to data engineering teams.
Required Technical Skills & Experience
12+ years in Big Data Engineering / Data Architecture roles.
Expert-level experience with Spark, PySpark, SQL, and distributed compute engines.
Strong knowledge of AWS Big Data stack: S3, EMR, Glue, Athena, Redshift, Lambda, Step Functions.
Hands-on experience with Snowflake (performance tuning, data sharing, optimization).
Expertise in streaming platforms: Kafka, Kinesis, Flink, or Spark Streaming.
Strong experience with data modeling (dimensional, Data Vault 2.0).
Proficiency in Python, schema evolution, partitioning, and data versioning.
Experience with orchestration and automation tools (Airflow, Dagster, CI/CD).
Working knowledge of GenAI data integration (feature stores, vector DBs, RAG pipelines).
Experience with Agile delivery and leading globally distributed engineering teams.