Lead Data Engineer Job, Ahmedabad area, Uttar Pradesh, India, IT/Tech

Job Description:

Job Title:

Data Engineer with ML Exposure
Role Summary
Tech Lead with core experience in Data Engineering and a good understanding of Machine Learning. Build and operate scalable, reliable data pipelines on Azure. Develop batch and streaming ingestion; transform data using an industry-standard data analytics and AI platform (PySpark/SQL), ADF/Snowflake, or equivalent; enforce data quality; and publish curated datasets for analytics and ML.

Good understanding of supervised and unsupervised learning, model evaluation, and feature engineering.

Key Responsibilities
Design, build, and maintain ETL/ELT pipelines in Azure Data Factory or an equivalent tool across the Bronze → Silver → Gold layers of a Medallion Architecture.
Implement Delta Lake best practices (ACID, schema evolution, MERGE/upsert, time travel, Z-ORDER).
Write performant PySpark and SQL; tune jobs (partitioning, caching, join strategies).
Create reusable components; manage code in Git; contribute to CI/CD pipelines (Azure DevOps/GitHub Actions/Jenkins).
Apply data quality checks (Great Expectations or custom validations), monitoring, drift detection, and alerting.
Model data for analytics (star/dimensional); publish to Synapse/Snowflake/SQL Server.
Uphold governance and security (Purview/Unity Catalog lineage, RBAC, tagging, encryption, PII handling).
Author documentation/runbooks; support production incidents and root-cause analysis; suggest cost/performance improvements.

Must-Have (Mandatory)
Data Engineering & Pipelines
Hands-on experience building production pipelines with Azure Data Factory or equivalent, and with an industry-standard data analytics platform (PySpark/SQL) for building, deploying, storing, sharing, and maintaining enterprise-grade data.
Working knowledge of Medallion Architecture and Delta Lake (schema evolution, ACID).
Power BI exposure for publishing curated tables and building operational KPIs.

Programming & Automation
Strong Python (pandas/PySpark) and SQL.
Practical Git workflow experience integrating pipelines into CI/CD (Azure DevOps/GitHub Actions/Jenkins).
Familiarity with packaging reusable code (e.g., Python wheels) and configuration-driven jobs.

Data Modeling & Warehousing
Solid grasp of dimensional modeling/star schemas; experience with Synapse, Snowflake, or SQL Server.

Data Quality & Monitoring
Implemented validation checks and alerts; exposure to drift detection and pipeline observability.

Cloud Platforms (Azure preferred)
ADLS Gen2, Key Vault, ADF basics (linked services, datasets, triggers), environment promotion.

Data Governance & Security

Experience with metadata/lineage (Purview/Unity Catalog), RBAC, secrets management, and secure data sharing.
Understanding of PII/PHI handling and encryption at rest/in transit.

Collaboration
Clear communication, documentation discipline, Agile ways of working, and code reviews.

Machine Learning:
Deep understanding of supervised, unsupervised, and reinforcement learning, model evaluation, and feature engineering.
Programming:
Expert in Python (NumPy, pandas, scikit-learn, etc.); R exposure acceptable.
Drift Detection & Monitoring:
Hands-on experience with model drift detection, monitoring, and automated alerts.
Good understanding of MLOps pipelines using Azure ML, MLflow, and CI/CD/CT.
Databricks Asset Bundles (DAB) for environment promotion/infra-as-code style deployments.
Streaming/real-time:
Kafka/Event Hubs; CDC tools (e.g., Debezium, ADF/Synapse CDC).
MLOps touchpoints: MLflow tracking/registry, feature tables, basic model-inference pipelines.
DataOps practices: automated testing, data contracts, lineage-aware deployments, cost optimization on Azure.

Certifications:

Microsoft Certified - Azure Data Engineer Associate (DP-203) or equivalent.

10+ years of professional experience in data engineering (or equivalent project depth).
Bachelor's/Master's in CS/IT/Engineering or related field (or equivalent practical experience).

