We are seeking a highly skilled and experienced Senior Data Engineer to join our data engineering team. In this role, you will be responsible for designing, implementing, and optimizing real-time data pipelines that process terabytes of data.
The ideal candidate will have 3+ years of hands-on experience in data engineering and strong expertise in modern data platforms such as Databricks, PySpark, Delta Lake, Amazon S3, and Kafka.
This role offers an opportunity to work on cutting-edge technologies in a fast-paced environment with a strong focus on performance optimization, scalability, and reliability.
Key Responsibilities:
• Design, build, and maintain robust, scalable, and efficient real-time data pipelines using Databricks, PySpark, Kafka, and Delta Lake.
• Architect and implement data ingestion pipelines for high-volume streaming and batch data into Amazon S3 and Delta Lake.
• Optimize data pipelines and workflows for performance, scalability, and cost-efficiency.
• Process and analyze terabytes of structured and unstructured data to enable near real-time decision-making.
• Collaborate closely with stakeholders to define data requirements and ensure data integrity, security, and availability.
• Implement advanced data transformations, deduplication, and enrichment logic.
• Continuously improve data engineering best practices, automation, and reliability.
• Monitor, troubleshoot, and resolve issues in data pipelines to ensure high availability.
Experience:
• 3+ years of hands-on experience in data engineering, building and operating large-scale data pipelines in production environments.
Must-Have Skills:
• Proven experience with Databricks and PySpark for large-scale data processing.
• Strong expertise in Delta Lake for real-time and batch data workloads.
• In-depth knowledge of Apache Kafka for real-time data streaming.
• Hands-on experience with AWS S3 or equivalent cloud storage solutions.
• Solid understanding of distributed computing concepts and performance tuning.
• Experience processing and managing terabytes of data in production systems.
• Strong background in ETL/ELT design, data modeling, and pipeline optimization.
• Proficiency in writing clean, efficient, and maintainable Python (PySpark) code.
Nice-to-Have Skills:
• Familiarity with DevOps practices, CI/CD pipelines, and Kubernetes.
• Knowledge of data security, governance, and compliance best practices.
• Experience with monitoring and alerting tools such as Prometheus, Grafana, or Amazon CloudWatch.