Deep Light AI is a specialist AI and data consultancy with extensive experience implementing intelligent enterprise systems across multiple industries, with particular depth in financial services and banking. Our team combines deep expertise in data science, statistical modeling, AI/ML technologies, workflow automation, and systems integration with a practical understanding of complex business operations.
We are seeking a Lakehouse Platform Engineer to serve as the lead architect and custodian of our enterprise data backbone. In this pivotal consultancy role, you will be responsible for the health, scalability, and evolution of our entire technology stack, ensuring that our Lakehouse architecture is not only reliable but also optimized for high-performance AI and analytics. You will own the lifecycle of AWS Glue jobs, manage the intricacies of Apache Iceberg table registries, and operationalize industry-leading tools like OpenMetadata for governance and Soda Core for data quality.
From designing robust Disaster Recovery (DR) scenarios to automating infrastructure via Terraform, your work will provide the foundation upon which our Data Factory squads build the future of intelligent enterprise systems.
As a consultant within our specialist firm, you will match technical prowess with the ability to drive adoption. You will act as a bridge between platform engineering and business-critical operations, "selling" the value of self-service tooling and automated lineage to senior stakeholders. We are looking for a master problem-solver with 8+ years of experience—ideally within financial services or banking—who thrives in fast-paced, Agile environments.
If you are passionate about decommissioning legacy technical debt while building state-of-the-art, automated data platforms that meet aggressive migration targets, you will find your home at Deep Light AI.
- Platform Management & Optimization
- Manage and maintain Lakehouse components:
  - Storage: S3 bucket configurations, lifecycle policies, storage optimization.
  - Compute: AWS Glue job management, optimization, DPU allocation.
  - Catalog: AWS Glue Data Catalog and Iceberg table registry.
  - Semantic Layer: Deploy and integrate with our semantic layer.
  - Governance: Deploy, configure, and upgrade OpenMetadata.
  - Quality: Maintain Soda Core infrastructure and integration.
- Apply Iceberg best practices for table optimization and maintenance.
- Implement automated table maintenance processes.
- Self‑Service & Automation
- Deploy self‑service tooling for Data Factory squads.
- Implement automated lineage capture from Glue jobs.
- Configure audit logging (CloudTrail, S3 access logs).
- Disaster Recovery & Reliability
- Design and implement Disaster Recovery (DR) scenarios including failover, backup management, and runbooks.
- Execute annual DR tests for the entire data landscape.
- Ensure all critical platform components are monitored, with alerting and active follow‑up.
- Migration & Decommissioning
- Execute decommissioning roadmap.
- Collaboration
- Work closely with platform engineers and architects to ensure alignment on optimization and tooling.
- Partner with operational teams for monitoring and alerting.
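The storage duties above include managing S3 lifecycle policies. A minimal sketch of one such policy, shaped as the payload boto3's `put_bucket_lifecycle_configuration` expects (the bucket name, prefix, and day thresholds are illustrative assumptions, not Deep Light AI values):

```python
# Illustrative S3 lifecycle policy for a Lakehouse raw zone.
# Prefix and day thresholds are assumptions for the sketch.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "raw-zone-tiering",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            # Move objects to a cheaper storage class once rarely read.
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Drop objects entirely after a year.
            "Expiration": {"Days": 365},
        }
    ]
}

# Applying it requires AWS credentials, so the call is left commented:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-lakehouse-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

In practice such policies would be codified in Terraform rather than applied ad hoc, so they stay versioned alongside the rest of the platform.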
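Automated table maintenance for Iceberg typically includes expiring old snapshots. The selection logic can be sketched in plain Python (the retention window and keep-last count are assumptions; in a real deployment this is delegated to Iceberg's `expire_snapshots` procedure rather than hand-rolled):

```python
from datetime import datetime, timedelta

def snapshots_to_expire(snapshot_timestamps, now, retention_days=7, keep_last=1):
    """Return snapshot timestamps older than the retention window,
    always protecting the `keep_last` most recent snapshots."""
    cutoff = now - timedelta(days=retention_days)
    ordered = sorted(snapshot_timestamps, reverse=True)  # newest first
    protected = set(ordered[:keep_last])
    return [ts for ts in ordered if ts < cutoff and ts not in protected]

# Example: three snapshots, 7-day retention as of June 15.
now = datetime(2024, 6, 15)
snaps = [datetime(2024, 6, 14), datetime(2024, 6, 1), datetime(2024, 5, 20)]
expired = snapshots_to_expire(snaps, now)  # the two snapshots older than June 8
```

The `keep_last` guard mirrors Iceberg's own behavior of retaining a minimum number of snapshots regardless of age, so time travel and rollback are never cut off entirely.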
As an AI consultancy, our greatest asset is the expertise of our people. Technical mastery is the foundation of what we do, and the ability to bridge the gap between complex data science and actionable business value defines your success with Deep Light. You will be the face of our firm, tasked with building trust, articulating the "why" behind your technical decisions, and effectively "selling" your vision to high‑level stakeholders.
Requirements
- 8+ years of experience in data platform engineering or related roles.
- Managing Lakehouse platforms on AWS.
- AWS services: S3, Glue, CloudTrail, Athena.
- Apache Iceberg, OpenMetadata, and Soda Core.
- Disaster recovery planning and execution.
- Infrastructure automation using Terraform and Git.
- Kafka (Amazon MSK) and OpenSearch.
- Identifying ways to automate work and repetitive tasks.
- Problem‑solving and troubleshooting skills.
- Working cross‑functionally and managing complex platform operations.
- Working in a fast‑paced environment and delivering aggressive migration targets.