DeepLight AI is a specialist AI and data consultancy with extensive experience implementing intelligent enterprise systems across multiple industries, with particular depth in financial services and banking. Our team combines deep expertise in data science, statistical modeling, AI/ML technologies, workflow automation, and systems integration with a practical understanding of complex business operations.
The MLOps and DevOps Lead is a specialised technical role focused on bridging the gap between AI development and production-grade operations. You will design, deploy, and manage scalable AI solutions primarily across AWS and Azure environments, integrating Machine Learning Operations (MLOps) and Large Language Model Operations (LLMOps). By leveraging containerization, automated CI/CD pipelines, and robust data engineering, you will ensure that the Bank’s AI-driven applications, including Generative AI and LLM agents, are secure, efficient, compliant, and highly available.
Responsibilities
- Design and deploy scalable AI infrastructure using GitHub Actions, EKS, ECS, and AWS Lambda; automate all resource provisioning via Terraform.
- Create and manage end-to-end CI/CD pipelines for model deployment, versioning, and lifecycle management using Azure DevOps and GitHub Actions.
- Deploy machine learning and large language models via Amazon SageMaker, EKS, and Azure ML Endpoints.
- Implement comprehensive monitoring for model performance and drift using Amazon CloudWatch, SageMaker Model Monitor, and OpenSearch.
- Integrate advanced services including Azure OpenAI, Azure Cognitive Services, and AWS Rekognition into enterprise workflows.
- Design robust data pipelines and real‑time ingestion streams using AWS Glue, Amazon Redshift, Athena, and Apache Spark.
- Enforce cloud security through AWS WAF, Azure Key Vault, and RBAC, ensuring all deployments meet GDPR and ISO compliance standards.
- Supervise team activities and contribute to change initiatives in line with the Bank’s continuous improvement standards.
- Support, coach, and mentor more junior members of the team when required.
- Take on additional responsibilities on occasion, at management's request.
As an AI consultancy, our greatest asset is the expertise of our people.
While technical mastery is the foundation of what we do, the ability to bridge the gap between complex data science and actionable business value is what defines your success with DeepLight.
We're looking for individuals who are not only world‑class in their fields of specialism, but also compelling communicators and persuasive advocates for their own skills.
You will be the face of our firm, tasked with building trust, articulating the "why" behind your technical decisions, and effectively "selling" your vision to high‑level stakeholders.
If you thrive on the challenge of presenting cutting‑edge solutions as much as you do on building them, you will fit right in.
Qualifications
- A Bachelor’s degree in Computer Science, AI, Software Engineering, Statistics, Mathematics, or a related quantitative field.
- A minimum of 5 years' experience designing and deploying AI solutions in cloud environments, specifically within MLOps/LLMOps frameworks.
- Proficiency with GitHub Actions, Spark, Redshift, MongoDB, and Azure Purview.
- Deep hands‑on expertise in AWS (SageMaker, EKS, S3) and Azure (OpenAI, Cognitive Services) ecosystems.
- Mastery of Terraform for multi‑cloud resource automation.
- Proven experience in both MLOps (standard ML) and LLMOps (Large Language Models), focusing on fine‑tuning, orchestration, and monitoring.
- Expertise in managing container images via ECR/ACR and deploying via Kubernetes (EKS/AKS).
- Strong understanding of secure networking (WAF, NSGs) and secret management.
- Azure and/or AWS certifications (e.g., AWS Certified Machine Learning - Specialty or Azure AI Engineer Associate) are highly preferred.
- Ability to seamlessly manage workloads that span both AWS and Azure.
- Skill in communicating complex technical infrastructure requirements to non‑technical business stakeholders.
- Adaptability in troubleshooting model drift and performance bottlenecks in real‑time agentic workflows.
- Proactiv…