Director, MLOps Engineering
Listed on 2026-03-01
IT/Tech
Data Engineer, AI Engineer, Cloud Computing, Machine Learning/ ML Engineer
The Senior MLOps Engineer leads the design and maintenance of scalable, secure infrastructure for ML model deployment and lifecycle management. The role ensures that models transition from development to production while meeting regulatory and compliance standards. The Senior MLOps Engineer collaborates closely with Data Science, Engineering, Master Data Management, and other enterprise operations and business vertical teams to accelerate ML-driven insights, enhance model accuracy, and govern and monitor the ML ecosystem.
Beyond technical execution, the role defines MLOps strategy and architecture, addressing the "last mile" challenge of AI value realization by automating and scaling ML models as tangible business assets. Reporting to the Head of Enterprise Data Management, this role is a key pillar that improves efficiency, boosts model accuracy, accelerates time-to-market for new solutions, and ensures the scalability and robust governance of machine learning initiatives.
- ML Model Deployment & Management:
Lead the design, implementation, and ongoing maintenance of scalable ML infrastructure. The infrastructure will reside primarily on leading cloud services to support seamless deployment and efficient scaling of ML models. Oversee the development of the MLOps platform and the automated pipelines for deploying, monitoring, and maintaining models in production environments. A critical part of this responsibility is implementing robust solutions for model versioning, systematic retraining, and artifact management, treating each model as a distinct artifact that must be built, tested, deployed, and managed throughout its lifecycle.
- Automation & CI/CD Pipelines:
Design and implement extensive automation across the ML workflow, covering model training, testing, validation, and deployment. This includes setting up robust Continuous Integration/Continuous Delivery (CI/CD) pipelines for both model training and deployment, leveraging industry-standard tools. Automate complex data and model workflows using orchestration tools.
- Monitoring, Performance & Reliability:
Implement comprehensive monitoring and alerting systems for real-time tracking of model performance, data quality, and overall system health. Use specialized tools to proactively detect critical issues such as model drift, data quality anomalies, and performance degradation. A significant part of the daily work involves troubleshooting production issues, including debugging model deployment failures and addressing inaccurate predictions caused by mismatches in input data.
- Data & Feature Engineering Support:
Build and maintain feature stores. Ensure precise alignment between training and inference data pipelines to prevent data leakage and keep features consistent. Collaborate with data engineers to build robust Extract, Transform, Load (ETL) pipelines that feed into data lakehouses. Ensure dataset reliability through robust versioning practices and seamless data integration processes.
- Security & Compliance:
Integrate robust security measures directly into MLOps pipelines. Collaborate with other operations and enterprise functions to set up and monitor processes that actively mitigate a wide range of risks, including exploitation attacks, access abuse, pipeline infrastructure vulnerabilities, data integrity compromises, and model integrity attacks.
- Collaboration & Mentorship:
Collaborate extensively with data scientists, data engineers, and DevOps teams to ensure the seamless integration of machine learning solutions into the firm's products and operations. Provide technical support and guidance to other team members, for example by refactoring data scientists' Python code to build their programming skills and ensure production readiness. Drive strategic initiatives, troubleshoot complex cross-domain issues, and ensure that "Trustworthy AI" guardrails…