ML Ops Engineer
Listed on 2026-01-12
IT/Tech
AI Engineer, Cloud Computing
Our Company
Changing the world through digital experiences is what Adobe's all about. We give everyone—from emerging artists to global brands—everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.
We're on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!
The Opportunity
Join Adobe as a skilled and proactive Machine Learning Ops Engineer to drive the operational reliability, scalability, and performance of our AI systems! This role is foundational in ensuring our AI systems operate seamlessly across environments while meeting the needs of both developers and end users. You will lead efforts to automate and optimize the full machine learning lifecycle, from data pipelines and model deployment to monitoring, governance, and incident response.
What you'll Do
Model Lifecycle Management
- Manage model versioning, deployment strategies, rollback mechanisms, and A/B testing frameworks for LLM agents and RAG systems.
- Coordinate model registries, artifacts, and promotion workflows in collaboration with ML Engineers.
- Implement real-time monitoring of model performance (accuracy, latency, drift, degradation).
- Track conversation quality metrics and user feedback loops for production agents.
- Develop automated pipelines for prompt/agent testing, validation, and deployment.
- Integrate unit/integration tests into model and workflow updates for safe rollouts.
- Provision and manage scalable infrastructure (Kubernetes, Terraform, serverless stacks).
- Enable auto-scaling, resource optimization, and load balancing for AI workloads.
- Craft and maintain data ingestion pipelines for both structured and unstructured sources.
- Ensure reliable feature extraction, transformation, and data validation workflows.
- Monitor and optimize AI stack performance (model latency, API efficiency, GPU/compute utilization).
- Drive cost-aware engineering across inference, retrieval, and orchestration layers.
- Build alerting and triage systems to identify and resolve production issues.
- Maintain SLAs and develop rollback/recovery strategies for AI services.
- Enforce model governance, audit trails, and explainability standards.
- Support documentation and regulatory frameworks (e.g., GDPR, SOC 2, internal policy alignment).
- 3-5+ years of experience in MLOps, DevOps, or ML platform engineering.
- Strong experience with cloud infrastructure (AWS/GCP/Azure), container orchestration (Kubernetes), and IaC tools (Terraform, Helm).
- Familiarity with ML model serving tools (e.g., MLflow, Seldon, TorchServe, BentoML).
- Proficiency in Python and CI/CD automation (e.g., GitHub Actions, Jenkins, Argo Workflows).
- Experience with monitoring tools (Prometheus, Grafana, Datadog, ELK, Arize AI, etc.).
- Experience supporting LLM applications, RAG pipelines, or AI agent orchestration.
- Understanding of vector databases, embedding workflows, and model retraining triggers.
- Exposure to privacy, safety, and responsible AI principles in operational contexts.
- Bachelor's degree or equivalent experience in Computer Science, Engineering, or a related technical field.
Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $142,700 – $257,600 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.
At Adobe, for sales roles starting salaries are expressed as total target compensation (TTC = base + commission),…