ML Ops Engineer Job, San Jose area, California, USA, IT/Tech

Our Company

Changing the world through digital experiences is what Adobe's all about. We give everyone—from emerging artists to global brands—everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen.

We're on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!

The Opportunity

Join Adobe as a skilled and proactive Machine Learning Ops Engineer to drive the operational reliability, scalability, and performance of our AI systems! This role is foundational in ensuring our AI systems operate seamlessly across environments while meeting the needs of both developers and end users. You will lead efforts to automate and optimize the full machine learning lifecycle—from data pipelines and model deployment to monitoring, governance, and incident response.

What you'll do

Model Lifecycle Management

Manage model versioning, deployment strategies, rollback mechanisms, and A/B testing frameworks for LLM agents and RAG systems.
Coordinate model registries, artifacts, and promotion workflows in collaboration with ML Engineers.

Monitoring & Observability

Implement real-time monitoring of model performance (accuracy, latency, drift, degradation).
Track conversation quality metrics and user feedback loops for production agents.

CI/CD for AI

Develop automated pipelines for prompt/agent testing, validation, and deployment.
Integrate unit/integration tests into model and workflow updates for safe rollouts.

Infrastructure Automation

Provision and manage scalable infrastructure (Kubernetes, Terraform, serverless stacks).
Enable auto-scaling, resource optimization, and load balancing for AI workloads.

Data Pipeline Management

Craft and maintain data ingestion pipelines for both structured and unstructured sources.
Ensure reliable feature extraction, transformation, and data validation workflows.

Performance Optimization

Monitor and optimize AI stack performance (model latency, API efficiency, GPU/compute utilization).
Drive cost-aware engineering across inference, retrieval, and orchestration layers.

Incident Response & Reliability

Build alerting and triage systems to identify and resolve production issues.
Maintain SLAs and develop rollback/recovery strategies for AI services.

Compliance & Governance

Enforce model governance, audit trails, and explainability standards.
Support documentation and regulatory frameworks (e.g., GDPR, SOC 2, internal policy alignment).

What you need to succeed

3-5+ years in MLOps, DevOps, or ML platform engineering.
Strong experience with cloud infrastructure (AWS/GCP/Azure), container orchestration (Kubernetes), and IaC tools (Terraform, Helm).
Familiarity with ML model serving tools (e.g., MLflow, Seldon, TorchServe, BentoML).
Proficiency in Python and CI/CD automation (e.g., GitHub Actions, Jenkins, Argo Workflows).
Experience with monitoring tools (Prometheus, Grafana, Datadog, ELK, Arize AI, etc.).

Preferred Qualifications

Experience supporting LLM applications, RAG pipelines, or AI agent orchestration.
Understanding of vector databases, embedding workflows, and model retraining triggers.
Exposure to privacy, safety, and responsible AI principles in operational contexts.
Bachelor's or equivalent experience in Computer Science, Engineering, or a related technical field.

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $142,700 – $257,600 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. Your recruiter can share more about the specific salary range for the job location during the hiring process.

At Adobe, for sales roles starting salaries are expressed as total target compensation (TTC = base + commission),…

