Cloud MLOps Engineer
Listed on 2026-01-12
IT/Tech
Data Engineer, Machine Learning/ML Engineer
Responsibilities
- Build & Automate ML Pipelines: Design, implement, and maintain CI/CD pipelines for machine learning models, ensuring automated data ingestion, model training, testing, versioning, and deployment.
- Operationalize Models: Collaborate closely with data scientists to containerize, optimize, and deploy their models to production, focusing on reproducibility, scalability, and performance.
- Infrastructure Management: Design and manage the underlying cloud infrastructure (AWS) that powers our MLOps platform, leveraging Infrastructure-as-Code (IaC) tools to ensure consistency and cost optimization.
- Monitoring & Observability: Implement comprehensive monitoring, alerting, and logging solutions to track model performance, data integrity, and pipeline health in real time. Proactively address issues such as model drift and data drift.
- Governance & Security: Establish and enforce best practices for model and data versioning, auditability, security, and access control across the entire machine learning lifecycle.
- Tooling & Frameworks: Develop and maintain reusable tools and frameworks to accelerate the ML development process and empower data science teams.
Requirements
- Experience: 10+ years of overall experience, including 4+ years in MLOps, Machine Learning Engineering, or a related DevOps role with a focus on ML workflows.
- Cloud Expertise: Extensive hands-on experience in designing and implementing MLOps solutions on AWS. Proficient with core services like SageMaker, S3, ECS, EKS, Lambda, SQS, SNS, and IAM.
- Coding & Automation: Strong coding proficiency in Python. Extensive experience with automation tools, including Terraform for IaC and GitHub Actions.
- MLOps & DevOps: A solid understanding of MLOps and DevOps principles. Hands-on experience with MLOps frameworks like SageMaker Pipelines, Model Registry, Weights & Biases, MLflow, or Kubeflow, and orchestration tools like Airflow or Argo Workflows.
- Containerization: Expertise in developing and deploying containerized applications using Docker and orchestrating them with ECS and EKS.
- Model Lifecycle: Experience with model testing, validation, and performance monitoring. Good understanding of ML frameworks like PyTorch or TensorFlow is required to effectively collaborate with data scientists.
- Communication: Excellent communication and documentation skills, with a proven ability to collaborate with cross-functional teams (data scientists, data engineers, and architects).
Cloud Hybrid is an equal opportunity employer inclusive of female, minority, disability and veterans (M/F/D/V). Hiring, promotion, transfer, compensation, benefits, discipline, termination and all other employment decisions are made without regard to race, color, religion, sex, sexual orientation, gender identity, age, disability, national origin, citizenship/immigration status, veteran status or any other protected status. Cloud Hybrid will not make any posting or employment decision that does not comply with applicable laws relating to labor and employment, equal opportunity, employment eligibility requirements or related matters.
Nor will Cloud Hybrid require in a posting or otherwise U.S. citizenship or lawful permanent residency in the U.S. as a condition of employment except as necessary to comply with law, regulation, executive order, or federal, state, or local government contract.