ML Ops Engineer; hybrid or remote
Remote / Online - Candidates ideally in Canada
Listing for:
Achievers
Full Time, Remote/Work from Home position
Listed on 2026-02-28
Job specializations:
- IT/Tech: AI Engineer, Machine Learning/ML Engineer, Data Scientist, Cloud Computing
Job Description & How to Apply Below
Position: ML Ops Engineer (hybrid or remote)
Our Data Science team is a highly motivated and curious group. They're spearheading Achievers' efforts to build products powered by AI, and they enjoy solving all the problems that come with building them. We don't operate under a rigid structure; as a member of this team, you'll have the opportunity to shape both the work and the craft. We're in search of a skilled and driven ML Ops engineer who can support the full operational lifecycle of both traditional machine learning systems and emerging generative AI applications.
This role spans infrastructure, automation, quality, and reliability engineering, with an emphasis on enabling scalable training, evaluation, deployment, and monitoring across a wide range of ML and GenAI workloads. This includes managing model upgrades, framework versions, regression testing, and maintenance tasks, and maintaining performance across systems and solutions.
Why you'll love this role:
- Lead high-impact initiatives that shape how millions of people experience work around the world.
- Bring your unique perspective to complex and challenging projects: apply your expertise in data science, influence technical direction, and share your knowledge with fellow team members.
- Join a close-knit, no-ego, high-performing team that solves meaningful problems and celebrates successes together.
- Work alongside an experienced leadership team who is genuinely invested in your career growth.
- Thrive in a fast-paced, high-growth environment where innovation is encouraged and your voice truly matters.

How you'll shape ML Ops at Achievers:
This role will work extensively with Google Cloud's AI/ML ecosystem, including Vertex AI (ML and GenAI), managed pipelines, vector databases, embeddings workflows, and model optimization tools.

Model Deployment & Serving (ML + GenAI)
- Deploy and operate ML models and LLMs using Vertex AI, Cloud Run, and GKE.
- Automate packaging, versioning, and release of models, prompts, embeddings, and related artifacts.
- Design scalable inference architectures (sync, async, agentic), including batching and GPU/TPU autoscaling.

Pipeline Engineering & Automation
- Build and maintain ML and GenAI workflows using Vertex AI Pipelines, Cloud Composer (Airflow), or custom orchestration.
- Implement CI/CD for ML code and GenAI artifacts (prompts, fine-tuned models, evaluation suites).
- Add automated validation for data quality, model performance, regression, and LLM evaluation metrics.
- Implement quality gates in production pipelines, designing and implementing tests that gate deployment changes and identify production issues.
- Schedule retraining, re-embedding, and re-indexing to ensure model freshness.

GenAIOps & Artifact Lifecycle
- Manage and version prompts, system instructions, RAG components, and agent workflows.
- Operationalize fine-tuned or custom models using Vertex AI tuning capabilities.
- Implement safety guardrails, filtering, and approval workflows for generative systems.
- Enable experimentation across prompts, models, and RAG strategies.

Cloud Infrastructure & Reliability
- Build scalable training and inference environments using GCP services (Vertex AI, BigQuery ML, Dataflow/Dataproc, Cloud Storage, Cloud Run/GKE).
- Manage infrastructure as code using Terraform or Deployment Manager.
- Apply cost optimization, reliability, and scaling best practices.

Observability, Monitoring & Governance
- Monitor model, data, and embedding drift.
- Track LLM-specific metrics (latency, cost, prompt performance, safety triggers).
- Implement logging, lineage, and metadata using Vertex ML Metadata and Cloud Logging.
- Embed AI governance controls (explainability, bias, performance, data usage).
- Support audit-ready workflows with model cards, prompt cards, and evaluation documentation.
- Align operational practices with emerging external AI regulations and frameworks (responsible AI, model risk management, audit readiness).
- Partner with security, legal, privacy, and risk teams to operationalize AI governance without slowing experimentation.

Cross-Functional Collaboration
- Partner with data scientists, GenAI engineers, product managers, and engineers to deliver production-ready ML systems.
- Promote best practices for reliable, scalable, and governed ML and GenAI operations.

Experience we feel will set you up for success:
- Experience in ML Ops, ML platform engineering, or cloud-based AI infrastructure.
- Strong hands-on cloud experience (GCP preferred but not required), especially Vertex AI (ML & GenAI), BigQuery/BigQuery ML, Cloud Run or GKE, and Cloud Composer.
- Strong Python skills with experience in testing, CI/CD, containerization, and infrastructure automation (Terraform).
- Experience with LLM workflows: embeddings, vector databases, prompt engineering, and evaluation.
- Exposure to agentic workflows and frameworks such as MCP.
- Familiarity with Vertex AI Model Garden, tuning, monitoring, and vector search technologies.
- Exposure to LLM safety, moderation, or red-teaming workflows.

Soft Skills
- Strong communication and cross-functional collaboration skills.
- Detail-oriented, …