
Senior ML Platform Engineer

Job in Toronto, Ontario, M5A, Canada
Listing for: Rakuten Kobo
Full Time position
Listed on 2026-01-13
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning / ML Engineer, Data Engineer, Cloud Computing
Job Description

The Role

Senior ML Platform Engineer (MLOps)

Rakuten Kobo Inc. is seeking a visionary and highly skilled Senior ML Platform Engineer to architect, build, and lead the evolution of our internal Machine Learning Platform and MLOps capabilities. In this pivotal role, you will define the strategic roadmap and drive the hands-on implementation of a state-of-the-art, fully automated ML framework on Google Cloud Platform (GCP).

You will be instrumental in designing and developing the core infrastructure, tools, and services that empower our Data Scientists and ML Engineers to efficiently develop, deploy, monitor, and manage their Machine Learning models throughout their lifecycle. Collaborating closely with Data Scientists, Data Engineers, Platform Engineers, and business stakeholders, you will transform manual ML production processes into a seamless, scalable, and reproducible ML Platform.

This groundbreaking position is dedicated to streamlining the entire ML project lifecycle by providing a robust, self-service platform, ensuring the continuous delivery of significant business value through innovative Machine Learning solutions. Success in this role demands not only profound ML engineering and platform-building expertise but also a strategic, forward-thinking mindset for seamlessly integrating ML/AI into the core of our engineering practices at scale.

Experience and Background:

  • 8+ years of professional experience in ML Engineering or related fields, with a significant portion dedicated to ML Platform development.
  • Proven experience leading the design, development, and implementation of a custom ML Platform or significant MLOps infrastructure for an organization. This is the most crucial must-have.
  • Deep expertise in MLOps tools and their integration into a platform, including:
    Orchestration: Kubeflow, Airflow, Argo Workflows, Step Functions, Vertex AI Pipelines.
    Experiment Tracking & Model Registry: MLflow, DVC, Vertex AI ML Metadata, SageMaker Experiments/Model Registry.
    Model Monitoring & Observability: Prometheus, Grafana, Arize, SageMaker Model Monitor, Vertex AI Model Monitoring.
    Data/Model Versioning: DVC, Git LFS, internal systems.
    Feature Stores: Feast, Hopsworks, or custom-built.
    CI/CD for ML: Jenkins, GitHub Actions, GitLab CI, Buildkite, Argo CD (GitOps).
    Containerization & Orchestration: Docker, Kubernetes, Helm.
  • Strong proficiency in Python.
  • Extensive cloud experience, with a strong preference for GCP. This includes hands-on experience with GCP MLOps services (Vertex AI, Dataflow, BigQuery ML, Cloud Build, GKE, Cloud Composer).
  • Experience moving companies from manual to automated processes at scale, particularly in the context of ML development and deployment.
  • Demonstrated seniority: the ability to lead projects, make architectural decisions, mentor junior engineers, and influence technical strategy, including communicating complex technical concepts to non-technical stakeholders.
  • Solid understanding of ML fundamentals (predictive modeling, deep learning, GenAI/LLMs are a plus but secondary to platform expertise).

The Skillset:

Strong hands-on experience with GCP tools such as:

  • Vertex AI
  • BigQuery
  • Cloud Storage
  • Cloud Composer / Airflow
  • Cloud Build, Cloud Deploy, and Cloud Functions

MLOps Framework and Automation:

  • Strong understanding of data ingestion pipelines and experiment tracking tools.
  • Ability to enforce reproducibility and lineage tracking.
  • Familiarity with Kubeflow and/or TFX.
  • Proven ability to design and implement CI/CD pipelines for ML (automated training, testing, and deployment, with integration into GitHub or Cloud Build).
  • Experience with model versioning and registries (Vertex AI Model Registry).
  • Knowledge of Feature Store design.
  • Ability to set up automated monitoring for data drift, model drift, and model performance.
  • Experience setting up observability stacks (logging, metrics, alerts, model health dashboards).

Software Engineering and DevOps:

  • Proficiency in Python (mandatory), familiarity with R/Scala/Java as needed.
  • Experience with containerization (Docker) and orchestration (Kubernetes, GKE).
  • Strong background in infrastructure-as-code (Terraform, Deployment Manager).
  • Ability to implement unit tests, integration tests, and ML-specific…

Position Requirements:

  • 10+ years of work experience