Senior ML Platform/ML Infrastructure Engineer II Job, Toronto area, Ontario, Canada, IT/Tech

Position: Senior ML Platform / ML Infrastructure Engineer II

Mistplay is the #1 loyalty app for mobile gamers. Our community of millions of engaged mobile gamers uses Mistplay to discover new games and earn rewards. Players are rewarded for the time and money they spend on games and can redeem those rewards for gift cards. Mistplay's mission is to be the best way to play mobile games for everyone, everywhere!

Download Mistplay from the Google Play Store and follow us on Instagram, Twitter, and Facebook.

📍 Please Note:

In Canada 🇨🇦, Mistplay follows a hybrid model with 2 days/week in the office in Toronto (400 University Ave) and Montreal (1001 Blvd. Robert-Bourassa).


What you’ll do

Design, build, and operate standardized training-to-serving pipelines with Airflow, covering artifact management, environment provisioning, packaging, deployment, and rollback for SageMaker endpoints.
Own real-time and batch inference on SageMaker: multi-model endpoints, serverless inference where appropriate, blue/green and canary strategies, autoscaling policies, and cost controls (spot strategies, instance right-sizing).
Implement ultra-low-latency serving patterns with Redis/Valkey: feature caching, online feature retrieval, request-scoped state, model response caching, and rate limiting/backpressure for bursty traffic.
Provision and manage ML/data infrastructure with Terraform: SageMaker endpoints/configs, ECR/ECS/EKS resources, networking/VPC endpoints, ElastiCache/Valkey clusters, observability stacks, secrets, and IAM.
Build platform abstractions and golden paths: Airflow DAG templates, CLI/SDKs, cookie-cutter repos, and CI/CD pipelines that take models from notebooks to production predictably.
Establish and run model lifecycle governance: model/feature registries, approval workflows, promotion policies, lineage, and audit trails integrated with Airflow runs and Terraform state.
Implement end-to-end observability: data/feature freshness checks, drift/quality gates, model performance/latency SLOs, infra health dashboards, tracing, alerting, incident response and postmortems.
Partner with Security, SRE, and Data Engineering on private networking, policy-as-code, PII handling, least-privilege IAM, and cost-efficient architectures across environments.
Evaluate, integrate, and rationalize platform tooling (e.g., MLflow registry, feature stores, serving gateways); lead migrations with clear change management and minimal downtime.

What you’ll bring

5+ years building and operating production-grade ML/data platforms with a focus on serving, reliability, and developer experience.
Strong software engineering in Python, Go, or Java; experience building resilient services, APIs, and automation tooling with high test coverage.
Deep experience with AWS SageMaker inference: endpoint configuration, containerization, model packaging, autoscaling, serverless vs. real-time trade-offs, multi-model endpoints (MME), and A/B and canary releases.
Expertise with online feature stores like Redis/Valkey in ML serving contexts.
Proven Terraform experience managing ML and data infra end-to-end: modules, workspaces, drift detection, change reviews, safe rollbacks, and familiarity with GitOps patterns.
Airflow orchestration at scale: dependency modeling, sensors, retries, SLAs, backfills, DAG factories, integrations with registries, artifact stores, and Terraform pipelines.
Familiarity with ML frameworks (scikit-learn, XGBoost, PyTorch, TensorFlow) from a platform-integration perspective to support diverse runtimes and containers.
Observability for ML workflows: metrics, logs, traces, performance profiling, capacity planning, cost monitoring, and runbooks.
Excellent communication and cross-functional collaboration with Data Science, Data Engineering, DevOps, and Backend.

We thank all applicants. The masculine form has been used solely to keep the text concise. We subscribe to the principle of employment equity.

Why Mistplay?

We strive to make our work environment as inviting and fun as possible! Working at Mistplay is coupled with a whole array of perks that we've adopted virtually and…

