
Senior ML Platform / ML Infrastructure Engineer II

Job in Toronto, Ontario, C6A, Canada
Listing for: Mistplay
Full Time, Part Time position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    Data Engineer, Systems Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 CAD Yearly
Job Description & How to Apply Below
Position: Senior ML Platform / ML Infrastructure Engineer II

Mistplay is the #1 loyalty app for mobile gamers. Our community of millions of engaged mobile gamers uses Mistplay to discover new games and earn rewards. Players are rewarded for the time and money they spend on games and can redeem those rewards for gift cards. Mistplay's mission is to be the best way to play mobile games for everyone, everywhere!

Download Mistplay on the Google Play Store and follow us on Instagram, Twitter, and Facebook.

📍 Please Note:

In Canada 🇨🇦, Mistplay follows a 2 days/week in-office hybrid model in Toronto (400 University Ave) & Montreal (1001 Blvd. Robert-Bourassa)


What you’ll do
  • Design, build, and operate standardized training-to-serving pipelines with Airflow, covering artifact management, environment provisioning, packaging, deployment, and rollback for SageMaker endpoints.
  • Own real-time and batch inference on SageMaker: multi-model endpoints, serverless inference where appropriate, blue/green and canary strategies, autoscaling policies, and cost controls (spot strategies, instance right-sizing).
  • Implement ultra-low-latency serving patterns with Redis/Valkey: feature caching, online feature retrieval, request-scoped state, model response caching, and rate limiting/backpressure for bursty traffic.
  • Provision and manage ML/data infrastructure with Terraform: SageMaker endpoints/configs, ECR/ECS/EKS resources, networking/VPC endpoints, ElastiCache/Valkey clusters, observability stacks, secrets, and IAM.
  • Build platform abstractions and golden paths: Airflow DAG templates, CLI/SDKs, cookie-cutter repos, and CI/CD pipelines that take models from notebooks to production predictably.
  • Establish and run model lifecycle governance: model/feature registries, approval workflows, promotion policies, lineage, and audit trails integrated with Airflow runs and Terraform state.
  • Implement end-to-end observability: data/feature freshness checks, drift/quality gates, model performance/latency SLOs, infra health dashboards, tracing, alerting, incident response, and postmortems.
  • Partner with Security, SRE, and Data Engineering on private networking, policy-as-code, PII handling, least-privilege IAM, and cost-efficient architectures across environments.
  • Evaluate, integrate, and rationalize platform tooling (e.g., MLflow registry, feature stores, serving gateways); lead migrations with clear change management and minimal downtime.
What you’ll bring
  • 5+ years building and operating production-grade ML/data platforms with a focus on serving, reliability, and developer experience.
  • Strong software engineering in Python, Go, or Java; experience building resilient services, APIs, and automation tooling with high test coverage.
  • Deep experience with AWS SageMaker inference: endpoint configuration, containerization, model packaging, autoscaling, serverless vs. real-time trade-offs, multi-model endpoints (MME), A/B and canary releases.
  • Expertise with online feature stores like Redis/Valkey in ML serving contexts.
  • Proven Terraform experience managing ML and data infra end-to-end: modules, workspaces, drift detection, change reviews, safe rollbacks, and familiarity with GitOps patterns.
  • Airflow orchestration at scale: dependency modeling, sensors, retries, SLAs, backfills, DAG factories, integrations with registries, artifact stores, and Terraform pipelines.
  • Familiarity with ML frameworks (scikit-learn, XGBoost, PyTorch, TensorFlow) from a platform-integration perspective to support diverse runtimes and containers.
  • Observability for ML workflows: metrics, logs, traces, performance profiling, capacity planning, cost monitoring, and runbooks.
  • Excellent communication and cross-functional collaboration with Data Science, Data Engineering, DevOps, and Backend.

We thank all applicants. The masculine form has been used to keep the text concise. We subscribe to the principle of employment equity.

Why Mistplay?

We strive to make our work environment as inviting and fun as possible! Working at Mistplay is coupled with a whole array of perks that we've adopted virtually and…

Position Requirements
10+ Years work experience