About the Role
We’re hiring a Staff Platform Engineer to establish Rover’s reliability foundation, from infrastructure to observability and from message delivery guarantees to zero‑downtime deploys. This is a hands‑on role. You’ll be our first dedicated platform/SRE hire, working alongside our backend engineers to design and operate systems that can reliably process tens of thousands of fan messages per second.
You’ll own uptime, scalability, and developer experience — setting the technical and cultural bar for how reliability is built into everything we ship. Over time, you’ll help grow and lead a small Platform team focused on reliability, infrastructure, and developer productivity.
What You’ll Do
- Design and implement Rover’s core infrastructure on Google Cloud (GKE, Cloud Build, Terraform, Helm).
- Build and own CI/CD pipelines with canary and blue‑green deployments.
- Define and enforce SLOs, SLIs, and error budgets for all critical services.
- Architect and maintain the messaging pipeline (Kafka, Postgres, Express microservices) for reliability, scale, and idempotency.
- Instrument end‑to‑end observability using Prometheus, Grafana, and OpenTelemetry.
- Implement infrastructure‑as‑code and secrets management best practices.
- Partner with backend and frontend teams to design services that are fault‑tolerant, observable, and cost‑efficient.
- Lead incident management, postmortems, and on‑call rotations.
- Build internal tooling and “paved roads” that make developers faster and safer.
- Mentor engineers on reliability principles and production readiness.
What You Bring
- 5+ years of backend, infrastructure, or SRE experience in production SaaS environments.
- Strong experience with TypeScript/Node, Express, Postgres, Kafka, and Kubernetes.
- Deep understanding of distributed systems, high‑throughput messaging, and idempotent design.
- Strong practical experience with Google Cloud, Docker, Terraform, Helm, and CI/CD pipelines (GitHub Actions + Cloud Build).
- Hands‑on experience building observability stacks using Prometheus, Grafana, and OpenTelemetry.
- Comfortable owning uptime, reliability, and on‑call responsibilities from day one.
- Clear, confident communicator who can partner across teams and later help build and mentor a small Platform group.
- Bonus: experience in high‑throughput communications (SMS, push, notifications, or similar).
Why Join Rover
- Build technology trusted by some of the most iconic sports teams in the world.
- Work on large‑scale systems that combine real‑time data, messaging, and fan engagement.
- Join a senior, product‑minded engineering culture focused on quality and autonomy.
- Competitive compensation, equity, and a hybrid work environment in downtown Toronto.
Location: Toronto, Ontario, Canada. Salary: CA $50–CA $50 (annually).