Manager, Site Reliability Engineering; SRE Job Toronto area, Ontario Canada, IT/Tech

Position: Manager, Site Reliability Engineering (SRE)
Remote type: Hybrid
Locations: Canada, Toronto, Ontario
Time type: Full time
Posted on: Posted Today
Time left to apply: End Date: April 27, 2026 (30+ days left to apply)
Job requisition: JREQ198401

New Position:
This position is open due to an existing vacancy to support our evolving business needs.
About the Role:

In this opportunity as **Site Reliability Engineering Manager**, you will be responsible for:
* Team Leadership:
Lead and mentor a team of SREs, providing technical guidance, coaching, and support to foster a culture of collaboration, innovation, and continuous improvement.
* Strategic Vision and Planning:
Develop and implement a strategic vision for the SRE team to align with organizational goals and drive continuous improvements in reliability and performance.
* Performance Metrics and Reporting:
Establish and monitor key performance indicators (KPIs) to measure the success of SRE initiatives and communicate results and insights to stakeholders.
* Operational Excellence:
Drive the implementation of best practices for reliability, scalability, and performance across our systems and services.

* Risk Management:
Proactively identify potential risks to service reliability and develop strategies to mitigate these risks, ensuring business continuity and resilience.
* System Architecture:
Collaborate with cross-functional teams to design, build, and maintain scalable and resilient architectures for our cloud-based infrastructure and applications. Identify opportunities for optimization and efficiency improvements. Solve intractable problems and devise solutions to improve the products and services we offer our customers.
* DevOps Practices:
Promote and implement DevOps principles and practices to streamline software delivery, automate infrastructure provisioning, and improve deployment processes. Collaborate with development teams to integrate SRE practices into the software development lifecycle.
* Automation and Tooling:
Champion the use of automation and tooling to streamline operational workflows, increase efficiency, and reduce manual toil. Drive the development of monitoring, alerting, and automation solutions to proactively identify and remediate issues.
* Continuous Improvement:
Promote a culture of continuous improvement by fostering innovation, experimentation, and learning within the team. Encourage knowledge sharing and professional development to enhance technical skills and expertise.
About You:

You’re a potential fit for the role of **Site Reliability Manager** if your background includes:
* 5+ years’ experience in a leadership role, managing a team of DevOps engineers and/or Site Reliability Engineers or related technical professionals.
* Bachelor’s degree or equivalent required, Computer Science or related technical degree preferred.
* 5-10 years of relevant experience in software development and/or technology platform, infrastructure, or operations.
* Hands-on experience with programming and scripting languages.
* Strong people management skills to effectively lead, motivate, and develop team members, including conducting performance evaluations and providing constructive feedback to drive continuous improvement and team success.
* Experience using AI/ML tools to improve service and reduce costs, and experience working with AI-Operations solutions.
* Experience with cloud technologies and services and the use of their APIs (e.g., AWS, Azure, GCP).
* Proficiency in DevOps practices and methodologies, with hands-on experience in CI/CD pipelines, configuration management, and Infrastructure as Code (IaC) tools such as Terraform and Bicep.
* Familiarity with programming languages such as Python, Java, and C#.
* Experience designing and supporting scalable systems and services.
* Experience in leading release management processes, ensuring successful software releases by coordinating with cross-functional teams and overseeing the deployment, monitoring, and maintenance of new features and updates.
* Proficiency in Observability…

