Principal Site Reliability Engineering Specialist (SRE) job, Montréal area, Québec, Canada (IT/Tech)

Position: Principal Site Reliability Engineering specialist (SRE)
Location: Montreal

Position Description:

Principal Site Reliability Engineer (SRE)
Location: Montreal
Languages: Bilingual (French & English)

We are hiring a Principal Site Reliability Engineering specialist (SRE) to support the design, evolution, and operation of mission-critical technology platforms. In this strategic and hands-on role, you will lead the adoption of SRE best practices, shape cloud and application architectures, and drive the reliability, performance, and availability of client services. You will influence engineering standards, strengthen operational excellence, and collaborate across development, operations, security, and business teams to deliver resilient, scalable, and modern cloud solutions.

Who Are You?
You are an experienced SRE professional with deep technical expertise and a strong ability to improve reliability. You excel in cloud environments, automation, observability, and resilient architectures. You communicate effectively with technical and business stakeholders, collaborate naturally across teams, and consistently drive continuous improvement. Your balanced judgment and hands-on leadership make you a trusted advisor in delivering highly reliable, high-performing services.

Your future duties and responsibilities:

Architecture & Reliability

• Recommend reliability-focused solutions based on business and technical needs.

• Define and influence cloud and application architectures aligned with performance, availability, and resilience goals.

• Implement and continuously improve SLIs, SLOs, and SLAs across critical services.

• Build, enhance, and maintain monitoring, logging, and alerting capabilities.

Automation & Observability

• Develop and improve observability frameworks (monitoring, alerting, logging).

• Automate operational and reliability processes using Python, Bash, Ansible, and cloud-native tooling.

• Integrate reliability automation into CI/CD pipelines and optimize delivery workflows.

Incident Management & Continuous Improvement

• Lead major incident response, root cause analysis, and post-mortem activities.

• Reduce incident frequency and improve service reliability through systemic enhancements.

• Drive adoption of SRE best practices across teams and contribute to organizational maturity.

Collaboration & Technical Leadership

• Partner with development, DevOps, architecture, security, and business stakeholders.

• Act as a technical authority and trusted advisor on service reliability.

• Promote knowledge sharing and foster continuous improvement in engineering practices.

Required qualifications to be successful in this role:

• Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent experience.

• Bilingual (French/English)

• 5+ years of experience in SRE, DevOps, operations, or distributed systems.

• Strong experience with cloud platforms (AWS, Azure, or GCP) and modern architectural patterns.

• Proficiency in Linux, automation scripting (Python, Bash), and Infrastructure as Code (Terraform, CloudFormation).

• Experience with Docker, Kubernetes, and container orchestration.

• Hands-on expertise with observability tools (Datadog, Dynatrace, Prometheus, Splunk, New Relic).

• Demonstrated success improving system reliability and reducing operational incidents.

• Strong analytical, communication, and problem-solving skills.

• Ability to influence stakeholders and provide strategic technical guidance.

• French proficiency required; English proficiency considered an asset or required based on client context.
________________________________________
Skills

• Core: SRE, DevOps, Incident Management, Observability, SLIs/SLOs/SLAs

• Cloud: AWS / Azure / GCP

• Infrastructure: Linux, Terraform, CloudFormation

• Automation: Python, Bash, Ansible

• Containers: Docker, Kubernetes

• CI/CD: Design, integration, automation

• Soft Skills: Collaboration, communication, advisory influence, problem-solving

CGI is providing a reasonable estimate of the pay range for this role. The determination of this range includes factors such as skill set level, geographic market, experience and training, and licenses and certifications. Compensation decisions depend on the facts and circumstances of each case. A…

