Software Engineer - ECM SRE Job, Gatineau area, Province de Québec, Canada, IT/Tech

Job Description

As a software engineer on the Emergency Call Management site reliability engineering (ECM-SRE) team, you will join talented software engineers who work directly with product and engineering teams to continually improve reliability across our suite of public safety products.

Your responsibilities will include:

Architecture and implementation of monitoring and observability objectives, including maintenance of alert response playbooks.

Creation and reinforcement of the HA and reliability strategy.

Triage of customer-reported incidents and problems to the proper software team, requiring troubleshooting and problem management skills.

Maintenance and reporting of SLOs and error budget.

Facilitation of Chaos Engineering activities with multiple engineering teams.

Development of the SRE culture and sharing of best practices across Motorola Solutions’ Emergency Call Management organization.

On-call support alongside multiple engineering teams for products and services in production. The role centers on Incident Command, keeping the incident process focused and on course, which is essential to meeting regulatory reporting requirements.

Assistance to Motorola Solutions’ customer support teams in creating customer-facing communication documents, requiring strong communication skills.

Facilitation of Failure Mode and Effects Analysis with multiple engineering teams.

The right individual will have a passion for observability, reliability, automation, incident response, and enabling innovation.

Qualifications:

BS in Computer Engineering (or equivalent degree)

4+ years of professional software development

Excellent communication skills

Experience developing cloud-based applications

Experience developing REST-based APIs and implementing microservice principles and architectures

Experience with modern DevOps tooling (including CI/CD pipelines)

Familiarity with the concepts involved in designing a high availability architecture

Familiarity with observability and monitoring

Familiarity with automated testing

Creativity and persistence when solving complex problems

Enthusiasm for learning key technologies, architectures, processes, and best practices

Preferred Skills

Familiarity with SRE or DevOps

Familiarity with container deployment and orchestration technologies at scale

Familiarity with SLOs and SLIs

Familiarity with incident response, disaster recovery, root cause analysis, and postmortems

Familiarity with IaC

Familiarity with chaos engineering

Familiarity with redundancy and failovers

Familiarity with capacity planning and load balancing

Familiarity with service mesh

Familiarity with feature flags, canary releases, or blue/green deployments

Familiarity with hybrid cloud architecture

Familiarity with developing cloud-based applications that use a multi-tenant database architecture

Familiarity with systems programming (network stack, file system, OS services) and networking (L2 vs. L3, network architecture, VLANs, etc.)

Experience working in Agile teams leveraging Scrum, Kanban, or other methodologies and/or understanding of Agile development concepts

Experience being on-call for a product in production

Job Description

As a software engineer on the Emergency Call Management site reliability engineering (ECM-SRE) team, you will join a team of talented software engineers who work directly with product and engineering teams to constantly improve the reliability of our suite of public safety products.

Your responsibilities will include:

Architecture and implementation of monitoring/observability objectives. This includes maintenance of alert response playbooks.

Creation and reinforcement of the high availability (HA) and reliability strategy.

Triage of customer-reported incidents and problems to the appropriate software team, requiring troubleshooting and problem management skills.

Maintenance and reporting of SLOs (Service Level Objectives) and the error budget.

Facilitation of Chaos Engineering activities with multiple engineering teams.

Development of the SRE culture and…

