Production Support and Site Reliability Engineer (SRE) Job: Toronto Area, Ontario, Canada (IT/Tech)

Position: Production Support and Site Reliability Engineer (SRE)

Responsibilities

Manage day-to-day production support activities for both web and mobile applications (Android and iOS).
Maintain overall health, stability, and safety of production systems and applications.
Identify operational risks and recommend mitigation strategies.
Improve application instrumentation, logging, alerting, and monitoring capabilities.

Change & Release Management

Perform change management activities across test and production environments.
Execute code deployments while adhering to source code management, release management, and compliance policies.
Ensure proper governance and documentation of all deployment activities.

Collaboration & Stakeholder Engagement

Work closely with development teams and business partners to recommend solutions that combine internal development, integration with other applications, and vendor platforms.
Thrive in an agile environment by contributing to sprint activities and collaborative planning.
Communicate effectively with team members, management, infrastructure teams, and other interface groups throughout the project lifecycle.
Develop a strong understanding of business processes and enterprise systems.
Provide coaching, expertise, and continuous feedback to help build the team’s capability.
Share technical knowledge to support onboarding and skill growth.

Support & On‑Call Availability

Participate in occasional weekend and after‑hours support for critical issues or deployments.

Required Qualifications (Must‑Have)

Technical Troubleshooting & Monitoring
Hands‑on experience troubleshooting application and database issues using:
- Elastic / Kibana
- MongoDB services running on Linux
- IIS Web Servers on Windows
- Kafka (basic to intermediate knowledge)
Strong proficiency with the following database and application technologies:
- Solid ability to write, read, and troubleshoot SQL queries.
- Knowledge of SQL database architecture, performance monitoring, and optimization.
Good understanding of the following operating systems:
- Windows
- Linux
Automation & Infrastructure
- Experience automating routine database or infrastructure operations.
- Proficiency working with cloud‑hosted applications and services.
DevOps & SRE Practices
- Experience with DevOps and Site Reliability Engineering tools such as:
  Helios, UrbanCode Deploy (UCD), Jenkins, and Ansible.
- Knowledge of CI/CD pipelines, release workflows, and automation strategies.
Needs

Experience with:
- Monitoring tools such as Catchpoint and Aternity.
Productivity & Support Tools
- Jira and Confluence for project/task management.
- Firebase, Google Play Console, and Google Analytics for Android apps.
- Apple App Store experience for iOS application operations.
Soft Skills & Frameworks
- Strong analytical, problem‑solving, and decision‑making skills.
- Solid understanding of ITIL service management practices.
- Experience using ServiceNow for incident, problem, and change management.
