Site Reliability Engineer - Cloud, Dynatrace
Job in
Toronto, Ontario, C6A, Canada
Listed on 2026-02-28
Listing for:
Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!
Full Time position
Job specializations:
- IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Job Description & How to Apply Below
Essential Skills
- Amazon Web Services (AWS) Cloud Computing
- GitHub Enterprise
- Site Reliability Engineer (SRE)
- Site Reliability Engineer (SRE) with expertise in Dynatrace monitoring, log investigation and observability practices.
- The ideal candidate will have a deep understanding of business processes, upstream-downstream dependencies and the ability to design and implement dashboards with SLOs and SLAs that align with business objectives.
- Monitoring & Observability
- Configure and maintain Dynatrace for application and infrastructure monitoring.
- Develop custom dashboards, alerts and reports to track system health and performance.
- Define and implement Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
- Log Analysis & Troubleshooting
- Perform log investigation using tools like Splunk, ELK or similar platforms.
- Identify root causes of incidents and provide actionable insights for resolution.
- Business Understanding
- Analyze business models, workflows and critical application flows.
- Map upstream and downstream dependencies to ensure end-to-end reliability.
- Incident Management
- Participate in on-call rotations and respond to production incidents.
- Drive post-incident reviews and implement preventive measures.
- Automation & Optimization
- Automate monitoring and alerting processes to reduce manual intervention.
- Collaborate with development teams to improve system reliability and performance.
Skills and Qualifications
- Technical Expertise
- Strong experience with Dynatrace (configuration, dashboards and problem detection).
- Proficiency in log analysis tools (Splunk, ELK or equivalent).
- Solid understanding of SRE principles, observability, and incident management.
- Business & Analytical Skills
- Ability to understand business processes and translate them into technical monitoring solutions.
- Experience in mapping application dependencies and creating impact analysis.
- Excellent communication and collaboration skills.
- Strong problem-solving and analytical mindset.
- Experience with cloud platforms (AWS, Azure, GCP).
- Familiarity with CI/CD pipelines and automation scripting.
- Performance Metrics
- Uptime and reliability improvements.
- Reduction in Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
- Accuracy and relevance of dashboards and alerts.
- Compliance with defined SLOs and SLAs.
8-10 years of experience