Sr. Site Reliability Engineering Analyst
Listed on 2026-02-28
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Essential Functions / Responsibilities
The Sr. Site Reliability Engineering (SRE) Analyst works under limited supervision to enhance system reliability, resiliency, and performance. This role leads initiatives that reduce Mean Time to Awareness and Mean Time to Resolve (MTTA/MTTR), improve observability and automation, and strengthen operational best practices. The analyst also partners with engineering teams to embed reliability into design, deployment, and operations.
Reliability Engineering & Systems Improvement
- Lead reliability and performance improvements, including capacity planning, failover strategies, and MTTA/MTTR reduction.
- Develop technical solutions for complex system issues and resilience gaps.
- Assess reliability risks and recommend enhancements to ensure service continuity.
- Refine and promote best practices for reliability, maintainability, and scalability.
- Mentor team members and provide technical guidance.
- Recommend engineering improvements that drive consistency and long‑term stability.
- Improve monitoring, alerting, and observability to strengthen system awareness.
- Support incident response and RCA activities to ensure effective resolution.
- Document incident learnings and share knowledge across teams supporting Agile Release Train(s).
- Partner with development, operations, and architecture teams to integrate reliability into system design and delivery.
- Reduce operational toil through automation and process optimization.
- Enhance engineering workflows, CI/CD pipelines, and readiness practices.
- Perform additional responsibilities as required to support organizational goals.
- Strong written and verbal communication skills.
- Ability to analyze complex technical problems and implement effective solutions.
- Solid understanding of distributed systems, cloud environments, and modern application architectures.
- Hands‑on experience with observability platforms (Dynatrace required).
- Experience with monitoring, incident management, and RCA practices.
- Ability to lead initiatives independently and collaborate across teams.
- Demonstrated focus on reliability, resiliency, automation, and continuous improvement.
- Development experience (e.g., Python, Java, scripting for automation).
- Cloud expertise (e.g., Azure, GCP) including deployment, architecture, and operations.
- Experience with AI/ML‑powered monitoring, automation, or incident prediction.
- Familiarity with SRE‑aligned frameworks such as SLIs/SLOs, error budgets, and reliability patterns.
Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field, or equivalent.
Minimum Experience
Five (5) or more years of equivalent work experience in an information technology or engineering environment. A related advanced degree may offset the experience requirement.
Knowledge, Skills, and Abilities
Strong written and verbal communication skills.
Preferred Qualifications—
Pay Transparency
This compensation range is provided as a reasonable estimate of the current starting salary range for this role across all potential locations. If this opportunity includes multiple job levels, the range is a reasonable estimate of the current starting salary for the lowest level to the current starting salary of the highest level. Actual starting pay will be determined by experience relative to the job, market level, pay at the location for this job, and other job‑related factors permitted by law.
An employee may be eligible for additional pay, premiums, or bonus potential. The Company offers eligible employees health, vision, and dental insurance, retirement, and tuition reimbursement.
Pay:
- US: $7,094.23/mo - $15,607.31/mo
- CO: $7,094.23/mo - $14,957.01/mo
- MD: $7,488.36/mo - $15,607.31/mo
- NJ: $7,488.36/mo - $15,607.31/mo
- NY: $7,488.36/mo - $15,607.31/mo
- NYC: $9,064.85/mo - $15,607.31/mo
- WA: $7,488.36/mo - $15,607.31/mo
Additional Details
The compensation listed reflects the pay range or rate of pay reasonably expected for this posted position at the posted location or locations. If this opportunity includes…