Senior Site Reliability Engineer - Incident Management/Resiliency; Hybrid
Listed on 2026-01-14
Senior Site Reliability Engineer - Incident Management/Resiliency (Hybrid)
We are interested in every qualified candidate who is eligible to work in the United States. However, we are not able to sponsor visas or take over sponsorship at this time.
About Resilience Engineering
Resilience Engineering is a subset of the Site Reliability Engineering team that strives to foster a culture of continuous improvement through incident analysis, process evolution, and problem-solving. We work closely with teams across Tech, Product, and Operations through our Production Incident process to uncover system weaknesses, learn from failures, and make our technology more reliable.
What You’ll Be Doing
In this role, you’ll help enhance the resiliency of our systems. Your work will focus on our incident response, reporting, and analysis processes, enabling the organization to better prepare for and respond to complex system failures. You’ll drive cross-department efforts to deliver reliable, resilient, and observable solutions at Enova.
Your core priorities will be to:
- Lead production incidents as part of our PI PIC (or Incident Commander) rotation after completing training, ensuring clear communication and resolution.
- Capture and maintain detailed documentation of incidents, contributing factors, and learnings in formal incident reports.
- Deliver documentation that is clear, comprehensive, and accessible to different types of audiences in a timely manner within the established SLAs.
- Facilitate and document blameless post‑incident reviews that promote learning and continuous improvement.
- Collect and analyze incident data in order to identify systemic issues, risks, and trends. Lead incident data reviews in front of a wide range of stakeholders, including technical and business leadership.
- Work on improvements to how we collect, analyze, and learn from system failures.
- Champion a culture of operational excellence and resilience across the organization.
- Collaborate with engineering, product, and operations teams to address vulnerabilities and build more resilient systems.
- Design and run failure simulations (e.g., mock incidents, disaster recovery exercises) to proactively identify weak points.
- 5+ years of experience in a technology or analyst role (e.g., Software Engineering, Systems, Operations, SRE, or Product).
- A strong interest in how complex distributed systems operate—and how to make them more reliable.
- Analytical and problem‑solving skills with a systems‑thinking mindset.
- Strong communication skills, both verbal and written, with the ability to tailor messaging to technical and non‑technical audiences.
- Experience querying and analyzing data (e.g., SQL, PostgreSQL, Kafka).
- Comfort with ambiguity, and the ability to turn vague problems into actionable insights.
- Demonstrated maturity, sound judgment, and organizational awareness.
- Ability to coordinate the resolution of major incidents and reviews following the Enova Incident Management Process.
- Ability to seamlessly shift between high‑urgency incident response and structured project work, with strong organizational skills and the capacity to manage projects independently.
- Experience leading resolution of major system outages or production incidents.
- Experience driving large‑scale technical or process changes.
This position includes various levels within our career ladder. The actual annual salary will be determined based on qualifications, skills, experience, and level assessed during the hiring process and may fall outside of the ranges shown.
Budgeted annual salary ranges:
Additional compensation for this role may include a bonus. All full‑time employees are eligible to participate in Company benefits, described in more detail here.
- Our hybrid roles require in‑office work Tuesday through Thursday, with remote flexibility on Mondays and Fridays. This schedule fosters collaboration, team connection, and strategic planning, enhancing communication and effectiveness to drive results.
- Health, dental, and vision insurance including mental health benefits
- 401(k) matching plus a Roth option (U.S.-based employees only)
- PTO & paid holidays off
- Sabba…