Manager, Incident Ops and Observability | Seattle area, Washington, USA | IT/Tech

About F5

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation. Everything we do centers around people, obsessing over how to make the lives of our customers—and their customers—better.

We prioritize a diverse F5 community where each individual can thrive.

Position Summary

We are seeking a manager to help build our new Site Reliability Engineering team and strengthen operational excellence across the Infrastructure & Security and F5 Digital organization. This role will play an important part in Digital’s incident management strategy, building out the Reliability Operations Center and the monitoring capabilities and technologies Digital needs to understand problems before our users do. The ideal candidate will bring deep expertise in incident lifecycle management (from detection and triage to resolution and post-mortem) and will collaborate cross-functionally to drive continuous improvement in our security posture.

This leader will operationalize a world-class incident management program while also defining and implementing the vision for observability across F5’s hybrid infrastructure and cloud environments. This role requires strong leadership, technical acumen, and the ability to operate under pressure while maintaining clear communication with stakeholders at all levels.

Key Responsibilities

Lead the global Incident Response (IR) program, optimizing processes across detection, triage, containment, remediation, and post-incident analysis.
Hire, mentor, and train global team members on incident response best practices and observability tooling.
Serve as technical lead and head engineer for the creation and management of monitoring tools and services that support F5 infrastructure and business systems.
Serve as the primary incident commander during major incidents, ensuring timely resolution, excellent communication, and stakeholder alignment.
Define and continuously refine incident response policies, procedures, and runbooks to ensure consistent and effective handling of incidents.
Drive improvements in detection, escalation, and resolution through automation, tooling, and process enhancements.
Define and report KPIs for service reliability, incident response, and observability maturity to senior leadership.
Conduct root cause analyses and lead post-incident reviews to identify lessons learned and prevent recurrence.
Design and lead cross-functional tabletop exercises to strengthen organizational preparedness, communication, and response coordination during major incidents.
Maintain detailed incident records and metrics to support auditing, compliance, and continuous improvement.
Collaborate with ServiceNow teams and architects to manage incidents.
Establish and maintain on‑call rotations with teams who own critical applications across the Digital organization.

Qualifications

5+ years of experience running NOC/SOC/SRE teams with a focus on monitoring and observability.
10+ years managing incident response, IT service management, or a related field.
Proven track record of managing complex security incidents in cloud and hybrid environments.
Experience with SIEM, SOAR, and log analysis tools (e.g., Splunk, Datadog, Panther, CrowdStrike).
Experience with observability tools, especially tooling focused on synthetics, metrics, and infrastructure telemetry (e.g., Grafana, ThousandEyes, LogicMonitor, Pingdom, Zabbix).
Excellent communication skills with the ability to convey technical information to both technical and non‑technical audiences.
Ability to lead under pressure, prioritize effectively, and make decisions in high‑stakes situations.
Familiarity with AWS, Google Workspace, and common SaaS platforms.
Bachelor’s degree in Computer Science, Cybersecurity, Information Systems, or related field (or equivalent experience).

Preferred Qualifications

Experience working in infrastructure, IT, or security organizations.
Familiarity with tools such as…

