IT Operations Senior Manager
Listed on 2026-03-12
IT/Tech
IT Support, SRE/Site Reliability
THE POSITION
Our roster has an opening with your name on it
FanDuel is looking for a dynamic Senior Manager, IT Operations (AIOps & Incident Automation) to lead a globally distributed 24/7 IT Operations function and a human-on-the-loop team focused on automating end-to-end incident management across our products (Sportsbook, Casino, Fantasy, Retail, Racing, and more).
This role combines hands‑on technical leadership with people management to reduce operational toil and improve reliability through AIOps, workflow orchestration, runbook automation, and data-driven prevention. You will ensure automation is safe and auditable, with the right human oversight for high-impact decisions.
Reporting to the Sr. Director, Tech Ops, you will partner closely with Engineering, SRE, and Service Management to shift operational ownership left, improve production readiness, and drive preventative actions that reduce incident frequency and customer impact.
In addition to the specific responsibilities outlined in this posting, employees may be required to perform other such duties as assigned by the Company. This ensures operational flexibility and allows the Company to meet evolving business needs.
THE GAME PLAN
Everyone on our team has a part to play
- Lead and develop a team of Technical Operations Engineers, setting clear expectations for 24/7 coverage, quality, and customer impact.
- Own the AIOps and incident automation roadmap, including event correlation, alert noise reduction, auto‑triage, automated communications, and runbook execution.
- Drive preventative actions through trend analysis, problem management, recurring incident elimination, and strong follow‑through on post‑incident action items.
- Implement and continuously improve ITIL‑aligned incident, problem, and change practices with a focus on speed, clarity, and learning.
- Act as an escalation point for major incidents (P1/P2) and coordinate real‑time response, stakeholder communications, and executive updates.
- Partner with Engineering and SRE to shift left: strengthen production readiness, on‑call hygiene, runbooks, alert quality, and self‑service remediation patterns.
- Define and improve observability and operations analytics (metrics/logs/traces), ensuring actionable alerting and clear service health signals.
- Track and report on key operational metrics (MTTD/MTTR, uptime, alert volume, automation coverage, incident recurrence, toil reduction, SLA/SLO performance).
- Establish guardrails for AI and automation (human approval workflows, auditability, rollback plans, and change control) appropriate for a regulated environment.
- Manage third‑party providers and tooling integrations, enforcing SLAs and continuously improving reliability of the end‑to‑end operational toolchain.
What we're looking for in our next teammate
Required Qualifications
- Bachelor’s or master’s degree in Computer Science, Engineering, or equivalent practical experience is preferred.
- 7+ years of experience in production operations (IT Ops, SRE, NOC, or similar), including 5+ years leading people and/or managers in a 24/7 environment, is preferred.
- Experience improving reliability through automation and operational excellence, including incident lifecycle improvements and post‑incident prevention.
- Hands‑on experience designing automation and workflows using scripting or programming (e.g., Python), APIs, and orchestration tools.
- Strong understanding of observability (monitoring, logging, tracing), alerting strategy, and incident response best practices.
- Experience partnering with Engineering/SRE to drive shift‑left initiatives and influence service ownership, production readiness, and on‑call standards.
- Comfortable with AIOps concepts (event correlation, anomaly detection, noise reduction) and human‑on‑the‑loop oversight for automated decisioning.
- Excellent communication skills, including the ability to translate complex technical issues to non‑technical stakeholders and senior leaders.
- Strong judgment under pressure with a bias for action, accountability, and continuous learning.
- Strong understanding of cloud services and modern infrastructure (e.g., AWS, Google Cloud, Azure), including containerized and distributed…