Site Reliability Engineer II
Listed on 2026-03-02
IT/Tech
Systems Engineer, Cloud Computing
Job Summary
Department/Group Overview
Our engineering fleet is a horizontal set of teams providing engineering services across the organization. Our specific team provides reliability engineering and operational support to backend service development teams.
Disney Entertainment and ESPN Product & Technology
Technology is at the heart of Disney's past, present, and future. Disney Entertainment and ESPN Product & Technology is a global organization of engineers, product developers, designers, technologists, data scientists, and more - all working to build and advance the technological backbone for Disney's media business globally.
The team marries technology with creativity to build world‑class products, enhance storytelling, and drive velocity, innovation, and scalability for our businesses. We are Storytellers and Innovators. Creators and Builders. Entertainers and Engineers. We work with every part of The Walt Disney Company's media portfolio to advance the technological foundation and consumer media touch points serving millions of people around the world.
Here are a few reasons why we think you'd love working here:
- Building the future of Disney's media: Our Technologists are designing and building the products and platforms that will power our media, advertising, and distribution businesses for years to come.
- Reach, Scale & Impact: More than ever, Disney's technology and products serve as a signature doorway for fans' connections with the company's brands and stories. Disney+. Hulu. ESPN. ABC. ABC News...and many more. These products and brands - and the unmatched stories, storytellers, and events they carry - matter to millions of people globally.
- Innovation: We develop and implement groundbreaking products and techniques that shape industry norms and solve complex and distinctive technical problems.
The Streaming SRE squad drives improvements in performance, resiliency, and operational excellence. We take a consultative approach to reliability engineering, partnering with a variety of cross‑functional teams to provide guidance, automation, education, and best practices that elevate the reliability and scalability of services that support our products and brands.
We are seeking a Site Reliability Engineer who will contribute to the stability and scalability of critical systems by building automation, improving operational workflows, enhancing observability, and participating in incident response. The ideal candidate has a strong understanding of distributed system fundamentals, cloud‑native resources and operations, and performance optimization. This role requires a collaborative mindset and the ability to work closely with engineering teams to implement SRE principles across the organization.
Fostering innovation is a critical component of success here at Disney Entertainment and ESPN Product & Technology. The ideal candidate will therefore also need to be highly adaptable to change and able to pivot when required.
Responsibilities
- Contribute to the design, implementation, and improvement of systems to enhance reliability, scalability, and performance.
- Build and maintain automation for deployment, monitoring, alerting, and operational workflows.
- Collaborate with software engineering teams to implement SRE best practices, including SLIs, SLOs, error budgets, and automated remediation.
- Support CI/CD pipelines and participate in optimizing the software delivery lifecycle.
- Develop tools, dashboards, and instrumentation to improve observability across metrics, logs, and distributed tracing.
- Participate in incident response, root cause analysis (RCA), and corrective actions to prevent recurrence.
- Assist in capacity planning, performance tuning, and scaling strategies for distributed systems.
- Maintain and improve Infrastructure-as-Code (IaC) definitions and cloud environment configurations.
- Contribute to documentation, runbooks, architectural diagrams, and operational standards.
- Collaborate with cross‑functional teams to identify reliability risks and recommend improvements.
- Participate in incident‑based escalations and rotations to support high‑availability production systems.
- Continuously…