Sr Machine Learning Engineer
Listed on 2026-02-28
IT/Tech
Machine Learning/ML Engineer, AI Engineer
Description:
At Disney, we’re storytellers. We make the impossible, possible. The Walt Disney Company is a world-class entertainment and technological leader. Walt’s passion was to continuously envision new ways to move audiences around the world—a passion that remains our touchstone in an enterprise that stretches from theme parks, resorts and a cruise line to sports, news, movies and a variety of other businesses.
Uniting each endeavor is a commitment to creating and delivering unforgettable experiences — and we’re constantly looking for new ways to enhance these exciting experiences.
The Enterprise Technology mission is to deliver technology solutions that align to business strategies while enabling enterprise efficiency and promoting cross-company collaborative innovation. Our group drives competitive advantage by enhancing our consumer experiences, enabling business growth, and advancing operational excellence.
Team Description:
Reporting to the Director of Automation, Tooling, and Observability within Global Network Engineering & Operations (GNEO), the Machine Learning / Software Engineer plays a critical role in designing, developing, and implementing self‑healing infrastructure management systems for enterprise‑wide production environments. This role combines deep expertise in machine learning, AI technology, software engineering, and DevOps to create reusable patterns, frameworks, and services that improve reliability across services and platforms.
The candidate will serve as a thought leader, identifying opportunities for advanced analytics, predictive modeling, and AI and applying them to large‑scale telemetry, change, event, and incident data to derive actionable insights. The role focuses on building, deploying, and operating machine learning models that proactively detect issues, predict failures, and drive automated, self‑healing remediation across enterprise systems. The role is intentionally machine learning and AI heavy and is intended to be a strategic driver in that space.
Work alongside our first‑class applications, infrastructure, and operations teams to understand current manual processes and business requirements
Architect, design, and implement reusable machine learning frameworks, patterns, and services that integrate into the enterprise automation and observability platforms
Design, train, and deploy machine learning models in distributed environments for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and related tasks, so that they can surface leading indicators of failure
Build near‑real‑time inference pipelines that generate actionable insights from live telemetry, including continuous streams of metrics, logs, traces, and operational events
Create data abstractions and perform feature engineering on high‑volume, high‑cardinality telemetry data
Evaluate model performance using real production signals and continuously iterate to improve accuracy and reliability
Build closed‑loop, event‑driven systems where model signals trigger automated remediation actions
Partner with infrastructure and SRE teams to identify opportunities and integrate machine learning and AI‑driven insights into operational tools, workflows, and dashboards
Analyze incident and historical data to uncover leading indicators and predictive signals
Own the full machine learning lifecycle: experimentation, validation, deployment, monitoring, and retraining
Break down targeted manual processes into reusable software modules that leverage machine learning models
Build emulation and simulation environments (digital twins) of the infrastructure to test AI/ML‑driven automation under realistic scenarios and to speed up ideation and iteration for architects and engineers
Develop algorithms and frameworks to integrate machine learning and AI technologies into our orchestration platform
Ensure service reliability, performance, and operational uptime through code‑driven solutions
Conduct root cause analysis, design fault‑tolerant architectures, and enable self‑healing automation
Implement monitoring dashboards and KPIs to…