Senior Manager SRE Cloud Operations
Listed on 2026-03-02
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Oracle Cloud Infrastructure (OCI) is seeking an accomplished Senior Manager of Software Development with a strong background in both software engineering and cloud operations. In this role, you will lead a high-performing team of Site Reliability Engineers (SREs) and DevOps engineers responsible for designing, building, and operating highly available, scalable, and resilient automation and tooling for cloud service operations. You will be accountable not just for automation solutions, but also for the 24x7 operational health, performance, and efficiency of your service operations.
You will enable world-class customer experiences by setting operational standards, ensuring rapid detection and resolution of incidents, and continually driving for service operation excellence, automation, and efficiency. You will partner closely with Service (Product) and Support teams to deliver new solutions at scale, ensuring robust monitoring, alerting, and operational runbooks are in place.
Minimum Qualifications
- Bachelor's or master's degree in Computer Science, Engineering, or a relevant field, or equivalent experience.
- 3+ years’ technical or people management experience in cloud or SRE organizations.
- 10+ years’ experience in software engineering, site reliability engineering, or IT operations for large-scale, distributed, multi-tenant services.
- Demonstrated ownership of 24x7 operational services, including monitoring, incident response, and continuous improvement.
- Proficiency in at least one major programming language (Java, C, C++, Python) and in operational scripting.
- Solid grasp of distributed systems, networking, operating systems, and security fundamentals.
- Experience with automation, deployment pipelines, service telemetry, and operational dashboards.
- Strong communication and stakeholder management skills.
- 7+ years’ experience operating and supporting cloud infrastructure or large SaaS environments.
- Deep hands-on experience with operational tools, runbook development, and incident management frameworks.
- Experience with cost management and operational efficiency at scale.
- Familiarity with container orchestration, configuration management, and infrastructure-as-code.
- Experience building and scaling geographically distributed teams, and managing complex on-call schedules.