Senior Vice President, Site Reliability Engineering (SRE)

Job in Los Angeles, Los Angeles County, California, 90079, USA
Listing for: Oaktree Capital Management
Full Time position
Listed on 2026-01-24
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Project Manager
Position: Senior Vice President, Site Reliability Engineering (SRE)

Overview

Our Company

Oaktree is a leader among global investment managers specializing in alternative investments, with over $200 billion in assets under management. The firm emphasizes an opportunistic, value-oriented and risk-controlled approach to investments in credit, private equity, real assets and listed equities. The firm has over 1400 employees and offices in 25 cities worldwide.

We are committed to cultivating an environment that is collaborative, curious, inclusive and honors diversity of thought. Providing training and career development opportunities and emphasizing strong support for our local communities through philanthropic initiatives are essential to our culture.

The Technology department at Oaktree Capital Management delivers secure, scalable, and innovative solutions that power the firm’s global investment and business operations. Through strong partnerships across the company, we drive digital transformation, advance operational efficiency, and provide a trusted data foundation to create measurable impact for Oaktree’s teams, clients, and partners.

For additional information please visit our website at

Role Summary

The Senior Vice President, Site Reliability Engineering (SRE) is a hands-on engineering leader responsible for defining, driving, and scaling reliability practices across Oaktree’s global technology ecosystem. This executive and engineer will work in close partnership with software engineering teams, architects, security experts, infrastructure and cloud engineers, as well as key business stakeholders to ensure applications, platforms, and architectures meet the highest standards of resilience, reliability, performance, and operational excellence.

The SVP will spearhead Oaktree’s enterprise-wide SRE strategy, including SLO/SLA frameworks, RTO/RPO definitions, error-budget practices, observability maturity, incident processes, and related automation initiatives. As Oaktree accelerates its migration to Azure, this leader will bring deep experience in cloud-native SRE practices.

This leader will drive innovation by leveraging Agentic AI to augment SRE functions. The SVP will own Oaktree’s observability platform, including technology selection, budgeting, vendor management, and governance.

Responsibilities
  • Define and execute the enterprise SRE vision.
  • Act as an enabling team. Foster SRE best practices in stream-aligned teams.
  • Establish reliability frameworks including SLAs, SLOs, RTOs, RPOs, and error budgets.
  • Partner with engineering, architecture, security, and operations teams to drive changes that achieve an appropriate level of reliability.
  • Lead reliability engineering for applications and infrastructure in Microsoft Azure.
  • Develop Agentic AI capabilities for SRE workflows.
  • Own enterprise observability strategies and platforms (experience with Datadog and Cribl preferred).
  • Build unified dashboards for system health and reliability insights.
  • Own the practices for major incident management, blameless postmortems, and problem management. Act as an enabling team and foster best practices.
  • Automate incident response processes. Foster AIOps.
  • Champion and roadmap chaos engineering and resilience testing.
  • Track and report on SLO adherence, DORA metrics, and reliability trends.
  • Manage budgets, vendor contracts, and platform procurement.
Required Qualifications
  • 10-15 years of SRE experience, with 5+ years in leadership.
  • Hands-on engineering expertise across cloud and hybrid systems.
  • Deep Microsoft Azure experience.
  • Strong knowledge of SLO/SLA frameworks and operational governance.
  • Proven ownership of incident and problem management.
  • Expertise with observability/APM and related tools (preferably Datadog, Dynatrace, New Relic, PagerDuty, Cribl, Prometheus/Grafana, Azure Monitor, etc.).
  • Background in prompt engineering, automation, IaC, and CI/CD.
  • Strong development and infrastructure background, with knowledge of architectural needs.
Preferred Qualifications
  • AZ-400 Certification.
  • SRE Foundation Certification.
  • Familiarity with Google SRE principles.
  • Experience with Agentic AI.
  • Experience in chaos engineering.
  • Knowledge of ITIL, Agile, and DevOps best practices.
Education
  • Bachelor’s degree in Computer…
Position Requirements
10+ years of work experience