Principal Site Reliability Engineer

Job in Santa Fe, Santa Fe County, New Mexico, 87503, USA
Listing for: Ll Oefentherapie
Full Time position
Listed on 2026-01-19
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 86,400 - 199,500 USD yearly
Job Description & How to Apply Below
  • Does this position require a security clearance? Yes
  • Years of experience: 3 to 5+ years
  • Additional info: Visa / work permit sponsorship is not available for this position
  • Applicants are required to read, write, and speak the following languages: English
Job Description

Our Team

Building on our Cloud momentum, Oracle has formed a new organization: Oracle Health Data & Analytics Platform.

This team will focus on product development and product strategy for Oracle Health, while building out a complete platform supporting modernized, automated healthcare. This is a net new line of business, constructed with an entrepreneurial spirit that promotes an energetic and creative environment. We are unencumbered and will need your contribution to make it a world-class engineering center with a focus on excellence.

Oracle Health Data & Analytics Platform has a rare opportunity to play a critical role in how Oracle Health products impact and disrupt the healthcare industry by transforming how healthcare and technology intersect.

You will have the opportunity to:

  • Reach billions of people with our products & services
  • Create technology that truly impacts the world
  • Have an immediate impact on developing technology
  • Enjoy unlimited growth potential with inspiring work
  • Work with the best minds in the industry
  • Enjoy working in an open, diverse, and productive environment

About The Job

This role provides technical leadership for the core data platforms behind Oracle Health’s Data & Analytics Platform. As a Principal Site Reliability Engineer (SRE), you will own shared, mission-critical systems used by multiple products and teams.

You will lead the design and operation of large-scale, stateful distributed platforms, including Hadoop ecosystem components (HDFS, YARN, HBase) deployed on Oracle Big Data Service (BDS), Kafka, and Storm. These multi-tenant platforms are deployed and operated through Ansible- and Terraform-based automation and require strong architectural ownership to manage scale, change, and broad blast radius.

What You'll Do

Platform Ownership & Technical Leadership
  • Own the end-to-end reliability, scalability, and operability of shared data platforms
  • Define platform standards, architectural direction, and operational guardrails
  • Influence cross-team technical decisions and long-term platform strategy
  • Drive long-term platform evolution and influence reliability strategy across the data ecosystem
  • Lead platform architecture and design reviews
  • Clearly articulate system behavior, dependencies, and failure modes
  • Make principled trade-offs between reliability, performance, cost, and complexity
  • Provide guidance and guardrails that enable downstream teams to use platforms safely and effectively
Operations Engineering
  • Establish capacity models, scaling strategies, and operational best practices
  • Design platforms that behave predictably under load, failure, and change
  • Own platform lifecycle events: upgrades, expansions, decommissioning, and recovery
Distributed Systems Expertise
  • Operate and evolve stateful distributed systems where data placement, replication, and recovery are critical
  • Reason about failure modes such as back pressure, rebalancing, region movement, replication lag, and rolling upgrades
Security
  • Operate and maintain Kerberized platforms, including authentication, authorization, and secure service-to-service communication
  • Treat security as a first-class architectural concern
Automation
  • Design and evolve an Ansible- and Terraform-driven automation framework
  • Treat automation as production software: versioned, reviewed, tested, and improved
  • Eliminate operational toil by encoding reliability and safety into the platform
Incident Leadership & Prevention
  • Serve as the ultimate escalation point for complex or ambiguous incidents
  • Focus on eliminating entire classes of failure, not just resolving individual issues
Representation
  • Represent SRE and platform engineering in high-visibility and sensitive forums
  • Communicate clearly with engineering leadership and partner teams

Responsibilities

The team operates within the Oracle Health Data & Analytics Platform, supporting one of Oracle Health’s core products, HealtheIntent. We operate the big data and streaming…
