Site Reliability Developer (SRD), Observability (Health | US Federal)
Listed on 2026-03-04
IT/Tech
Cloud Computing, Systems Engineer, Cybersecurity, IT Support
Job Description
Key Responsibilities
Design, develop, and operate cloud-scale observability and infrastructure monitoring platforms spanning network, compute, storage, and virtualization layers.
Build and maintain monitoring and logging solutions leveraging technologies such as OpenNMS, Prometheus, OpenSearch/Elastic, Logstash, Kafka, Grafana, and related components.
Develop and maintain automation frameworks and operational tooling using Java, Python, Ansible, Chef, Terraform, and Unix/Linux scripting.
Design and implement CI/CD pipelines and delivery workflows using Jenkins, GitHub, OCI CI/CD services, Terraform, and container-based build/release patterns.
Architect and operate containerized workloads and platforms using Docker and Kubernetes.
Partner with SRE, DevOps, Security, and Infrastructure teams to improve reliability, scalability, performance, and operability.
Troubleshoot complex issues across distributed systems and infrastructure layers; lead incident response and drive corrective actions.
Implement proactive monitoring, alerting, and self-healing patterns to reduce incident frequency and improve MTTR.
Contribute to design reviews, code reviews, operational readiness, and documentation; help define standards and best practices.
Own problems end-to-end with a high level of autonomy, execution discipline, and accountability.
Education and/or Experience:
8 years of experience in application of server architecture, system administration, software development, or cloud application delivery
OR
Bachelor’s Degree in Computer Science, Information Systems, Math, or related field AND 4 years of experience in application of server architecture, system administration, software development, cloud application delivery, operations, or related field.
Strong experience with Unix/Linux systems, infrastructure monitoring, and virtualization (e.g., VMware, KVM).
Hands-on expertise in a majority of the following:
Monitoring/observability: OpenNMS (or equivalent), Prometheus, OpenSearch/Elastic, Logstash, Kafka, Grafana
Containers/orchestration: Docker, Kubernetes
CI/CD and source control: Jenkins, GitHub, OCI CI/CD services, Terraform, automated delivery pipelines
Automation/IaC: Ansible, Chef, Terraform, Python (plus shell scripting)
Programming: Java
Data stores used in observability platforms (indexing/search/time-series/log pipelines)
Strong understanding of secure engineering and operational best practices (least privilege, secrets handling, hardening, patching).
Proven ability to operate in a production-critical, always-on environment, including participating in on-call rotations.
Demonstrated system design and problem-solving skills with the ability to simplify complex operational challenges.
Experience with cloud platforms such as OCI, AWS, Azure, or similar.
Strong networking knowledge and experience with network device monitoring and/or network automation.
Experience operating large-scale, distributed, highly available systems across multiple regions/environments.
Exposure to performance tuning, capacity planning, and infrastructure security practices in regulated environments.
Familiarity with applying AI/ML techniques to observability (anomaly detection, event correlation, predictive analytics).
U.S. Federal government security clearance is required (active or ability to obtain/maintain—per role requirements).
Must be a U.S. Person and meet all eligibility requirements for access to U.S. Government controlled information and systems, as applicable.
Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
Range and benefit information provided in this posting are specific to the stated locations only
US:
Hiring Range in USD from: $79,100 to $158,200 per annum. May be eligible for bonus and equity.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle’s differing products,…