Sr. Platform Reliability Engineer

Job in San Francisco, San Francisco County, California, 94199, USA

Listing for: Oscar Technology

Full Time position
Listed on 2026-01-20

Job specializations:

IT/Tech
Systems Engineer, Cloud Computing, Cybersecurity, IT Support

Salary/Wage Range or Industry Benchmark: 180,000 - 200,000 USD yearly

A technology company in the advanced computing space is seeking a Sr. Platform Reliability Engineer to help build and support resilient, scalable infrastructure. This role focuses on Kubernetes, IaC, CI/CD, observability, and operational reliability across both cloud and on-prem environments. You will collaborate closely with platform and delivery teams while participating in a sustainable on-call rotation.

Unfortunately, this role does not offer sponsorship.

Details

• Full-Time, Permanent Position

• Salary: $180k – $200k

• San Francisco, CA | 5 Days On-Site

Key Responsibilities

• Design and maintain infrastructure across containers, VMs, and hybrid environments in major cloud platforms.

• Build and enforce Terraform-based IaC and consistent Git workflows.

• Own CI/CD pipelines and container build processes with secure, efficient delivery standards.

• Manage container registries, image hygiene, scanning, and promotion workflows.

• Implement GitOps patterns for reliable, declarative environment management.

• Maintain observability systems (metrics, logs, dashboards, alert routing).

• Strengthen security across secrets, RBAC, network policies, and compliance checks.

• Oversee certificate lifecycle management and encrypted communication standards.

• Support disaster recovery plans, backup strategies, and resilience improvements.

Qualifications

• Bachelor’s Degree in Computer Science

• 5+ years in SRE/DevOps/infrastructure engineering.

• 5+ years of experience with Terraform

• 5+ years of experience with containerization and orchestration

• Strong Linux and networking fundamentals.

• 5+ years of experience with observability tools (Prometheus, Grafana, Loki, etc.)

• 5+ years of experience with Python, Bash, or Go

• Comfortable with on-call rotations, incident response, and automation

Nice to Haves

• GitOps, policy enforcement tools, secrets management, and certificate automation.

• Registry management and container security scanning.

• Distributed tracing and long-term metrics storage.

• Hybrid/on-prem operations or data-heavy platform support.
