Senior Site Reliability Engineer Job, San Francisco Area, California, USA, IT/Tech

Senior Site Reliability Engineer

Gradle Inc.

About the Role

Join Gradle Inc. as a Senior Site Reliability Engineer overseeing the reliability, performance, and availability of Develocity instances serving paying customers, open‑source projects, and public‑facing services, along with supporting infrastructure such as artifact registries.

Company Overview

Develocity is a first‑of‑its‑kind toolchain observability and acceleration platform that helps software teams improve DORA capabilities across Gradle, Maven, sbt, npm, and Python. It supports both CI and local builds, accelerating delivery and deepening observability.

Core Values

Seek to Understand
Know the Why
Innovate & Iterate
Own the Outcome

What You'll Do

Operate and maintain all Develocity instances and supporting services.
Participate in a follow‑the‑sun on‑call rotation, owning incident response and troubleshooting across the stack.
Drive automation across deployment, upgrades, monitoring, self‑healing, and recovery.
Build and maintain observability (logging, metrics, tracing, alerting) for all managed services.
Collaborate with engineering teams to embed reliability into features from the start.
Run incident retrospectives and turn their findings into lasting improvements.
Own disaster recovery, backups, and business continuity.
Communicate with customers during incidents and maintenance windows.
Optimize performance, resource usage, and cost.
Help evolve our SaaS operations as we scale.

Minimum Qualifications

5+ years in SRE, DevOps, or an equivalent role operating production services at scale.
Strong Kubernetes experience in production environments.
Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).
Proficiency with observability tools (Prometheus, Grafana) and infrastructure as code (Terraform).
Track record of incident management and response.
Knowledge of SRE best practices (SLAs, SLOs).
Proficiency in scripting (Python, Bash) for automation.
Experience with 24/7 on‑call rotations.
Strong written and verbal communication skills in English.

Preferred Qualifications

Experience operating SaaS platforms at scale.
Familiarity with Develocity.
JVM language experience (Java, Kotlin).
Disaster recovery planning and execution.
Customer‑facing incident communication skills.
Experience establishing SRE practices in new or growing teams.

What We Offer

Ground‑floor role in a new SRE team with real ownership of production systems.
Direct interaction with customers when issues arise.
A culture that values automation over heroics.
In‑person meetings such as an annual company offsite and team gatherings.
Remote‑first environment with work‑from‑home flexibility.
Competitive salary and equity grants.

Compensation

US salary range: $150,000 – $190,000. Pay is determined by location, experience, skills, seniority, performance, and travel requirements.

Location

Remote from anywhere in the Pacific time zone (PST).

Seniority Level

Mid‑Senior

Employment Type

Full‑time

Job Function

Architecture and Planning
