Senior Site Reliability Engineer
Listed on 2026-03-01
IT/Tech
Cloud Computing, SRE/Site Reliability, IT Support, Systems Engineer
Gradle Inc.
About the Role
Join Gradle Inc. as a Senior Site Reliability Engineer overseeing the reliability, performance, and availability of Develocity instances serving paying customers, open‑source projects, and public‑facing services, along with supporting infrastructure such as artifact registries.
Company Overview
Develocity is a first‑of‑its‑kind toolchain observability and acceleration platform that helps software teams improve DORA capabilities across Gradle, Maven, sbt, npm, and Python. It supports both CI and local builds, accelerating delivery and deepening observability.
Core Values
- Seek to Understand
- Know the Why
- Innovate & Iterate
- Own the Outcome
Responsibilities
- Operate and maintain all Develocity instances and supporting services.
- Participate in a follow‑the‑sun on‑call rotation, owning incident response and troubleshooting across the stack.
- Drive automation across deployment, upgrades, monitoring, self‑healing, and recovery.
- Build and maintain observability (logging, metrics, tracing, alerting) for all managed services.
- Collaborate with engineering teams to embed reliability into features from the start.
- Run incident response and retrospectives, learning from them.
- Own disaster recovery, backups, and business continuity.
- Communicate with customers during incidents and maintenance windows.
- Optimize performance, resource usage, and cost.
- Help evolve our SaaS operations as we scale.
Requirements
- 5+ years in SRE, DevOps, or an equivalent role operating production services at scale.
- Strong Kubernetes experience in production environments.
- Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2).
- Proficiency with observability tools (Prometheus, Grafana) and IaC (Terraform).
- Track record of incident management and response.
- Knowledge of SRE best practices (SLAs, SLOs).
- Proficiency in scripting (Python, Bash) for automation.
- Experience with 24/7 on‑call rotations.
- Strong written and verbal English communication.
Nice to Have
- Experience operating SaaS platforms at scale.
- Familiarity with Develocity.
- JVM language experience (Java, Kotlin).
- Disaster recovery planning and execution.
- Customer‑facing incident communication skills.
- Experience establishing SRE practices in new or growing teams.
What We Offer
- Ground‑floor role in a new SRE team with real ownership of production systems.
- Direct interaction with customers when issues arise.
- A culture that values automation over heroics.
- In‑person meetings such as an annual company offsite and team gatherings.
- Remote‑first environment with work‑from‑home flexibility.
- Competitive salary and equity grants.
US salary range: $150,000 – $190,000. Pay is determined by location, experience, skills, seniority, performance, and travel requirements.
Location: Remote from anywhere in the PST timezone.
Seniority Level: Mid‑Senior
Employment Type: Full‑time
Job Function: Architecture and Planning