Senior Site Reliability Engineer
Listed on 2025-12-24
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing
Senior Site Reliability Engineer – Zip Co
At Zip, we build cloud‑native software applications that serve millions of customers and process billions of dollars in payments. We’re looking for someone with extensive senior leadership experience to spearhead our Site Reliability Engineering (SRE) initiatives and mentor our engineering team.
We offer a remote‑first opportunity for US‑based employees with the option to work in‑person out of our Manhattan office. You will encounter complex challenges that demand innovative solutions and strategic insight to maintain and improve system reliability at scale.
Key Responsibilities
- Optimize system reliability, performance, and scalability across cloud environments (Azure, Kubernetes, Service Mesh).
- Define, measure, and improve Service Level Objectives (SLOs), manage error budgets, and automate toil to drive operational excellence in a blameless culture.
- Collaborate with engineering teams to design and deploy highly reliable and scalable integrated solutions for Fortune 100 clients.
- Develop automated upgrade systems for a constantly evolving Azure architecture.
- Maintain a complex event‑sourcing environment using CQRS principles.
- Develop self‑service tooling and automation (Terraform, Atlantis, ArgoCD) to empower developers within reliability standards and reduce toil.
- Monitor service health and build automatic recovery using metrics‑based canaries for reliable code deployment.
Qualifications
- 10+ years of experience in a Site Reliability Engineering, Production Engineering, or equivalent role.
- 5+ years of experience working with Kubernetes or a similar microservice architecture.
- 5+ years of experience in an Azure environment.
- Proven experience defining and implementing SLOs/SLIs and managing error budgets.
- Experience working in an agile environment and knowledge of agile practices.
- Experience with Jira for project management and story creation is a plus.
- Experience with CI/CD systems, preferably Azure DevOps or GitHub Actions.
- Strong understanding of networking and routing protocols, especially those involved in Service Mesh architectures.
- Experience incorporating AI tools (ChatGPT, Cursor, Codex, GitHub Copilot) into day‑to‑day work.
- Must be able to work in an on‑call rotation with a focus on sustainable incident response and post‑mortem analysis (blameless culture).
Benefits
- Flexible working culture
- Incentive programs
- 20 days PTO every year
- Generous paid parental leave
- Leading family support policies
- Company‑sponsored 401k match
- Learning and wellness subscription stipend
- Beautiful Union Square office with a casual dress code
- Industry‑leading, employer‑sponsored insurance for you and your dependents, with several 100% Zip‑covered choices
Salary range: $150,000–$170,000, based on industry benchmarks and individual factors such as job‑related knowledge, skills, and experience.
Subject to the same considerations, the total compensation package may also include other elements, including a bonus and/or equity awards, in addition to a full range of medical, financial, and other benefits.
Equal Opportunity & FLSA
We pride ourselves on being a workplace that provides equal opportunities to people of all ages, cultural backgrounds, sexual orientations, gender identities, abilities, veteran status, and everything else that makes you unique. Equally, we’re committed to ensuring our recruitment processes are accessible and inclusive. If adjustments are needed to provide a fair and equitable experience, please let us know.
Other Information
Zip participates in the federal government’s E‑Verify program.
Before You Apply
Try Zip: rebrand.ly/check-zip-out