
Member of Technical Staff (SRE)

Job in New York, New York County, New York, 10261, USA
Listing for: Cockroach Labs
Full Time position
Listed on 2026-01-16
Job specializations:
  • IT/Tech
    SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 60,000 - 80,000 USD per year
Job Description
Position: Member of Technical Staff (SRE)
Location: New York

The Role

CockroachDB provides the backbone for storing data on a global scale. Our core mission on the SRE team is to operate a secure & reliable Cockroach Cloud product at scale. We provide consultation, planning, architectural oversight, concrete designs, development, and implementation that improve the resilience, efficiency, performance, and availability of our Cloud Service. We also take pride in being good on‑call engineers.

Regular reflection on the on‑call experience can contribute to short‑, medium‑, & long‑term improvements to the core product, including CRDB itself.

As a Site Reliability Engineer, you’ll help manage and scale our Cockroach Cloud service, a fully managed global offering of CockroachDB spanning multiple cloud providers. You will oversee our production system, ensuring that we can provide stable and scalable infrastructure as we deliver CockroachDB to our customers.

You Will
  • Manage the infrastructure for cloud services, including running internal production systems and hosting CockroachDB for our external customers.
  • Design, write, and deliver software and systems to increase product reliability and operational efficiency.
  • Develop custom tools as necessary.
  • Keep a complex system running and solve problems relating to mission‑critical services.
  • Design, implement, operate, and troubleshoot the automation and monitoring of production clusters to maximize performance and availability.
  • Drive the company through disaster recovery tests, where we manually turn down pieces of CockroachDB to test its overall resilience to failures.
  • Participate in an on‑call rotation for our production systems and hosted services.
The Expectations

In your first 30 days, you will onboard and be exposed to our current internal and customer‑facing production systems. Working with our existing SRE and engineering teams, you will pair on production operations and build out runbooks for the operation of different systems. We believe that it's essential for you to take this first month to become familiar with our technology and our company.

After 3 months, you'll be fully integrated into the team. You will develop and own tooling for reliability, automation, and other areas related to Cockroach Cloud’s stability and scalability. You will identify new opportunities for automating processes, streamlining delivery, deploying new core functionality, and building great tools. You will help make Cockroach Cloud the best platform to host CockroachDB by bringing your expertise to our database.

You Have
  • Expertise in analyzing, monitoring, and troubleshooting large‑scale distributed systems.
  • Experience in software development using one or more of the following:
    Go, C, C++, Python, Java.
  • Proficiency working with algorithms, data structures, and production troubleshooting.
  • Expertise in working with major cloud providers (AWS, Azure, GCP, etc.) and Cloud APIs.
  • Experience debugging and optimizing code and automating routine tasks.
  • Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.).
  • Prior on‑call experience, exhibiting a sense of ownership, attention to detail, and urgency.
  • Experience building collaborative relationships with your colleagues. You enjoy being part of the code review process and partnering with your teammates on challenging problems.
The Team

We are a group of software engineers first & foremost. We use software engineering as a means to achieve our mission; this is the SRE way. The SRE team is currently distributed across North America (5) and India (4).

Reporting to Tom Schmidt – Director, Production Engineering

Tom recently joined Cockroach Labs as manager of Site Reliability Engineering and has taken responsibility for Cockroach Cloud’s production operations. Tom joined Cockroach Labs after 15 years at IBM, where he initially served in a wide variety of technical leadership roles, generally focusing on quality and automation across compiler development, test frameworks, CI/CD, and more. Over the past 7 years, Tom has become an enthusiastic advocate of the Site Reliability Engineering discipline, presenting on the topic at conferences, developing certification curriculum, and securing multiple…
