Senior Site Reliability Engineer

Job in Austin, Travis County, Texas, 78716, USA
Listing for: Branch
Full Time position
Listed on 2026-02-28
Job specializations:
  • IT/Tech
    Systems Engineer, SRE/Site Reliability, Cloud Computing
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below

At Branch, we power every touchpoint with links that work and insights that prove it. From click to conversion, we make growth measurable. Our unparalleled attribution, backed by AI-enhanced linking, is trusted to deliver seamless experiences that increase ROI, decrease wasted spend, and eliminate siloed attribution.

We bring the same rigor to how we build our team, by empowering our people to move fast, own outcomes, and build something that matters. We take pride in making meaningful investments in our team’s health, wealth, and growth so individuals can thrive as we scale. Our culture values smart, humble, and collaborative teammates who take accountability and drive results in an environment where their work truly moves the business forward.

We are innovative, scaling with purpose, and led by seasoned leaders who know how to build enduring companies. Trusted by brands like Instacart, Western Union, NBCUniversal, Zocdoc, and Sephora, we’re big enough to matter, small enough for you to make a real impact. If you’re excited by the grit of building, rapid learning, and shaping the future of customer growth, you’ll find your place here.

We are seeking a highly experienced Senior Site Reliability Engineer to own the reliability, performance, and operational excellence of our large-scale, distributed infrastructure. You will lead the design and execution of systems that power mission-critical services, shaping engineering practices, influencing architectural decisions, and driving automation and resiliency across the organization.

As a Senior Site Reliability Engineer, you’ll get to:
  • Architect, design, and evolve complex distributed systems to improve reliability, operational efficiency, and performance at scale.
  • Partner closely with product, security, and data engineering teams to translate business needs into resilient and scalable system designs.
  • Drive reliability through automation and advanced observability, ensuring proactive detection, reduced mean time to recovery, and consistent system hygiene.
  • Lead and mentor in high-stakes situations, owning debugging efforts for critical issues and establishing durable prevention strategies.
  • Perform deep infrastructure cost audits, identifying areas of inefficiency and implementing solutions that reduce waste without compromising performance or security.
  • Own and maintain key distributed data platforms, including Aerospike and FoundationDB, ensuring durability, consistency, and performance.
  • Guide teams in defining SLIs/SLOs and operational best practices, elevating system reliability and engineering rigor across the org.
  • Continuously identify and eliminate bottlenecks, improving system throughput, latency, and overall efficiency.
  • Champion Infrastructure as Code (IaC) to automate provisioning, configuration, and lifecycle management using modern IaC tools and principles.
  • Lead our GitOps and deployment strategy using Argo CD to implement secure, repeatable, and scalable delivery workflows across Kubernetes environments.
You’ll Be a Good Fit If You Have:
  • 6+ years in SRE, systems engineering, or software engineering roles, ideally within fast-paced, rapidly scaling environments.
  • Proven track record as a senior reliability or production engineer, with ownership of large, distributed, customer-facing systems.
  • Expert-level proficiency in Kubernetes, AWS, Linux internals, and distributed system fundamentals.
  • Strong programming skills in Go, Python, Java, Kotlin, Bash, or similar languages, with an emphasis on building reliable automation and tooling.
  • Hands‑on experience with modern observability stacks (Prometheus, Grafana, Alertmanager, Loki, PagerDuty).
  • Familiarity with large-scale data and streaming ecosystems such as Kafka, Spark, Aerospike, FoundationDB, and the broader Hadoop ecosystem.
  • Deep experience with Terraform, CloudFormation, or related IaC tooling, and the ability to guide teams in IaC best practices.
  • Proven incident management leadership in production SaaS systems, including on‑call excellence, post‑mortem execution, and long‑term reliability improvements.
  • Exceptional problem‑solving skills and the ability to lead complex investigations across multiple system…
Position Requirements
10+ Years work experience