Site Reliability Engineer
Listed on 2026-03-01
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing, IT Support
Location: San Francisco, New York, Austin
Employment Type: Full time
Location Type: Remote
Department: Product Engineering
About EngFlow
At EngFlow, we help developers save time by accelerating software builds and tests. Our cloud-based, distributed service optimizes developer workflows through remote execution and caching, improving efficiency, productivity, and product quality.
Backed by top investors, EngFlow is redefining how companies build software and ship well-tested products. Our solutions speed up builds by a factor of 10 or more, while our observability platform provides actionable insights for optimization. Founded by key contributors to Bazel, we build tools that empower engineering teams, from startups to Fortune 500 companies, to enhance developer velocity and improve build performance.
We’re looking for an experienced SRE to join our engineering team. You’ll be at the intersection of software engineering and systems operations — ensuring our distributed infrastructure is highly available, performant, and scalable while enabling our engineers to move quickly and confidently.
Key Responsibilities
Design, build, and maintain cloud infrastructure for our distributed build acceleration platform
Automate everything: from deployment pipelines to monitoring and recovery
Manage scalability and reliability for high-throughput, low-latency systems
Implement and maintain observability: logging, metrics, tracing, and alerting
Work closely with product and engineering teams to embed reliability into every feature
Diagnose and resolve production incidents quickly, and feed learnings back into systems design
Optimize cost, performance, and resilience across multi-cloud environments
Qualifications
4+ years in SRE, DevOps, or Production Engineering roles
Experience managing Kubernetes in production
Strong background in cloud infrastructure (GCP or AWS) and IaC (Terraform preferred)
Solid knowledge of networking, security, and distributed systems
Track record of improving system availability and developer productivity
A knack for debugging complex, cross-system issues under pressure
We offer comprehensive medical, dental, and vision benefits, a 401k/pension, parental leave, and generous vacation. The team is fully remote, but we meet in person several times a year at exciting destinations around the world. We value getting the work done and having fun while doing it; past team events include chocolate, whisky, and tea tastings, monthly team games, and escape rooms.