Site Reliability Engineer
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-02-27
Listing for: Primer
Full Time position
Job specializations:
• IT/Tech: Systems Engineer, Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
That only works if the lights stay on, queries stay fast, and incidents stay rare. That’s where you come in.
As our first dedicated Site Reliability Engineer, you’ll be the force multiplier who designs, builds, and operates the infrastructure that powers everything: petabyte-scale data pipelines, LLM-backed services, and the APIs our customers (and engineers!) rely on every day. You’ll pair hard-won ops experience with a mentor’s mindset—levelling up the whole team while keeping us four steps ahead of failure.
YOUR MISSION
Own reliability from design to customer.
• Define and uphold SLOs / SLIs, manage error budgets, and lead blameless post-mortems.
• Automate toil out of existence—CI/CD, infra-as-code, capacity planning, and chaos testing.
• Drive incident response end-to-end: detection, mitigation, root-cause analysis, and long-term fixes.
• Scale multi-cloud data pipelines (Prefect, ClickHouse, Iceberg) and GPU/LLM workloads.
• Teach best practices, review designs, and coach engineers so reliability becomes a team sport.
WHAT YOU’LL DO
• Design, implement, and tune distributed systems that handle high-throughput B2B traffic.
• Harden our AWS stack with IaC (e.g. Terraform).
• Instrument everything—logs, traces, metrics, and AI-powered anomaly detection.
• Champion security, cost optimization, and disaster-recovery strategies.
• Jump into the weeds when something breaks, fix it fast, then automate it away.
WHAT YOU’LL BRING
Must-Haves
• 5+ years owning production systems at meaningful scale (sub-second latency, “four-nines” targets).
• Mastery of SRE fundamentals: SLO/SLI design, error budgets, incident playbooks.
• Deep hands-on with Linux, networking, containers/K8s, and at least one major cloud (AWS/GCP/Azure).
• Proven track record automating infra with Terraform, Helm, or similar IaC tooling.
• Fluency in at least one systems / scripting language (Go, Python, Rust, etc.).
• Experience operating complex data pipelines (Prefect, Airflow, Temporal) or real-time streaming systems.
• History of mentoring engineers and embedding reliability culture across teams.
• Pragmatic decision-maker—balances uptime, velocity, and cost for startup reality.
• Curiosity for AI-augmented ops (LLM chat-ops, anomaly detection, self-healing).
Nice-to-Haves
• Managed GPU clusters and ML inference workloads.
• Operated data lakes / lakehouses at scale (Iceberg, Delta, etc.).
• Meaningful open-source contributions in SRE, DevOps, or data-infra projects.
WHY PRIMER
• Mission with impact – We’re unlocking new growth channels for thousands of B2B marketers.
• High-trust, low-ego culture – Fully distributed team, meeting-light weeks, Friday focus days.
• Work & life, balanced – Five weeks PTO, generous parental leave, and flexibility for families.
• Career rocket-fuel – Small team, huge problems, real ownership. Shape the future with bold innovators, driving impact that redefines industries.
• Diverse & global – Teammates span six countries—and counting.
INTERVIEW PROCESS
• Intro Call with Engineering Manager – 30 min
• System Design – 60 min
• Operational Excellence Drill-down – 60 min
• Strategic Pragmatism Chat with CTO – 45 min
• Technical Coding/Systems Deep Dive – 30 min
• Culture & Values with CEO – 45 min
Decision typically withinhrs of final conversation.
READY TO LEVEL UP B2B MARKETING INFRASTRUCTURE?
Email with your résumé, LinkedIn, GitHub, or anything that showcases your reliability superpowers. Let’s build the future—without the fire-drills.