Site Reliability Engineer Job, San Francisco area, California, USA, IT/Tech

Primer helps B2B products break out of the B2C-centric marketing box. Our platform turns consumer ad channels, data streams, and emerging AI workflows into measurable growth engines for go-to-market teams. We ingest billions of rows from first- and third-party sources, map them to rich company context, and surface hyper-targeted audiences and real-time performance alerts—all without vendor lock-in.

That only works if the lights stay on, queries stay fast, and incidents stay rare. That’s where you come in.

As our first dedicated Site Reliability Engineer, you’ll be the force multiplier who designs, builds, and operates the infrastructure that powers everything: petabyte-scale data pipelines, LLM-backed services, and the APIs our customers (and engineers!) rely on every day. You’ll pair hard-won ops experience with a mentor’s mindset—levelling up the whole team while keeping us four steps ahead of failure.

YOUR MISSION

Own reliability from design to customer.

• Define and uphold SLOs / SLIs, manage error budgets, and lead blameless post-mortems.

• Automate toil out of existence—CI/CD, infra-as-code, capacity planning, and chaos testing.

• Drive incident response end-to-end: detection, mitigation, root-cause analysis, and long-term fixes.

• Scale multi-cloud data pipelines (Prefect, ClickHouse, Iceberg) and GPU/LLM workloads.

• Teach best practices, review designs, and coach engineers so reliability becomes a team sport.

WHAT YOU’LL DO

• Design, implement, and tune distributed systems that handle high-throughput B2B traffic.

• Harden our AWS stack with IaC (e.g. Terraform).

• Instrument everything—logs, traces, metrics, and AI-powered anomaly detection.

• Champion security, cost optimization, and disaster-recovery strategies.

• Jump into the weeds when something breaks, fix it fast, then automate it away.

WHAT YOU’LL BRING

Must-Haves

• 5+ years owning production systems at meaningful scale (sub-second latency, “four-nines” targets).

• Mastery of SRE fundamentals: SLO/SLI design, error budgets, incident playbooks.

• Deep hands-on with Linux, networking, containers/K8s, and at least one major cloud (AWS/GCP/Azure).

• Proven track record automating infra with Terraform, Helm, or similar IaC tooling.

• Fluency in at least one systems / scripting language (Go, Python, Rust, etc.).

• Experience operating complex data pipelines (Prefect, Airflow, Temporal) or real-time streaming systems.

• History of mentoring engineers and embedding reliability culture across teams.

• Pragmatic decision-maker—balances uptime, velocity, and cost for startup reality.

• Curiosity for AI-augmented ops (LLM chat-ops, anomaly detection, self-healing).

Nice-to-Haves

• Managed GPU clusters and ML inference workloads.

• Operated data lakes / lakehouses at scale (Iceberg, Delta, etc.).

• Meaningful open-source contributions in SRE, DevOps, or data-infra projects.

WHY PRIMER

• Mission with impact – We’re unlocking new growth channels for thousands of B2B marketers.

• High-trust, low-ego culture – Fully distributed team, meeting-light weeks, Friday focus days.

• Work & life, balanced – Five weeks PTO, generous parental leave, and flexibility for families.

• Career rocket-fuel – Small team, huge problems, real ownership. Shape the future with bold innovators, driving impact that redefines industries.

• Diverse & global – Teammates span six countries—and counting.

INTERVIEW PROCESS

• Intro Call with Engineering Manager – 30 min

• System Design – 60 min

• Operational Excellence Drill-down – 60 min

• Strategic Pragmatism Chat with CTO – 45 min

• Technical Coding/Systems Deep Dive – 30 min

• Culture & Values with CEO – 45 min

Decision typically withinhrs of final conversation.

READY TO LEVEL UP B2B MARKETING INFRASTRUCTURE?

Email with your résumé, LinkedIn, GitHub, or anything that showcases your reliability superpowers. Let’s build the future, without the fire-drills.
