Site Reliability Engineer Job, San Francisco area, California, USA, IT/Tech

About Plenful

Plenful is on a mission to transform healthcare operations from the inside out. Built by healthcare operators for healthcare operators, Plenful is driven by a deep understanding of the challenges facing today’s care teams. We’re backed by notable investors and are building an AI workflow automation platform that healthcare teams rely on to operate smarter, faster, and more efficiently. We automate manual tasks across disparate systems to improve compliance posture, streamline manual work, and unlock critical revenue, so teams can deliver better patient care.

We serve 70+ leading health systems across the country and are excited to shape the future of healthcare.

About The Role

We’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems that power our product. You’ll work across our distributed workflow engine, serverless pipelines, containerized services and Postgres-based data layer. This role reports into engineering leadership and will influence how we build, scale and operate our platform as we continue to grow.

You’ll bring strong technical judgment, calm problem solving during incidents and a practical approach to improving reliability. You’ll collaborate closely with backend, ML and DevOps engineers and help shape a culture where operational excellence is clear, repeatable and shared across the team.

What You’ll Do

Reliability, Observability And Performance

Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks.
Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data.
Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres.
Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured.

Infrastructure And Platform Operations

Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation.
Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers.
Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution.
Maintain efficient and predictable resource usage across compute, networking and storage.

Security, Compliance And Operational Excellence

Support security and compliance work including patching, audit readiness and vulnerability management.
Participate in the on-call rotation and respond to production incidents quickly and calmly with a focus on restoring stable service and clear communication.
Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers.

What We’re Looking For

5+ years of professional engineering experience at a B2B SaaS company.
Strong experience operating production systems in cloud environments, ideally AWS.
Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres.
Solid understanding of observability tooling, performance debugging and system behavior under load.
A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude.
Comfort working in a fast-paced startup environment, with a bias for action and thoughtful engineering judgment.

Plenful Perks

Comprehensive Benefits Package:
Enjoy unlimited PTO, fully covered health insurance (medical, dental, and vision), meal stipend, health & wellness stipend, 401(k) matching, and stock options.
Mission-Driven, World-Class Team:
Join an exceptional group of professionals aligned around a meaningful mission and committed to making an impact.
Opportunities for Growth:
Strengthen your partnership expertise through collaboration with experienced, high-performing leaders across the organization.
Flexible Work Environment:
Bay Area employees spend two days per week in a downtown San Francisco office; employees in other locations can work remotely, with travel for collaboration.
