Site Reliability Engineer
Listed on 2026-03-01
IT/Tech
Cloud Computing, Systems Engineer, IT Support, SRE/Site Reliability
About Plenful
Plenful is on a mission to transform healthcare operations from the inside out. Built by healthcare operators for healthcare operators, Plenful is driven by a deep understanding of the challenges facing today’s care teams. We’re backed by notable investors and are building an AI workflow automation platform that healthcare teams rely on to operate smarter, faster, and more efficiently. We automate manual tasks across disparate systems to improve compliance posture, streamline manual work, and unlock critical revenue, so teams can deliver better patient care.
We serve 70+ leading health systems across the country and are excited to shape the future of healthcare.
The Role
We’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems that power our product. You’ll work across our distributed workflow engine, serverless pipelines, containerized services and Postgres-based data layer. This role reports into engineering leadership and will influence how we build, scale and operate our platform as we continue to grow.
You’ll bring strong technical judgment, calm problem solving during incidents and a practical approach to improving reliability. You’ll collaborate closely with backend, ML and DevOps engineers and help shape a culture where operational excellence is clear, repeatable and shared across the team.
What You’ll Do
Reliability, Observability and Performance
- Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks.
- Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data.
- Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres.
- Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured.
- Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation.
- Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers.
- Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution.
- Maintain efficient and predictable resource usage across compute, networking and storage.
- Support security and compliance work including patching, audit readiness and vulnerability management.
- Participate in the on-call rotation and respond to production incidents quickly and calmly with a focus on restoring stable service and clear communication.
- Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers.
- 5+ years of professional engineering experience at a B2B SaaS company.
- Strong experience operating production systems in cloud environments, ideally AWS.
- Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres.
- Solid understanding of observability tooling, performance debugging and system behavior under load.
- A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude.
- Comfortable working in a fast-paced startup environment with a bias for action and thoughtful engineering judgment.
- Comprehensive Benefits Package: Enjoy unlimited PTO, fully covered health insurance (medical, dental, and vision), meal stipend, health & wellness stipend, 401(k) matching, and stock options.
- Mission-Driven, World-Class Team: Join an exceptional group of professionals aligned around a meaningful mission and committed to making an impact.
- Opportunities for Growth: Strengthen your partnership expertise through collaboration with experienced, high-performing leaders across the organization.
- Flexible Work Environment: Bay Area employees have two days per week in a downtown San Francisco office; other locations can work remotely with travel for collaboration.