Platform Product Engineer - SRE & Automation
Listed on 2026-02-28
IT/Tech
Cloud Computing, SRE/Site Reliability, Systems Engineer, IT Support
Attio is on a mission to redefine CRM for the AI era.
We’re building the first AI-native CRM — designed for the most ambitious go-to-market teams. We recently announced our $52M Series B, led by GV (Google Ventures), with support from Redpoint, Balderton, Point Nine, and 01A. Our team thrives on solving complex technical challenges, delighting our users, and setting a new standard for the industry.
About the Role
We are seeking highly skilled and experienced Platform Product Engineers to join our Security, Infrastructure and Performance team. This is a crucial, dual-faceted role that combines high-level engineering strategy with hands-on operational excellence. The successful candidates will be responsible for building, operating, and continuously enhancing the internal technology platform, fundamentally treating this platform as a product with all development teams as its primary customers.
The Platform Product Engineer role centers on embodying and executing DevOps principles, with a specific focus on:
- Automation: Systematically removing manual toil from the software development lifecycle (SDLC) through the creation of robust tooling, CI/CD pipelines, and infrastructure-as-code (IaC).
- Collaboration: Fostering a tight, cooperative partnership with product development teams, gathering requirements, and delivering solutions that accelerate their productivity and time-to-market.
- Continuous Improvement: Instilling a culture of iterative enhancement for the platform's reliability, cost-efficiency, and developer experience.
What You'll Do
The core responsibility is to implement, maintain, and continuously improve the foundational platform infrastructure that powers all engineering services. This necessitates a relentless focus on ensuring high reliability, exceptional scalability, and optimal performance across the entire stack.
- Platform Infrastructure: Build and maintain platform infrastructure using declarative IaC tools (e.g., Terraform, Pulumi), ensuring all environments are reproducible, version-controlled, and auditable. Proactively manage the capacity of the infrastructure to consistently meet or exceed Service Level Objectives for latency, error rates, and availability.
- Incident Response and Post-Mortems: Act as first-line responders for critical system incidents. Triage, diagnose, and resolve complex production issues rapidly. Drive a culture of blameless post-mortems, ensuring root causes are identified, and long-term preventative measures are implemented as code (e.g., via runbooks, automation, or system design changes).
- Tooling & Automation: Own the stack of supporting tools necessary for operational excellence and developer enablement, including:
  - Continuous Integration and Continuous Delivery (CI/CD) Pipelines: Implement, maintain, and evolve the fully automated CI and CD pipelines. This includes establishing best practices for fast, reliable, and secure build, test, and deployment processes.
  - Observability: Implement and manage robust systems for monitoring (metrics), logging (centralised log aggregation), and distributed tracing to provide deep insights into application and infrastructure health.
- Applied DevOps and SRE Principles:
  - Must have: Demonstrable, hands-on experience applying core DevOps and Site Reliability Engineering (SRE) principles to manage, monitor, and scale production systems.
  - Must have: A deep understanding of the SRE mindset, including SLO/SLA creation and monitoring, error budget management, toil reduction, and post-incident review (blameless postmortems).
  - Desirable: Proven ability to drive cultural and process change that fosters a collaborative approach between development and operations teams.
- Cloud Infrastructure and Containerisation Expertise:
  - Must have: Expertise in one or more major public cloud providers (AWS,…