Backend Engineer; Monitoring
Listed on 2026-01-13
Software Development
Software Engineer, Backend Developer, AI Engineer, Cloud Engineer - Software
Location: Greater London
Final date to receive applications:
We accept submissions until 16 January 2026. We review applications on a rolling basis and encourage early submissions.
Join our new AGI safety monitoring team and help transform complex AI research into practical tools that reduce risks from AI. As a Backend Engineer, you'll work closely with our CEO, monitoring engineers, and Evals team software engineers to build tools that make AI agent safety accessible. We are building tools that monitor AI coding agents for safety and security failures.
You will join a small team with significant ability to shape the team and tech, and the opportunity to earn responsibility quickly. This opportunity is for those who care about building tools that genuinely make AI agents safe, thrive in fast‑paced environments, and enjoy working closely with researchers.
Key Responsibilities
- Infrastructure & Architecture – Design and implement scalable backend systems capable of processing and analyzing large volumes of AI agent logs in real time; build and maintain data processing pipelines that extract, transform, and store agent trajectory data efficiently; architect database schemas and data models optimized for high‑throughput writes and complex analytical queries; design for reliability, implementing robust error handling, retry logic, and graceful degradation; monitor system performance and optimize bottlenecks to ensure sub‑second latency for critical monitoring operations.
- API Development – Develop secure, well‑documented RESTful APIs that allow users to integrate our monitoring tools into their workflows; implement authentication, authorization, and rate limiting; build webhook systems and real‑time notification services to alert users about critical safety events; design API interfaces that are intuitive for developers while remaining flexible for diverse use cases; integrate with SIEM systems to stream monitoring alerts and security events into existing security operations workflows.
- Data Systems – Implement efficient storage solutions for structured and unstructured data; build processing systems for real‑time monitoring and batch analysis of historical data; design caching strategies to optimize frequent queries; create data retention and archival policies that balance user needs with storage efficiency.
- Monitoring & Observability – Build comprehensive logging, metrics, and tracing systems; implement alerting systems; create dashboards and tools to help the team understand system behavior; design systems that make debugging production issues straightforward and minimize time‑to‑resolution.
- Collaboration & Quality – Work closely with researchers to understand needs and translate prototypes into production‑ready systems; collaborate with frontend engineers for excellent user experiences; participate in code reviews to maintain high standards; document architectural decisions, API specifications, and system behaviors; contribute to technical discussions about technology choices, trade‑offs, and implementation approaches.
- 4+ years of experience building production backend systems at scale.
- Strong Python proficiency with experience in frameworks such as FastAPI, Flask, or Django.
- Experience designing and implementing RESTful APIs with clear documentation.
- Solid understanding of database design and optimization (SQL and/or NoSQL).
- Experience with cloud platforms (AWS, Google Cloud, or Azure) and containerization technologies (Docker, Kubernetes).
- Experience building data‑intensive applications or processing large‑scale log data.
- Strong understanding of system design principles, including scalability, reliability, and security.
- Experience with asynchronous processing, message queues, and distributed systems.
- Demonstrated ability to write clean, well‑tested, maintainable code.
- Familiarity with real‑time data processing frameworks (Kafka, Redis Streams, etc.).
- Experience with ML/AI infrastructure or building tools for AI applications.
- Previous work on developer tools, monitoring systems, or security tools.
- Experience with infrastructure‑as‑code (Terraform, CloudFormation, etc.).
- Familiarity with AI…