×
Register Here to Apply for Jobs or Post Jobs. X

Applied Researcher; Product

Job in Greater London, London, Greater London, W1B, England, UK
Listing for: Apollo Research
Full Time position
Listed on 2026-02-25
Job specializations:
  • Software Development
    AI Engineer, Data Scientist
Job Description & How to Apply Below
Position: Applied Researcher (Product)
Location: Greater London

Join to apply for the Applied Researcher (Monitoring) role at Apollo Research
.

Final date to receive applications:
We accept submissions until 16 January 2026. We review applications on a rolling basis and encourage early submissions.

THE OPPORTUNITY

Join our new AGI safety monitoring team and help transform complex AI research into practical tools that reduce risks from AI. As an applied researcher, you’ll work closely with our CEO, monitoring engineers and Evals team software engineers to build tools that make AI agent safety accessible  are building tools that monitor AI coding agents for safety and security failures. You will join a small team and will have significant ability to shape the team & tech, and have the ability to earn responsibility quickly.

You will like this opportunity if you're passionate about using empirical research to make AI systems safer in practice. You enjoy the challenge of translating theoretical AI risks into concrete detection mechanisms. You thrive on rapid iteration and learning from data. You want your research to directly impact real-world AI safety.

Key Responsibilities
  • Systematically collect and catalog coding agent failure modes from real-world instances, public examples, research literature, and theoretical predictions.
  • Design and conduct experiments to test monitor effectiveness across different failure modes and agent behaviors.
  • Build and maintain evaluation frameworks to measure progress on monitoring capabilities.
  • Iterate on monitoring approaches based on empirical results, balancing detection accuracy with computational efficiency.
  • Stay current with research on AI safety, agent failures, and detection methodologies.
  • Stay current with research into coding security and safety vulnerabilities.
Monitor Design & Optimization
  • Develop a comprehensive library of monitoring prompts tailored to specific failure modes (e.g., security vulnerabilities, goal misalignment, deceptive behaviors).
  • Experiment with different reasoning strategies and output formats to improve monitor reliability.
  • Design and test hierarchical monitoring architectures and ensemble approaches.
  • Optimize log pre-processing pipelines to extract relevant signals while minimizing latency and computational costs.
  • Implement and evaluate different scaffolding approaches for monitors, including chain-of-thought reasoning, structured outputs, and multi-step verification.
Future Projects (likely not in the first 6 months)
  • Fine‑tune smaller open‑source models to create efficient, specialized monitors for high‑volume production environments.
  • Design and build agentic monitoring systems that autonomously investigate logs to identify both known and novel failure modes.
Job Requirements
  • 2+ years of experience conducting empirical research with large language models or AI systems.
  • Strong experience with AI coding agents, having extensively used and compared frontier coding agents.
  • Experience with LLM‑as‑a‑judge setups.
  • Experience designing and running experiments, analyzing results, and iterating based on empirical findings (e.g., prompting, scaffolding, agent design, fine‑tuning, or RL).
  • Strong Python programming skills.
  • Demonstrated ability to work independently on open‑ended research problems.
Bonus
  • Experience with AI evaluation frameworks, in particular Inspect (though other frameworks are relevant as well).
  • Familiarity with AI safety concepts, particularly agent‑related risks.
  • Familiarity with computer security (e.g., security testing and secure system design).
  • Experience fine‑tuning language models or working with smaller open‑source models.
  • Previous work building developer tools or monitoring systems.
  • Publications or contributions to AI safety or ML research.
  • Experience with production log systems or production log analysis.
What You’ll Accomplish in Your First Year
  • Build a comprehensive failure mode database:
    Systematically collect and categorize 100+ distinct AI agent failure modes across safety and security dimensions, creating the foundation for our monitoring library.
  • Develop and validate monitoring approaches:
    Create and empirically test monitoring prompts and strategies for key failure categories, establishing clear…
Note that applications are not being accepted from your jurisdiction for this job currently via this jobsite. Candidate preferences are the decision of the Employer or Recruiting Agent, and are controlled by them alone.
To Search, View & Apply for jobs on this site that accept applications from your location or country, tap here to make a Search:
 
 
 
Search for further Jobs Here:
(Try combinations for better Results! Or enter less keywords for broader Results)
Location
Increase/decrease your Search Radius (miles)

Job Posting Language
Employment Category
Education (minimum level)
Filters
Education Level
Experience Level (years)
Posted in last:
Salary