AI/Machine Learning Engineer
Listed on 2026-01-12
IT/Tech
Data Scientist, AI Engineer
About the Role
We are looking for an Evaluation Scientist who can work across both hands-on experimentation and automation infrastructure. This role begins with running manual evaluations (e.g., executing and monitoring individual experiments) and progresses toward building scripts, tools, and infrastructure that streamline and automate these processes, with the long-term goal of reducing manual work as much as possible.
The ideal candidate will also bring expertise in coding agents and quality evaluation, enabling them to design robust experiments and improve workflows. While high-level guidance will be provided, candidates should be able to independently define and implement the lower-level details of experiment setup after ramping up. For example, given a high-level requirement for a new type of evaluation, the candidate should be able to propose and execute an implementation plan with detailed steps, metrics, and automation in place.
Key Responsibilities
- Run and manage manual evaluation experiments across AI/ML systems.
- Develop and maintain automation infrastructure (scripts, pipelines, tools) to reduce manual evaluation work.
- Design and execute new types of evaluations, translating broad research questions into structured experiment setups.
- Work with coding agents and applied ML workflows to define and measure quality.
- Define metrics, benchmarks, and evaluation criteria to assess performance and identify gaps.
- Collaborate with research leads to align evaluation design with project goals while owning implementation details.
- Ensure reproducibility, consistency, and scalability of evaluation processes.
Qualifications
- Strong coding skills in Python (or equivalent) for scripting, automation, and experiment design.
- Experience with running and analyzing experiments, including quality evaluation methodologies.
- Knowledge of coding agents, ML models, or applied automation frameworks.
- Ability to work independently: take high-level requirements and define detailed steps for execution.
- 2–4 years of hands-on experience in evaluation, scripting, or applied data science/ML (academic or industry).
- Strong analytical skills with experience in data handling, reporting, and experiment analysis.
Preferred Skills
- Familiarity with evaluation frameworks and automation tools in AI/ML research.
- Experience in building scalable infrastructure for experiments or evaluations.
- Knowledge of experimental design, statistical testing, or quality benchmarking.