Senior Research Scientist - Multimodal Agents
Listed on 2026-01-13
Apply for the Senior Research Scientist - Multimodal Agents role at Canva.
Join the team redefining how the world experiences design.
Where and How You Can Work
The buzzing Canva London campus features several buildings around beautiful, leafy Hoxton Square in Shoreditch. While our global headquarters is in Sydney, Australia, London is our HQ for Europe, with all kinds of teams based here, plus event spaces to gather our team and communities. You'll experience a warm welcome from our Vibe team at front of house, amazing home-cooked food from our Head Chef, and a variety of work spaces to hang out with your teammates or get solo work done.
That said, we trust our Canvanauts to choose the balance that empowers them and their team to achieve their goals and so you have choice in where and how you work.
At Canva, our mission is to empower the world to design. We’re building AI that feels magical and lands real impact for millions of people – helping anyone create with confidence. We’re looking for a senior research scientist who lives and breathes reinforcement learning and agentic systems to push the frontier of reasoning, tool use, and reliability – and ship it to users.
About the Team
We are a cutting‑edge post‑training team developing new multimodal agentic systems. We work across multimodal modelling, post‑training, and design agents: we explore multimodal agentic architectures, build scalable training and evaluation loops, and partner closely with product and platform teams to turn breakthroughs into delightful product features.
We are looking for someone with experience in post‑training and reinforcement learning (RL).
The Role
You’ll drive research directions and play a leading, hands‑on role across the agent stack: reward design and policy optimization; planning, memory, and tool orchestration; dataset construction; and post‑training, including the development of novel post‑training approaches. You’ll design tight experiments, iterate quickly, and land trustworthy conclusions. Most importantly, you’ll help convert research into reliable, safe, and high‑quality product experiences.
What You’ll Be Doing In This Role
- Develop agent systems (planning, multimodal tool use, retrieval, novel training approaches, modeling ablations) for real tasks in design, vision, and language.
- Scale post‑training and RL across distributed systems (PyTorch) with efficient data loaders, tracing/telemetry, stable training of mixture‑of‑experts (MoE) architectures, and reproducible pipelines; profile, debug, and optimize.
- Contribute to the research agenda for RL/agentic systems aligned with Canva’s product goals; identify high‑leverage bets and retire dead ends quickly.
- Build reward models and learning loops: RLHF/RLAIF, preference modeling, DPO/IPO‑style objectives, offline/online RL, curriculum learning, and credit assignment for multi‑step reasoning.
- Develop simulation and sandbox tasks that surface failure modes (planning errors, tool‑use brittleness, hallucination, unsafe actions) and turn them into measurable targets.
- Help align on rigorous evaluation for agents (task success, reliability, latency, safety, regressions). Stand up offline suites and online A/B tests; favor simple, controlled experiments that generalize.
- Collaborate and ship: work shoulder‑to‑shoulder with product, design, safety, and platform to land research as reliable features—then iterate.
- Share and elevate: mentor teammates, present findings internally, and contribute back to the community when it helps the field and our users.
- Depth in implementing and post‑training LLMs, VLMs, and diffusion models, with a track record of shipped research or publications in agents/RL.
- Experience modifying and adapting open‑source models.
- Strong experience with experimental design: tight baselines, clean ablations, reproducibility, and clear, data‑backed conclusions.
- Fluency in Python and PyTorch; you’re comfortable…