Senior Research Engineer - Interactive Avatars | London, Greater London, England, UK | Software Development

Location: Greater London

Welcome to the video-first world

From your everyday PowerPoint presentations to Hollywood movies, AI will transform the way we create and consume content.

Today, people want to watch and listen, not read, both at home and at work. If you’re reading this and nodding, check out our brand video.

Despite the clear preference for video, communication and knowledge sharing in the business environment are still dominated by text, largely because high-quality video production has remained complex and challenging to scale, until now.

Meet Synthesia

We're on a mission to make video easy for everyone. Born in an AI lab, our AI video communications platform simplifies the entire video production process, making it easy for everyone, regardless of skill level, to create, collaborate, and share high-quality videos. Whether it's for delivering essential training to employees and customers or marketing products and services, Synthesia enables large organizations to communicate and share knowledge through video quickly and efficiently.

We’re trusted by leading brands such as Heineken, Zoom, Xerox, McDonald’s and more. Read stories from happy customers and see what 1,200+ people say on G2.

In February 2024, G2 named us the fastest-growing company in the world. Today, we're at a $2.1bn valuation and we recently raised our Series D. This brings our total funding to over $330M from top-tier investors, including Accel, Nvidia, Kleiner Perkins and Google, as well as top founders and operators from companies including Stripe, Datadog, Miro, Webflow, and Facebook.

What you'll do at Synthesia:

As a Research Engineer, you will join a team of 40+ Researchers and Engineers within the R&D Department working on cutting-edge challenges in the Generative AI space, with a focus on avatar-centric interactive video diffusion models. Within the team, you’ll have the opportunity to work on the applied side of our research efforts and directly impact solutions that are used worldwide by over 60,000 businesses.

This is a unique opportunity for experts in machine learning and diffusion models to shape the future of AI video agents that can think, act, and react like humans. As part of our Interactive Avatars Team, you’ll work on cutting-edge research with a clear focus on turning breakthrough ideas into real product capabilities. You’ll join a team that moves fast, iterates often, and builds models that ship and make a meaningful impact.

Example tasks and responsibilities include:

Adapt diffusion models to incorporate diverse conditioning signals (e.g., audio, motion, interaction cues).
Develop methods for streaming infinitely long video sequences at real-time rates.
Work on the perceptual layer of interactive agents, including understanding user audio and generating appropriate contextual reactions.
Improve lip-sync accuracy, motion realism, and overall visual quality in video diffusion models.
Build robust evaluation frameworks and test suites to enable continuous quality tracking.
Collaborate closely with our data team to define data needs and ensure high-quality datasets.
Stay up to date with research in world models, interactive human/agent modeling, diffusion models, and related areas.

What we're looking for:

Comfortable owning and executing on the responsibilities listed above.
Strong ML (e.g., diffusion, GANs, VAEs) and computer vision background with relevant industry experience.
Hands‑on experience with diffusion models (ideally avatar‑centric or video‑focused) and up to date with recent advances.
Proficient in PyTorch and familiar with modern ML frameworks and tooling.
Strong Python engineering skills, confident with git and version control, and a commitment to clean, maintainable research code.
Outcome‑driven, detail‑oriented, and motivated to push state‑of‑the‑art research into real product impact.
Clear communicator of hypotheses, experiments, and results.

What will make you stand out:

Experience with audio‑conditioned video diffusion models and deep knowledge of recent video DiT architectures.
Demonstrated ability to own the full model development pipeline end to end, from data preparation to model design, training, and evaluation.
A strong publication record in areas such…

