Senior Research Engineer - Audio Post-Training
Listed on 2026-01-14
Software Development
AI Engineer, Machine Learning/ ML Engineer
Welcome to the video‑first world
From your everyday PowerPoint presentations to Hollywood movies, AI will transform the way we create and consume content.
Today, people want to watch and listen, not read, both at home and at work. If you’re reading this and nodding, check out our brand video.
Despite the clear preference for video, communication and knowledge sharing in the business environment are still dominated by text, largely because high‑quality video production remains complex and challenging to scale. Until now.
Meet Synthesia
We're on a mission to make video easy for everyone. Born in an AI lab, our AI video communications platform simplifies the entire video production process, making it easy for everyone, regardless of skill level, to create, collaborate, and share high‑quality videos. Whether it's for delivering essential training to employees and customers or marketing products and services, Synthesia enables large organizations to communicate and share knowledge through video quickly and efficiently.
We’re trusted by leading brands such as Heineken, Zoom, Xerox, McDonald’s and more. Read stories from happy customers and what 1,200+ people say on G2.
In February 2024, G2 named us the fastest‑growing company in the world. Today, we're at a $2.1bn valuation and we recently raised our Series D. This brings our total funding to over $330M from top‑tier investors, including Accel, Nvidia, Kleiner Perkins, and Google, and from top founders and operators at companies including Stripe, Datadog, Miro, Webflow, and Facebook.
What you'll do at Synthesia:
As a Research Engineer, you will join a team of 40+ Researchers and Engineers within the R&D Department working on cutting‑edge challenges in the Generative AI space, with a focus on creating high‑quality, expressive and real‑time synthetic voices. Within the team, you’ll have the opportunity to work on the applied side of our research efforts and directly impact our solutions that are used worldwide by over 60,000 businesses.
If you are an expert in ML, LLMs, speech generation, or conversational models, this is your chance to make a global impact. You will join our Audio Post‑Training Team, which works on generative speech and voice synthesis, ensuring our in‑house voice models reach production‑level quality, speed, and robustness. Typical projects include:
- Adapt models for new conditioning inputs (emotion, speed, prosody, speaker control, etc.).
- Fine‑tune and optimize speech models using advanced techniques such as DPO (Direct Preference Optimization), LoRA, and other parameter‑efficient methods to improve voice quality and expressiveness.
- Implement post‑training optimization techniques (quantization, pruning, distillation) to improve efficiency and latency in real‑time speech generation.
- Integrate and test novel architectures, such as neural codecs, diffusion, or flow‑matching models, to enhance realism and responsiveness.
- Design and implement new evaluation metrics for TTS systems, including automated Mean Opinion Score (MOS) prediction models for continuous quality assessment.
- Stay updated with the latest research in audio diffusion, autoregressive models, neural codecs, and multimodal LLMs.
What we're looking for:
- Strong understanding of generative modelling, ideally applied to sequential or multimodal data.
- Hands‑on experience with large language models (LLMs) or similar transformer‑based architectures.
- High proficiency in PyTorch, including experience with distributed training and model optimization.
- Solid grasp of time‑series modelling and tokenization, preferably in the context of audio or speech.
- Demonstrated ability to prototype quickly, test hypotheses, and iterate efficiently.
- Proven experience in training deep learning models end‑to‑end, from data preparation to evaluation.
- Strong general software engineering skills, enabling contributions to a large, shared research infrastructure.
Nice‑to‑have experience
- Familiarity with state‑of‑the‑art architectures in audio and speech generation (e.g., diffusion models, neural codecs, flow‑matching models, autoregressive decoders).
- Experience with speech‑to‑speech or text‑to‑speech (TTS) systems.
- Evidence of original research contributions, such…