ML Research Engineer, Seattle area, Washington, USA (Software Development)

Overview

Nuance Labs is building the next generation of emotionally expressive, real-time video AI.

This is a critical role to build and shape the machine learning foundations of our company. You will work at the intersection of research and production — translating experimental breakthroughs into optimized, scalable models that power our real-time video AI platform.

Key Facts

$10M seed round backed by Accel, South Park Commons, Lightspeed, and top angels including Synthesia’s former CPO.
A world-class team of PhDs from MIT, UW, and Oxford with decades of industry experience at Apple and Meta, advancing real-time avatars from cutting-edge research to products used by millions.
In-person collaboration, 5 days a week at Seattle HQ.

Responsibilities

Operationalize Research: Collaborate with researchers to move models from experimental checkpoints to production-ready systems. Establish patterns for large-scale training, rapid experimentation, and deployment of new architectures.
Optimize Model Performance: Profile and improve model inference for latency and throughput using quantization, pruning, distillation, and architectural refinements to ensure viable unit economics.
Model Acceleration: Apply optimization techniques (TensorRT, ONNX, vLLM, Triton) to accelerate multimodal models including video diffusion, LLMs, and speech models.
Design Data Pipelines: Design and implement efficient pipelines for video data ingestion, preprocessing, and training at petabyte scale using tools like Dagster or Ray.
Evaluate and Iterate: Build evaluation frameworks to measure model quality, establish benchmarks, and guide continuous improvement of model capabilities.

Requirements

Deep Learning Experience: Strong knowledge of PyTorch and modern ML architectures. Experience training and optimizing large models (transformers, diffusion models, or similar).
Production ML: Experience deploying ML models to production. You understand common failure modes and how to address them (resource contention, OOMs, batch optimization).
Systems Proficiency: Comfortable working with GPUs, debugging CUDA issues, and profiling model workloads to identify compute or memory bottlenecks.
Data Engineering: Experience building scalable data pipelines for high-bandwidth media processing and training workflows.

Preferred Experience

Experience with video or audio models in research or production settings
Familiarity with low-level optimization (CUDA kernels, Triton, custom operators)
Knowledge of real-time ML systems and latency-critical inference
Prior work with model compression techniques (quantization, distillation, pruning)

Application

To apply, email careers with your CV and a short note on why your background is a great fit for this role.
