Machine Learning Researcher
Listed on 2026-01-12
IT/Tech
AI Engineer, Data Scientist
This range is provided by Recruit Seq. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.
Base pay range: $/yr - $/yr
Founding ML Research Scientist
Seattle, WA (On-Site)
Our client is a stealth-stage AI startup developing a real-time, human foundation model that brings social and emotional intelligence to voice, face, and body for next-generation interactive experiences. The team is well-funded, early-stage, and focused on building core foundational models rather than application-layer features.
This role leads the end-to-end training of large multimodal, autoregressive models that jointly reason over text, speech, facial expression, and body language in real time. You will own research, data strategy, and large-scale training pipelines to power lifelike interactive avatars that respond with nuanced expressions, gestures, and tone frame by frame.
Responsibilities
- Design and train large multimodal autoregressive models across text, audio, and video (face and body) for real-time interaction.
- Develop model architectures, objectives, and optimization strategies to capture fine-grained human signals (e.g., prosody, micro-expressions, body pose dynamics).
- Build scalable training, evaluation, and deployment pipelines for low-latency inference in production environments.
- Define data collection, curation, and labeling strategies for multimodal human interaction datasets, including safety and privacy guardrails.
- Establish rigorous offline and online evaluation frameworks for social/emotional intelligence, realism, and responsiveness.
- Collaborate closely with founding researchers and leadership to translate research breakthroughs into product-ready capabilities.
- Mentor junior researchers/engineers and help set technical standards, coding practices, and research culture in a small, high-ownership team.
Qualifications
- 3+ years of experience training large-scale multimodal or language models, autoregressive architectures, or closely related foundation model work (industry or post-PhD).
- Strong background in deep learning for one or more of: speech, audio, computer vision (face/body), or sequence modeling.
- Hands-on experience implementing and training transformer-based or similar architectures with modern ML frameworks (e.g., PyTorch, JAX, or TensorFlow).
- Proven track record of end-to-end model development: problem formulation, experimentation, training, evaluation, and iteration.
- Solid software engineering skills, including writing production-quality Python and working with large-scale training infrastructure on GPUs/TPUs.
- Comfort working 5 days per week onsite in the Seattle area in a fast-paced, highly collaborative environment.
Preferred Qualifications
- PhD in Computer Science, Electrical Engineering, Robotics, or related field with research in multimodal learning, generative models, or human–computer interaction.
- Experience building or training MLLMs, conversational agents, or avatar/embodiment systems that combine vision, audio, and language.
- Publications at top-tier ML or vision/graphics conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, SIGGRAPH).
- Prior experience in early-stage startups or small, fast-moving research teams with high ownership.
- Familiarity with real-time inference optimization (quantization, distillation, on-device deployment) and streaming architectures.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Research and Engineering
Industries
- Staffing and Recruiting
- Software Development