Research Engineer Job: San Francisco area, California, USA, IT/Tech

Position: Research Engineer [32918]

We’re looking for a research engineer to work on the core multimodal retrieval and video reasoning systems that power the company. This is a 50/50 research and engineering role - you’ll design novel approaches to hard retrieval and understanding problems, and you’ll ship them into production where real customers depend on them.

You’ll work across:

Multimodal retrieval - finding relevant moments across visual, audio, and text signals in large video collections
Structured extraction - pulling entities, facts, and relationships from video content
Video reasoning - understanding temporal, causal, and semantic relationships across long-form content
Evaluation and benchmarking - designing metrics and datasets to measure real-world system quality

This is not a pure research role. You’ll be expected to take ideas from paper to prototype to production. But it’s also not a pure engineering role - we need someone with genuine research depth who can identify the right problems to work on and design novel solutions.

What You’ll Do

Multimodal retrieval: Design and improve retrieval systems that search across video, audio, and text - including embedding models, re-ranking, and hierarchical search strategies.
Video understanding: Build systems that extract structured information from video - temporal segmentation, entity extraction, scene understanding, and content summarization.
Model fine-tuning & integration: Fine-tune and adapt vision and language models (LoRA/PEFT, full fine-tuning) for production use cases. Evaluate open-source and proprietary models and orchestrate them in serving pipelines.
Experiment and ship: Run experiments, analyze results rigorously, and turn successful research into production systems that handle real‑world video at scale.
Collaborate: Work directly with founders and infrastructure engineers. Short feedback loops, no layers of process.

What We’re Looking For

Required

MS or PhD in computer science, machine learning, or a related field
Research experience in one or more of: multimodal learning, information retrieval, computer vision, NLP, or video understanding
Strong implementation skills in Python and PyTorch (or equivalent)
Ability to independently drive research from idea to experiment to working system

Nice to Have

First-author publication at a top venue (NeurIPS, CVPR, ICCV, ECCV, ACL, EMNLP, SIGIR, ISMIR, ICASSP, or similar)
Experience with video or multimodal foundation models (CLIP, LLaVA, Qwen3‑VL, etc.)
Experience with retrieval systems, embedding models, or ranking/re-ranking pipelines
Experience deploying ML systems in production
Familiarity with vector databases (Milvus, Weaviate) or search infrastructure
Experience with model fine-tuning techniques (LoRA, PEFT, QLoRA) and training infrastructure (Ray, Kubeflow, or similar)
Experience with ML inference serving (vLLM, TensorRT, Triton, or similar)

Why us?

Video is the largest and most underutilized data source on the internet. Most software still can’t meaningfully search or reason over it. The research problems here – multimodal retrieval, temporal reasoning, structured extraction from noisy real‑world content – are genuinely unsolved and directly tied to the product.

If you want to work on:

Research problems with immediate, measurable product impact
A domain where the state of the art is still being defined
A small team where your research directly shapes the product
Multimodal systems at real scale, not toy benchmarks
