Research Engineer
Listed on 2026-03-08
We’re looking for a research engineer to work on the core multimodal retrieval and video reasoning systems that power the company. This is a 50/50 research and engineering role - you’ll design novel approaches to hard retrieval and understanding problems, and you’ll ship them into production where real customers depend on them.
You’ll work across:
- Multimodal retrieval - finding relevant moments across visual, audio, and text signals in large video collections
- Structured extraction - pulling entities, facts, and relationships from video content
- Video reasoning - understanding temporal, causal, and semantic relationships across long-form content
- Evaluation and benchmarking - designing metrics and datasets to measure real-world system quality
This is not a pure research role. You’ll be expected to take ideas from paper to prototype to production. But it’s also not a pure engineering role - we need someone with genuine research depth who can identify the right problems to work on and design novel solutions.
- Multimodal retrieval: Design and improve retrieval systems that search across video, audio, and text - including embedding models, re-ranking, and hierarchical search strategies.
- Video understanding: Build systems that extract structured information from video - temporal segmentation, entity extraction, scene understanding, and content summarization.
- Model fine-tuning & integration: Fine-tune and adapt vision and language models (LoRA/PEFT, full fine-tuning) for production use cases. Evaluate open-source and proprietary models and orchestrate them in serving pipelines.
- Experiment and ship: Run experiments, analyze results rigorously, and turn successful research into production systems that handle real‑world video at scale.
- Collaborate: Work directly with founders and infrastructure engineers. Short feedback loops, no layers of process.
- MS or PhD in computer science, machine learning, or a related field
- Research experience in one or more of: multimodal learning, information retrieval, computer vision, NLP, or video understanding
- Strong implementation skills in Python and PyTorch (or equivalent)
- Ability to independently drive research from idea to experiment to working system
- First-author publication at a top venue (NeurIPS, CVPR, ICCV, ECCV, ACL, EMNLP, SIGIR, ISMIR, ICASSP, or similar)
- Experience with video or multimodal foundation models (CLIP, LLaVA, Qwen3‑VL, etc.)
- Experience with retrieval systems, embedding models, or ranking/re-ranking pipelines
- Experience deploying ML systems in production
- Familiarity with vector databases (Milvus, Weaviate) or search infrastructure
- Experience with model fine-tuning techniques (LoRA, PEFT, QLoRA) and training infrastructure (Ray, Kubeflow, or similar)
- Experience with ML inference serving (vLLM, TensorRT, Triton, or similar)
Video is the largest and most underutilized data source on the internet. Most software still can’t meaningfully search or reason over it. The research problems here – multimodal retrieval, temporal reasoning, structured extraction from noisy real‑world content – are genuinely unsolved and directly tied to the product.
If you want to work on:
- Research problems with immediate, measurable product impact
- A domain where the state of the art is still being defined
- A small team where your research directly shapes the product
- Multimodal systems at real scale, not toy benchmarks