Research Engineer

Job in New York City, Richmond County, New York, 10261, USA
Listing for: Datalab
Full Time position
Listed on 2025-12-01
Job specializations:
  • IT/Tech
    Data Scientist, AI Engineer, Data Engineer, Systems Engineer
Job Description & How to Apply Below

At Datalab, we train state-of-the-art language models that read documents with human-level accuracy and power the next generation of AI products, workflows, and research.

Our models (Chandra, Surya, and Marker) have become the backbone of document intelligence, with more than 50,000 GitHub stars and adoption across tier-1 AI research labs, Fortune 500 enterprises, and government agencies.

We've grown to 7-figure ARR with ~7x growth in 2025, driven by a lean, senior team that operates with high autonomy and deep technical ownership.

We're backed by founding members of OpenAI, FAIR, and Hugging Face. We move fast, ship often, and we're hiring builders who do the same.

Role Description

We’re looking for a Research Engineer to work across our open-source repos, inference API, and model training stack. You’ll operate at the intersection of applied research and engineering — shaping the models that power real-world document intelligence systems used by enterprises and developers globally.

You’ll be training and evaluating new model architectures, integrating them into production, and shipping updates across our open-source ecosystem. You’ll also help close the loop with users — investigating issues, improving benchmarks, and turning real feedback into better model performance.

Our team focuses on training small, efficient models that outperform much larger LLMs on domain-specific tasks (like OCR, structured extraction, and math recognition). We move fast, prioritize practical results, and build tools that are open, reproducible, and built to last.

Day to day, you will:
  • Train and evaluate models: Train task-specific models (OCR, layout, text recognition, extraction). Explore architectures and training strategies to optimize task performance.
  • Optimize inference: Profile and accelerate model inference across different hardware setups (H100s, L40s, CPUs).
  • Contribute to open source: Ship features and improvements to our core open-source repos, including model APIs, data loaders, evaluation scripts, and benchmark tooling.
  • Build and maintain datasets: Source, design, and clean datasets for supervised and synthetic training; create reproducible pipelines for data versioning and evaluation.
  • Experiment and benchmark: Run ablations, track metrics, and publish findings that inform model design and internal research direction.
  • Engage with users and partners: Occasionally join calls or Slack threads to help customers evaluate, deploy, and extend models.

Ideal Candidate

You’ve shipped models that made it into production. You understand how to balance exploration with delivery, and how to turn research insights into products people actually use. You work autonomously and thrive in unstructured environments, but you’re also a strong collaborator — you communicate clearly, document your work, and elevate the people around you.

  • 3+ years of experience training, fine-tuning, and evaluating LLMs
  • Trained at least one production‑grade model or system used in real‑world applications
  • Deep expertise in PyTorch and Python, with strong fundamentals in deep learning (optimization, evaluation, architecture design)
  • Comfortable with data engineering, benchmarking, and performance profiling across hardware setups
  • Have experience with OCR, document AI, or structured extraction
  • Have published work — whether that’s a paper, a benchmark report, or a deep technical blog post
  • Have been a major contributor to open‑source projects, especially in ML, vision, or NLP
  • Enjoy writing about your work and sharing learnings with the community

Seniority level

Mid‑Senior level

Employment type

Full‑time

Job function

Engineering and Information Technology
