
Research Scientist

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Monarch Recruiters
Full Time position
Listed on 2026-01-27
Job specializations:
  • Research/Development
    Data Scientist
  • IT/Tech
    Data Scientist, AI Engineer, Data Science Manager
Job Description & How to Apply Below

We are looking for exceptional researchers and research engineers to design and build the next generation of AI benchmarks. You will create high-impact, challenging evaluations that push the boundaries of what we can measure in foundation models. This role is perfect for someone with deep research expertise who wants to see their work directly influence how the world evaluates AI systems.

You will lead the design and development of novel benchmarks that assess the real-world capabilities of LLMs. Our benchmarks shape how foundation models are developed and how generative AI applications are built. We work with all the major foundation model labs and with some of the largest financial institutions and hospital systems in the world. Our work has been featured by the Wall Street Journal, the Washington Post, and Bloomberg.

We are building the standard for evaluating the ability of LLMs to perform real-world tasks. You will be at the forefront of defining what that standard looks like.

What You'll Do

• Design and develop novel, high-impact benchmarks that assess challenging real-world capabilities

• Conduct research to ensure our benchmarks are valid, reliable, and meaningful

• Collaborate with foundation model labs and enterprises to understand evaluation needs

• Analyze model performance across benchmarks and communicate findings

• Publish research findings and contribute to the broader evaluation research community

• Work closely with the infrastructure team to implement your benchmark designs at scale

• Stay current with the latest developments in LLM capabilities and evaluation methodologies

Requirements

• Advanced research experience:
Master's degree or PhD in Computer Science, NLP, Machine Learning, or a related field. Undergraduates with very strong research backgrounds may also be considered.

• Publication track record:
Published papers in reputable venues (NeurIPS, ICML, ACL, EMNLP, etc.) with a focus on NLP, ML evaluation, or benchmarking

• Research methodology:
Strong understanding of experimental design, statistical analysis, and evaluation frameworks

• Technical skills:
Proficiency in Python for research and experimentation

• Communication:
Ability to clearly communicate complex research ideas to both technical and non-technical audiences



• Collaboration:
Experience working in research teams and integrating feedback

• Portfolio:
Demonstrated track record of impactful research work



• Location:
We are an in-person team based in San Francisco. We will support your relocation or transportation as needed.

Nice to Haves

• Experience specifically in LLM evaluation or benchmarking research

• Familiarity with foundation model architectures and capabilities

• Experience working with industry partners or in applied research settings

• Background in areas like human-computer interaction, psychology, or domain-specific evaluation

• Experience at early-stage startups or research labs

• Contributions to open-source evaluation tools or datasets
