Member of Technical Staff - Platform
Listed on 2026-01-17
Software Development
Software Engineer, Cloud Engineer - Software
About the Role
We are looking for an exceptional mid-level to senior engineer to join our team.
Together with the team, you will own the platform that runs our benchmarks. This spans everything needed to evaluate LLMs at scale: Python libraries, a web platform, distributed systems, cloud infrastructure, and tooling. You'll work across the stack, building whatever is needed to run benchmarks reliably and efficiently.
At Vals, we believe in autonomy. You will be given a high degree of independence to make decisions on tech stacks, system architecture, and code structure. You will also provide guidance to others on the team, both through informal feedback and formal processes like architecture reviews and code reviews.
Our platform serves startups, enterprises, and research labs measuring model performance. We work with all the major foundation model labs, some of the largest financial institutions, and hospital systems in the world. Our work has been featured by the Wall Street Journal, Washington Post, and Bloomberg.
We are building the standard for evaluating the ability of LLMs to perform real-world tasks. You will contribute directly to the infrastructure that makes this possible.
What You'll Do
Build distributed systems to run evaluations across multiple models, benchmarks, and machines at scale
Deploy cloud infrastructure using infrastructure as code (IaC), including deployment pipelines, servers, logging, and monitoring
Contribute to the internal and external libraries we maintain, including our public model library
Develop full-stack features for our platform using React/TypeScript on the frontend and Python/Django on the backend
Perform code and architecture reviews for other members of the team
Help establish engineering best practices across the organization
Collaborate closely with the research team to ensure our infrastructure meets their needs
Technical:
2+ YOE: 2+ years of full-time experience in software engineering. If you are a new grad, we encourage you to apply to our MTS - Infra role.
Strong engineering fundamentals: You can build and ship quickly with high quality. You should have a track record of building things of significant scope (at jobs, side projects, open source, etc.).
Python expertise: Significant experience in Python, especially in a professional setting.
System Design Experience: You should be familiar with common concepts like VMs, containerization, load balancers, databases, etc., and when to use them appropriately.
Familiarity with LLMs: You should have previously worked with LLM APIs, and understand concepts like temperature, tokenization, reasoning, etc.
Non-technical:
Team collaboration: Experience working in development sprints, Git workflows, and pull request reviews.
Communication: Strong ability to provide and receive feedback effectively. This includes both spoken and written communication (e.g. design docs).
Comfort with Ambiguity: You will often be the one taking a fuzzy problem and breaking it down into clear and actionable steps.
Iteration speed: A tenacity to develop and iterate quickly. If you are coming from a large organization, the speed at which we ship will likely be uncomfortable initially.
Location: We are an in‑person team based in San Francisco. We will support your relocation or transportation as needed.
Nice to Have
Experience with frontend development, ideally React
Experience with Django, FastAPI, or other Python-based HTTP servers
Experience working with AWS infrastructure, including IaC
Experience at early‑stage startups or your own company
Interest in AI/ML systems and evaluation
Benefits
Highly competitive salary and meaningful ownership. Excellence is well rewarded.
Relocation and transportation support
Health/dental insurance coverage
Lunch and dinner provided, free snacks/coffee/drinks
401(k) plan
Unlimited PTO
Founding team: The core methodology behind this platform comes from NLP evaluation research we had done. We raised a $5M seed from some of the top institutional and angel investors in the Valley. Our team has prior work experience at NVIDIA, Meta, Microsoft, Palantir, and HRT. Collectively, we have over 300 citations in our published work. Our early team include…