
AI Engineer, Quality

Remote / Online; candidates ideally in San Francisco, CA 94199, USA
Listing for: Fieldguide
Remote/Work from Home position
Listed on 2026-03-01
Job specializations:
  • Software Development
    Software Engineer, AI Engineer
Salary/Wage Range: USD 80,000 - 100,000 per year
Job Description & How to Apply Below

About Us

Fieldguide is establishing a new state of trust for global commerce and capital markets through automating and streamlining the work of assurance and audit practitioners specifically within cybersecurity, privacy, and financial audit. Put simply, we build software for the people who enable trust between businesses.

We’re based in San Francisco, CA, but built as a remote-first company that enables you to do your best work from anywhere. We're backed by top investors including Growth Equity at Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, Elad Gil, and more.

We value diversity — in backgrounds and in experiences. We need people from all backgrounds and walks of life to help build the future of audit and advisory. Fieldguide’s team is inclusive, driven, humble, and supportive. We are deliberate and self‑reflective about the kind of team and culture that we are building, seeking teammates who are not only strong in their own aptitudes but who also care deeply about supporting each other's growth.

As an early‑stage start‑up employee, you’ll have the opportunity to build out the future of business trust. We make audit practitioners’ lives easier by automating up to 50% of their work and giving them better work‑life balance. If you share our values and enthusiasm for building a great culture and product, you will find a home at Fieldguide.

About the Role

Fieldguide is building AI agents for the most complex audit and advisory workflows. We're a San Francisco-based Vertical AI company building in a $100B+ market undergoing rapid transformation. Over 50 of the top 100 accounting and consulting firms trust us to power their most mission‑critical work. We're backed by Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, Elad Gil, and other top‑tier investors.

As an AI Engineer, Quality, you will own the evaluation infrastructure that ensures our AI agents perform reliably at enterprise scale. This role is 100% focused on treating evaluations as a first‑class engineering capability: building the unified platform, automated pipelines, and production feedback loops that let us evaluate any new model against all critical workflows within hours. You'll work at the intersection of ML engineering, observability, and quality assurance to ensure our agents meet the rigorous standards our customers demand.

We’re hiring across all levels. We'll calibrate seniority during interviews based on your background and what you're looking to own. This role is for engineers who value in‑person collaboration at our San Francisco, CA office.

What You'll Own

Measurable AI Agents
  • Design and build a unified evaluation platform that serves as the single source of truth for all of our agentic systems and audit workflows

  • Build observability systems that surface agent behavior, trace execution, and failure modes in production, and feedback loops that turn production failures into first‑class evaluation cases

  • Own the evaluation infrastructure stack, including integration with LangSmith and LangGraph

  • Translate customer problems into concrete agent behaviors and workflows

  • Integrate and orchestrate LLMs, tools, retrieval systems, and logic into cohesive, reliable agent experiences

Rapid Model Evaluation
  • Build automated pipelines that evaluate new models against all critical workflows within hours of release

  • Design evaluation harnesses for our most complex agentic systems and workflows

  • Implement comparison frameworks that measure effectiveness, consistency, latency, and cost across model versions

  • Design guardrails and monitoring systems that catch quality regressions before they reach customers

AI-Native Engineering Execution
  • Use AI as core leverage in how you design, build, test, and iterate

  • Prototype quickly to resolve uncertainty, then harden systems for enterprise‑grade reliability

  • Build evaluations, feedback mechanisms, and guardrails so agents improve over time

  • Work with SMEs and ML Engineers to create evaluation datasets by curating production traces

  • Design prompts, retrieval pipelines, and agent orchestration systems that perform reliably at scale

Owners…