
Senior Engineer, AI Evaluation & Reliability; Agentic AI

Job in Redwood City, San Mateo County, California, 94061, USA
Listing for: Anomali
Full Time position
Listed on 2026-01-23
Job specializations:
  • IT/Tech
    AI Engineer
Salary/Wage Range: 150,000 - 200,000 USD yearly
Job Description & How to Apply Below
Position: Senior Engineer, AI Evaluation & Reliability (Agentic AI)

Join to apply for the Senior Engineer, AI Evaluation & Reliability (Agentic AI) role at Anomali.

Anomali is headquartered in Silicon Valley and is the leading AI‑powered security operations platform that modernizes security operations. At the center of it is an omnipresent, intelligent, and multilingual Anomali Copilot that automates important tasks and empowers teams to deliver risk insights to management and the board in seconds. The Copilot navigates a proprietary cloud‑native security data lake that consolidates legacy attempts at visibility and provides first‑in‑market speed, scale, and performance while reducing the cost of security analytics.

Anomali combines ETL, SIEM, XDR, SOAR, and the largest repository of global intelligence into one efficient platform.

Job Description

We're looking for a Senior Engineer, AI Evaluation & Reliability to lead the design and execution of evaluation, quality assurance, and release gating for our agentic AI features. You'll develop the pipelines, datasets, and dashboards that measure and improve agent performance across real‑world SOC workflows, ensuring every release is safe, reliable, efficient, and production‑ready. You will ensure that our agentic AI features operate at full production scale, ingesting and acting on millions of SOC alerts per day, with measurable impact on analyst productivity and risk mitigation.

This role partners closely with the Product team to deliver operational excellence and trust in every AI‑driven capability.

Key Responsibilities
  • Define quality metrics: translate SOC use cases into measurable KPIs (e.g., precision/recall, MTTR, false‑positive rate, step success, latency/cost budgets)
  • Build continuous evaluations: develop offline/online evaluation pipelines, regression suites, and A/B or canary tests; integrate them into CI/CD for release gating
  • Curate and manage datasets: maintain gold‑standard datasets and red‑team scenarios; establish data governance and drift monitoring practices
  • Ensure safety, reliability, and explainability: partner with Platform and Security Research to encode guardrails, policy enforcement, and runtime safety checks
  • Expand adversarial test coverage (prompt injection, data exfiltration, abuse scenarios)
  • Ensure explainability and auditability of agent decisions, maintaining traceability and compliance of AI‑driven workflows
  • Production reliability & observability: monitor and maintain reliability of agentic AI features post‑release—define and uphold SLIs/SLOs, establish alerting and rollback strategies, and conduct incident post‑mortems
  • Design and implement infrastructure to scale evaluation and production pipelines for real‑time SOC workflows across cloud environments
  • Drive agentic system engineering: experiment with multi‑agent systems, tool‑using language models, retrieval‑augmented workflows, and prompt orchestration
  • Manage model and prompt lifecycle: track versions, rollout strategies, and fallbacks; measure impact through statistically sound experiments
  • Collaborate cross‑functionally: work with Product, UX and Engineering to prioritize high‑leverage improvements, resolve regressions quickly, and advance overall system reliability
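To make the release-gating responsibility above concrete, here is a minimal, hypothetical sketch of the kind of CI check this role would own: score agent output against a gold-standard dataset and block the release if precision/recall KPIs fall below agreed thresholds. All function names and threshold values are illustrative assumptions, not Anomali's actual tooling.

```python
# Hypothetical CI release gate for an agentic AI feature.
# Compares agent predictions against gold-standard labels and fails
# the gate if precision or recall drops below agreed thresholds.
# Names and thresholds are illustrative, not Anomali's real pipeline.

def precision_recall(gold: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Compute precision and recall from boolean gold/predicted labels."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def release_gate(gold: list[bool], predicted: list[bool],
                 min_precision: float = 0.95, min_recall: float = 0.90) -> bool:
    """Return True only if both KPIs clear their release thresholds."""
    p, r = precision_recall(gold, predicted)
    return p >= min_precision and r >= min_recall


# Example: the agent misses one true alert, so recall (2/3) fails the gate.
gold = [True, True, False, True, False]
pred = [True, True, False, False, False]
print(release_gate(gold, pred))  # prints False
```

In a real pipeline this check would run in CI/CD against curated regression datasets, with the thresholds negotiated per feature and tracked alongside latency and cost budgets.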
Qualifications Required Skills and Experience
  • 5+ years building evaluation or testing infrastructure for ML/LLM systems or large‑scale distributed systems
  • Proven ability to translate product requirements into measurable metrics and test plans
  • Strong Python skills (or similar language) and experience with modern data tooling
  • Hands‑on experience running A/B tests, canaries, or experiment frameworks
  • Experience defining and maintaining operational reliability metrics (SLIs/SLOs) for AI‑driven systems
  • Familiarity with large‑scale distributed or streaming systems serving AI/agent workflows (millions of events or alerts/day)
  • Excellent communication skills—able to clearly convey technical results and trade‑offs to engineers, PMs, and analysts
  • This position is not eligible for employment visa sponsorship. The successful candidate must not now or in the future require visa sponsorship to work in the US.
Preferred Qualifications
  • Experience…
Position Requirements
10+ Years work experience