AI QA Engineer (Agentic & Generative)
Location: Dallas, Dallas County, Texas, 75215, USA
Listing for: NATIONMIND LLC
Full Time position, listed on 2026-03-01
Job specializations:
- IT/Tech
- AI Engineer
Job Description
About NationMind LLC
NationMind LLC is a technology consulting firm focused on software development and QA testing services. We help clients build reliable, scalable applications with a strong emphasis on automation, performance, and quality. Our team works across industries, delivering solutions that drive innovation and operational efficiency. We are currently hiring a skilled AI QA Engineer (Agentic & Generative) to join our growing team.
Title: AI QA Engineer (Agentic & Generative)
Location: Dallas, TX – hybrid (a couple of days a month on-site)
Interview mode: Face to face
Agentic QA Engineer – Generative AI & Agentic Systems (Agent, Multi‑Agent Testing)
Key Responsibilities

Quality Strategy & Leadership
- Define and own the QA strategy for agentic/multi-agent AI systems across dev, staging, and prod.
- Mentor a team of QA engineers; establish testing standards, coding guidelines for test harnesses, and review practices.
- Partner with Agentic Operations, Data Science, MLOps, and Platform teams to embed QA in the SDLC and incident response.

Agentic & Multi‑Agent Testing
- Design tests for agent orchestration, tool calling, planner‑executor loops, and inter‑agent coordination (e.g., task decomposition, handoff integrity, and convergence to goals).
- Validate state management, context windows, memory/knowledge stores, and prompt/graph correctness under varying conditions.
- Implement scenario fuzzing (e.g., adversarial inputs, prompt perturbations, tool latency spikes, degraded APIs).

Reliability, Resiliency, and Latency
- Create resilience testing suites: chaos experiments, failover, retries/backoff, circuit‑breaking, and degraded-mode behavior.
- Establish latency SLOs and measure end‑to‑end response times across orchestration layers (LLM calls, tool invocations, queues).
- Ensure reliability through soak tests, canary verifications, and automated rollbacks.

Accuracy & Macro-Level Validations
- Define ground‑truth and reference pipelines for task accuracy (exact match, semantic similarity, factuality checks).
- Build macro validation frameworks that validate task outcomes across multi‑step agent workflows (e.g., complex data pipelines, content generation + verification agent loops).
- Instrument guardrail validations (toxicity, PII, hallucination, policy compliance).

Scale & Orchestration
- Design load/stress tests for multi‑agent graphs under scale (concurrency, throughput, queue depth, backpressure).
- Validate orchestrator correctness (DAG execution, retries, branching, timeouts, compensation paths).
- Engineer reusable test artifacts (scenario configs, synthetic datasets, prompt libraries, agent graph fixtures, simulators).

Dev → Prod Readiness
- Integrate tests into CI/CD (pre‑merge gates, nightly, canary) and production monitoring with alerting tied to KPIs.
- Define release criteria and run operational readiness reviews (performance, security, compliance, cost/latency budgets).
- Build post‑deployment validation playbooks and incident triage runbooks.
Qualifications
- 7+ years in Software QA/Testing, with 2+ years in AI/ML or LLM‑based systems; hands‑on experience testing agentic/multi‑agent architectures.
- Strong programming skills in Python or TypeScript/JavaScript; experience building test harnesses, simulators, and fixtures.
- Experience with LLM evaluation (exact/soft match, BLEU/ROUGE, BERTScore, semantic similarity via embeddings), guardrails, and prompt testing.
- Expertise in distributed systems testing: latency profiling, resiliency patterns (circuit breakers, retries), chaos engineering, and message queues.
- Familiarity with orchestration frameworks (LangChain, LangGraph, LlamaIndex, DSPy, OpenAI Assistants/Actions, Azure OpenAI orchestration, or similar).
- Proficiency with CI/CD (GitHub Actions/Azure DevOps), observability (OpenTelemetry, Prometheus/Grafana, Datadog), and feature flags/canaries.
- Solid understanding of privacy/security/compliance in AI systems (PII handling, content policies, model safety).
- Excellent communication and leadership skills; proven ability to work cross‑functionally with Ops, Data, and Engineering.
- Experience with multi‑agent simulators, agent graph testing, and tooling latency emulation.
- Knowledge of MLOps (model versioning, datasets, evaluation pipelines) and A/B experimentation for LLMs.
- Background in cloud (AWS), serverless, containerization, and event‑driven architectures.
- Prior ownership of cost/latency/SLAs for AI workloads in production.