Lead AI Engineer (Agentic Systems & LLM Evaluation)
About the Role
We are hiring a Lead AI Engineer to build and scale intelligent validation systems for LLM-powered products and autonomous agents. You will design AI agents that test other AI agents, implement LLM-as-a-Judge frameworks, and build automated evaluation systems for RAG pipelines, reasoning engines, and agent workflows. If you are excited about building self-evaluating AI systems and validation agents that operate at scale, this role is for you.
Responsibilities
People Leadership & Capability Development
- Lead and mentor a team of AI SDETs; build a high-performance, learning-focused engineering culture.
- Drive capability growth in automation, AI evaluation frameworks, and modern quality engineering practices.
- Conduct structured 1:1s, goal setting, and performance feedback to support career progression.
- Partner with QA/Engineering leadership on hiring, onboarding, and workforce planning.
Agile Quality & Delivery Leadership
- Embed AI quality strategy within Agile/Scrum teams; align test strategy with product risk and release readiness.
- Integrate testability, acceptance criteria, evaluation baselines, and test data planning into sprint cycles.
- Ensure Definition of Done includes measurable AI evaluation thresholds and guardrail validation.
- Remove delivery impediments and drive cross-functional collaboration across Product, Engineering, and Security.
- Maintain predictable, data-driven quality outcomes for AI-enabled features.
AI Strategy and Evaluation
- Design and implement LLM-as-a-Judge evaluation frameworks for:
  - Output correctness
  - Groundedness & hallucination detection
  - Reasoning quality
  - Task completion accuracy
- Build agentic QA systems that:
  - Validate other agents' decisions
  - Test tool usage accuracy
  - Simulate adversarial user behaviour
  - Perform regression evaluation autonomously
- Create automated validation pipelines for:
  - RAG systems (retrieval scoring, faithfulness checks)
  - Prompt updates
  - Model upgrades
- Develop evaluation agents using a range of LLMs and AI tools.
- Integrate evaluation pipelines into CI/CD for continuous AI regression detection.
- Define AI quality metrics such as:
  - Hallucination rate
  - Retrieval precision & recall
  - Judge-consistency scoring
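For candidates unfamiliar with these metrics, a minimal sketch of two of them, retrieval precision/recall and hallucination rate, is shown below. The function names and toy data are illustrative only, not a production implementation:

```python
# Illustrative sketch of two AI quality metrics named above.
# All names and data here are hypothetical examples.

def retrieval_precision_recall(retrieved, relevant):
    """Precision/recall of a retriever against a labeled relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def hallucination_rate(judgements):
    """Fraction of responses an LLM judge flagged as ungrounded."""
    return sum(1 for j in judgements if not j["grounded"]) / len(judgements)

# Toy evaluation records: 3 documents retrieved, 2 labeled relevant,
# and 4 judge verdicts with one ungrounded response.
p, r = retrieval_precision_recall(["d1", "d2", "d3"], ["d1", "d4"])
rate = hallucination_rate([{"grounded": True}, {"grounded": False},
                           {"grounded": True}, {"grounded": True}])
```

In practice these scores would be aggregated across an evaluation dataset and tracked in CI/CD so prompt or model changes that regress them are caught automatically.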
Qualifications
- Bachelor's degree in Computer Science, Engineering, or equivalent experience
- 8–12+ years across Software Engineering, Machine Learning, Data Science, or SDET/QA Automation, with strong exposure to AI/ML and Generative AI systems.
Required Skills
- Hands-on experience with:
  - LLMs (GPT, Claude, Llama, Mistral)
  - RAG architectures
  - Agent frameworks (OpenAI, Claude A2A, AutoGen, CrewAI)
- Proven experience leading Engineering teams, driving technical direction, mentoring engineers, and owning quality strategy across multiple squads.
- Strong Python expertise.
- Experience implementing LLM evaluation or LLM-as-a-Judge systems.
- Experience building scalable automation infrastructure.
- Strong communication and interpersonal skills.
- Strong understanding of prompt engineering, hallucination risks, and model regression.
Preferred Skills
- Experience with vector databases (Pinecone, Weaviate, FAISS).
- Experience in AI observability (LangSmith, Arize, WhyLabs).
- Experience building synthetic datasets for evaluation.