Lead AI Engineer (Agentic Systems & LLM Evaluation)
About the Role
We are hiring a Lead AI Engineer to build and scale intelligent validation systems for LLM-powered products and autonomous agents. You will design AI agents that test other AI agents, implement LLM-as-a-Judge frameworks, and build automated evaluation systems for RAG pipelines, reasoning engines, and agent workflows. If you are excited about building self-evaluating AI systems and validation agents that operate at scale, this role is for you.
Responsibilities
People Leadership & Capability Development
- Lead and mentor a team of AI SDETs; build a high-performance, learning-focused engineering culture.
- Drive capability growth in automation, AI evaluation frameworks, and modern quality engineering practices.
- Conduct structured 1:1s, goal setting, and performance feedback to support career progression.
- Partner with QA/Engineering leadership on hiring, onboarding, and workforce planning.
Agile Quality & Delivery Leadership
- Embed AI quality strategy within Agile/Scrum teams; align test strategy with product risk and release readiness.
- Integrate testability, acceptance criteria, evaluation baselines, and test data planning into sprint cycles.
- Ensure Definition of Done includes measurable AI evaluation thresholds and guardrail validation.
- Remove delivery impediments and drive cross-functional collaboration across Product, Engineering, and Security.
- Maintain predictable, data-driven quality outcomes for AI-enabled features.
AI Strategy and Evaluation
- Design and implement LLM-as-a-Judge evaluation frameworks for:
  - Output correctness
  - Groundedness & hallucination detection
  - Reasoning quality
  - Task completion accuracy
- Build agentic QA systems that:
  - Validate other agents' decisions
  - Test tool usage accuracy
  - Simulate adversarial user behaviour
  - Perform regression evaluation autonomously
- Create automated validation pipelines for:
  - RAG systems (retrieval scoring, faithfulness checks)
  - Prompt updates
  - Model upgrades
- Develop evaluation agents using a range of LLMs and AI tools.
- Integrate evaluation pipelines into CI/CD for continuous AI regression detection.
- Define AI quality metrics such as:
  - Hallucination rate
  - Retrieval precision & recall
  - Judge-consistency scoring
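For candidates unfamiliar with these metrics, a minimal sketch of two of them, retrieval precision/recall and hallucination rate, is shown below. The function names and toy data are illustrative only, not a production implementation:

```python
# Illustrative sketch of two AI quality metrics named above.
# All names and data here are hypothetical examples.

def retrieval_precision_recall(retrieved, relevant):
    """Precision/recall of a retriever against a labeled relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def hallucination_rate(judgements):
    """Fraction of responses an LLM judge flagged as ungrounded."""
    return sum(1 for j in judgements if not j["grounded"]) / len(judgements)

# Toy evaluation records: 3 documents retrieved, 2 labeled relevant,
# and 4 judge verdicts with one ungrounded response.
p, r = retrieval_precision_recall(["d1", "d2", "d3"], ["d1", "d4"])
rate = hallucination_rate([{"grounded": True}, {"grounded": False},
                           {"grounded": True}, {"grounded": True}])
```

In practice these scores would be aggregated across an evaluation dataset and tracked in CI/CD so prompt or model changes that regress them are caught automatically.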
Qualifications
- Bachelor's degree in Computer Science, Engineering, or equivalent experience
- 8–12+ years across Software Engineering, Machine Learning, Data Science, or SDET/QA Automation, with strong exposure to AI/ML and Generative AI systems.
Required Skills
- Hands-on experience with:
  - LLMs (GPT, Claude, Llama, Mistral)
  - RAG architectures
  - Agent frameworks (OpenAI, Claude A2A, AutoGen, CrewAI)
- Proven experience leading Engineering teams, driving technical direction, mentoring engineers, and owning quality strategy across multiple squads.
- Strong Python expertise.
- Experience implementing LLM evaluation or LLM-as-a-Judge systems.
- Experience building scalable automation infrastructure.
- Strong communication and interpersonal skills.
- Strong understanding of prompt engineering, hallucination risks, and model regression.
Preferred Skills
- Experience with vector databases (Pinecone, Weaviate, FAISS).
- Experience in AI observability (LangSmith, Arize, WhyLabs).
- Experience building synthetic datasets for evaluation.