AI Testing Engineer
Listed on 2026-01-10
IT/Tech
AI Engineer, Machine Learning/ ML Engineer, Systems Engineer
Overview
AI Testing Engineer I (Senior Staff) role at Crowe. The position focuses on quality, reliability, safety, and compliance for enterprise AI/ML systems, leading advanced testing efforts and integrating testing into CI/CD workflows.
Everything we do is about making the future of human work more purposeful. We leverage state-of-the-art technologies, modern architecture, and industry experts to create AI-powered solutions that transform how our clients do business. The AI Transformation team will build on Crowe’s AI foundation, combining Generative AI, Machine Learning, and Software Engineering to empower Crowe clients to transform their business models through AI, irrespective of their current AI adoption stage.
About The Team
- We invest in expertise. You’ll have time, space, and support to go deep in projects and develop technical and strategic mastery. You’ll work with developers, product stakeholders, and project managers as a trusted leader and domain expert.
- We believe in continuous growth and knowledge-sharing.
- We protect balance. Our distributed team culture is grounded in trust and flexibility. We offer unlimited PTO, a flexible remote work policy, and a supportive environment prioritizing sustainable, long-term performance.
The Role
The AI Testing Engineer I (Senior Staff) plays a critical role in ensuring the quality, reliability, safety, and compliance of enterprise AI and ML systems. This role leads advanced testing and validation efforts, architects automated evaluation frameworks, and assesses model behavior across functional and non-functional dimensions, including accuracy, robustness, bias, drift, and safety.
Working closely with AI engineering, data science, security, and product teams, the engineer defines testing strategies, builds evaluation datasets, and identifies risks across predictive and generative AI systems. As a senior staff-level contributor, the role establishes platform-wide testing standards, integrates AI testing into CI/CD workflows, mentors other engineers, and supports responsible AI adoption. This position advances AI validation practices and ensures dependable deployment of AI capabilities.
Responsibilities
- Design comprehensive testing strategies for predictive models, generative AI systems, and end-to-end ML pipelines.
- Develop automated test harnesses, evaluation suites, and validation tools integrated into CI/CD.
- Analyze model outputs for correctness, safety, fairness, robustness, and stability across test scenarios.
- Build synthetic datasets, challenge sets, and adversarial test cases to uncover weaknesses.
- Evaluate LLM and generative model behavior, including hallucination rates, prompt sensitivity, and retrieval accuracy.
- Collaborate with engineering and data science teams to define evaluation criteria, KPIs, and acceptance thresholds.
- Troubleshoot complex ML system issues such as performance degradation, drift, or unexpected failures.
- Implement post-deployment monitoring to continuously validate model behavior in production.
- Document testing methodologies, findings, and recommendations to improve systems.
- Guide junior engineers and QA specialists in advanced AI testing techniques and tools.
- Ensure adherence to enterprise responsible AI, safety, security, and compliance standards.
- Identify reliability and trust risks and contribute to mitigation strategies.
- Contribute to AI platform architectural decisions to improve testability and observability.
- Research and evaluate emerging AI testing methodologies, benchmarks, and tooling ecosystems.
Qualifications
- 4+ years of experience in software testing, ML engineering, data science, or related roles.
- Strong proficiency in Python and automated testing frameworks.
- Deep understanding of model evaluation techniques, including precision/recall, calibration, robustness, and stability testing.
- Familiarity with LLM evaluation metrics, safety testing approaches, and structured test design.
- Ability to diagnose complex model, data, and pipeline failures.
- Strong collaboration and communication skills across technical and non-technical teams.
- Willingness to travel occasionally for cross-functional planning and…