Testing the Untestable: Building a strategy for testing AI

25-minute Talk

AI agents' output changes every time. "Expected Results" do not exist. Traditional automation is brittle. Manual testing is too slow. Results are subjective! How on earth do you move quickly?

Virtual Pass session

Timetable

10:45 a.m. – 11:30 a.m. Tuesday 17th

Room

Room F1 - Track 1: Talks

Artificial Intelligence (AI)

Audience

Testers, Managers

Key-Learnings

  • The Input Shift: Moving from static test cases to Automated Persona-Driven Testing, using AI to simulate thousands of diverse user interactions.
  • The Verification Shift: Replacing binary assertions with Semantic Evaluation.
  • The Safety Shift: Why automation isn't enough, and what role Human-in-the-Loop review plays, including the unique value it adds to high-risk scenarios.

For decades, testing has relied on a simple truth: If I provide Input X, I should get Output Y. Generative AI broke that truth.

When you build AI Agents, the output changes every time. "Expected Results" do not exist in the same way. Traditional automation is brittle. Manual testing is too slow. And to make matters worse, results are subjective!


So, how do you assure quality in a system that, by its very nature, is unpredictable?


This talk outlines the strategic concepts required to tame the chaos of testing AI by exploring the fundamental shifts in the Quality Lifecycle:

  • The Input Shift: Moving from static test cases to Automated Persona-Driven Testing, using AI to simulate thousands of diverse user interactions.
  • The Verification Shift: Replacing binary assertions with Semantic Evaluation.
  • The Baseline Shift: How to establish a "Quality Baseline" for your product.
  • The Safety Shift: Why automation isn't enough. We will discuss the role of Human-in-the-Loop (HITL) review and the unique value it adds to high-risk scenarios.
  • The Observability Shift: We will discuss the importance of detailed tracing to understand not just what the model said, but why it decided to say it.
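
To make the shifts above more concrete, here is a minimal sketch in Python. It is an illustration only, not the approach presented in the talk. Everything in it is an assumption: `fake_agent` is a hypothetical stand-in for the AI agent under test, the personas and the refund scenario are invented, and `semantic_score` uses simple token overlap as a crude placeholder for whatever semantic scorer a real project would use (for example embedding similarity or an LLM-based judge). The thresholds `pass_at` and `BASELINE` are arbitrary example values.

```python
import re
from dataclasses import dataclass


@dataclass
class Persona:
    name: str
    template: str  # how this simulated user phrases a request (hypothetical)


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for the AI agent under test."""
    if "refund" in prompt.lower():
        return "You can request a refund within 30 days of purchase."
    return "Sorry, I am not sure how to help with that."


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def semantic_score(answer: str, reference: str) -> float:
    """Jaccard token overlap: a crude placeholder for a real semantic scorer."""
    a, b = tokens(answer), tokens(reference)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def run_suite(personas, intent, reference, pass_at=0.3):
    """Input shift: one prompt per persona. Verification shift: score, not equality."""
    results = []
    for persona in personas:
        prompt = persona.template.format(intent=intent)
        answer = fake_agent(prompt)
        score = semantic_score(answer, reference)
        results.append({
            "persona": persona.name,
            "answer": answer,
            "score": score,
            "passed": score >= pass_at,           # graded check
            "exact_match": answer == reference,   # what a binary assertion would see
        })
    return results


def pass_rate(results) -> float:
    """Share of persona runs whose score cleared the threshold."""
    return sum(r["passed"] for r in results) / len(results)


# Invented personas: the third never uses the word "refund".
PERSONAS = [
    Persona("terse", "{intent}?"),
    Persona("polite", "Hello, could you please explain your {intent} policy?"),
    Persona("indirect", "I bought something and it broke, what now?"),
]
REFERENCE = "Refunds are available within 30 days of purchase."
BASELINE = 0.6  # assumed pass rate of a previously accepted build

results = run_suite(PERSONAS, "refund", REFERENCE)

# Baseline shift: compare the aggregate pass rate, not individual outputs.
regressed = pass_rate(results) < BASELINE

# Safety shift: runs that fail the automated check are queued for a human.
needs_human_review = [r["persona"] for r in results if not r["passed"]]
```

In this sketch no answer matches the reference string exactly, so a binary assertion would reject every run, while the graded score accepts the two answers that convey the refund policy and routes the remaining persona to human review. A real setup would also record each prompt, answer, and score as a trace so that failures can be investigated afterwards.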
     

 
