Testing the Untestable: Building a strategy for testing AI

25-minute Talk

AI agents' output changes every time. "Expected Results" do not exist. Traditional automation is brittle. Manual testing is too slow. Results are subjective! How on earth do you move quickly?

Virtual Pass session

Timetable

10:45 a.m. – 11:30 a.m. Tuesday 17th

Room

Room F1 - Track 1: Talks

Artificial Intelligence (AI)

Audience

Tester, Manager

Key-Learnings

  • The Input Shift: Moving from static test cases to Automated Persona-Driven Testing, using AI to simulate thousands of diverse user interactions.
  • The Verification Shift: Replacing binary assertions with Semantic Evaluation.
  • The Safety Shift: Why automation isn't enough. We discuss the role of Human-in-the-Loop review and the unique value it adds to high-risk scenarios.

For decades, testing has relied on a simple truth: If I provide Input X, I should get Output Y. Generative AI broke that truth.

When you build AI Agents, the output changes every time. "Expected Results" do not exist in the same way. Traditional automation is brittle. Manual testing is too slow. And to make matters worse, results are subjective!


So, how do you assure quality in a system that, by its very nature, is unpredictable?


This talk outlines the strategic concepts required to tame the chaos of testing AI by exploring the fundamental shifts in the Quality Lifecycle:

  • The Input Shift: Moving from static test cases to Automated Persona-Driven Testing, using AI to simulate thousands of diverse user interactions 
  • The Verification Shift: Replacing binary assertions with Semantic Evaluation
  • The Baseline Shift: How to establish a "Quality Baseline" for your product. 
  • The Safety Shift: Why automation isn't enough. We will discuss the role of Human-in-the-Loop (HITL) review and the unique value it adds to high-risk scenarios.
  • The Observability Shift: We discuss the importance of detailed tracing to understand not just what the model said, but why it decided to say it.
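To make the Verification and Baseline shifts concrete, here is a minimal sketch of one way they could look in code. It is an illustration under stated assumptions, not the approach presented in the talk: the function names (`similarity`, `pass_rate`, `meets_baseline`), the thresholds, and the sample outputs are all hypothetical, and the bag-of-words cosine score is only a toy stand-in for a real semantic evaluator such as an embedding model or an LLM judge.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Cosine similarity of word counts.

    ASSUMPTION: a toy stand-in for a real semantic scorer
    (embedding model or LLM judge). It only measures word overlap.
    """
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pass_rate(outputs: list[str], reference: str, threshold: float = 0.5) -> float:
    """Fraction of outputs scoring at or above the threshold.

    Instead of asserting output == expected, every run is scored
    against a reference and the runs are judged in aggregate.
    """
    if not outputs:
        return 0.0
    passed = sum(similarity(out, reference) >= threshold for out in outputs)
    return passed / len(outputs)


def meets_baseline(rate: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True if the pass rate has not dropped below the recorded baseline
    by more than the allowed tolerance (hypothetical regression gate)."""
    return rate >= baseline - tolerance


if __name__ == "__main__":
    # Hypothetical outputs from running the same prompt three times.
    runs = [
        "Your refund was issued today",
        "The refund has been issued",
        "I like turtles",
    ]
    rate = pass_rate(runs, reference="refund issued")
    print(f"pass rate: {rate:.2f}")
    print("within baseline:", meets_baseline(rate, baseline=0.70))
```

The design choice worth noting is that a single run never passes or fails the build on its own. The gate is the aggregate pass rate compared against a previously recorded baseline, which tolerates run-to-run variation while still catching regressions.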