You wrote a test. It passed. You ran it again. It failed. You ran it a third time, and it passed.
Testing GenAI systems means accepting that the same input produces different outputs, that errors don't throw exceptions, and that a response that looks correct can be entirely fabricated. Every instinct we have built as testers can work against us here unless we grow beyond it.
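To make the first point concrete: exact-match assertions fail intermittently against non-deterministic output, while assertions on properties that every acceptable answer must satisfy stay stable. The sketch below is illustrative only; `get_response` and its canned answers are hypothetical stand-ins for a real LLM call.

```python
import random

# Hypothetical stand-in for a real LLM call: the same prompt can yield
# differently worded but equally acceptable answers.
TEMPLATES = [
    "Our refund policy allows returns within 30 days.",
    "You can return items within 30 days under our refund policy.",
    "Returns are accepted for 30 days after purchase.",
]

def get_response(prompt: str) -> str:
    # Simulates non-determinism: each call picks a wording at random.
    return random.choice(TEMPLATES)

def exact_match_test() -> bool:
    # Brittle: passes only when the model happens to pick one wording.
    return get_response("What is the refund window?") == TEMPLATES[0]

def property_test() -> bool:
    # Robust: asserts invariants every acceptable answer must satisfy.
    answer = get_response("What is the refund window?").lower()
    return "30 days" in answer and ("return" in answer or "refund" in answer)
```

Run each test a few dozen times and the difference shows immediately: the exact-match version flakes, the property version holds.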
Throughout this workshop we follow one analogy: The Restaurant. Marco is a chef who has read every recipe ever written, which is extraordinary, but he has no memory between shifts. Sofia fetches recipes before every service; get her retrieval wrong and Marco cooks from the wrong recipe. And even if Sofia gets it right, Marco might still go off-script. Each character maps to a technical layer you will test: LLM, retrieval, input guardrails, output guardrails, evaluation.
Are you already invested in this story?
Then join me. We open with a running chatbot, five bugs, and no instructions: a hallucination, a guardrail bypass, a retrieval failure, a prompt injection, and a biased response.
Together we build a one-page GenAI Test Strategy Canvas, one layer at a time. You walk out with seven pipeline layers, each paired with one failure mode and one test pattern, mapped to a quality framework you can apply to any GenAI system on Monday.