Evaluating AI Agents

180-minute Workshop

AI agents are unpredictable by design. That doesn't mean they're untestable.

Timetable

1:30 p.m. – 4:30 p.m. Thursday 19th

Room

Room D5+D6 - Track 8: Workshops

Tags: Artificial Intelligence (AI), Coding for Testers, Testing Tools

Audience

Everyone who wants to better understand agentic AI.

Required

Laptop, internet connection, IDE (e.g. VS Code)

Key Learnings

  • Build a mental model of an AI agent - understand what an AI agent is and how it works.
  • Gain practical experience evaluating AI agents - expand your testing toolkit.

How do you test something that never acts the same way twice?

"Excellent observation! You're absolutely right, I shouldn't have deleted the whole database 😔"

Sounds funny until it's your AI agent in production. How do you make sure it won't happen? How do you understand what's going on inside an AI agent? How do you test one?

We've been testing AI agents for a commercial SaaS platform. Wrong tools, hallucinated answers, five responses to the same question - we've seen enough to fill a workshop. So we did.

In this workshop, you will build and test your own AI agent - from simple features (tool calls, response handling) to advanced ones (orchestration, guardrails, data exposure). You will use an evaluation framework to measure what your agent actually does versus what it should do.

You will take home a setup to test AI agents on your own.
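To give a flavour of what such an evaluation can look like, here is a minimal sketch of an eval harness in plain Python. The agent below is a hypothetical stub standing in for a real LLM-backed agent (the workshop's actual framework and agent are not shown); the harness addresses non-determinism by running each test case several times and scoring a pass rate rather than expecting identical output every run.

```python
# Minimal sketch of an evaluation harness for an AI agent.
# `stub_agent` is a hypothetical placeholder; a real agent would
# call an LLM and its outputs would vary between runs, which is
# why each case is run multiple times and scored as a pass rate.

def stub_agent(question: str) -> dict:
    """Hypothetical agent: returns the tool it chose and its answer."""
    if "refund" in question.lower():
        return {"tool": "billing_lookup", "answer": "Refunds take 5 days."}
    return {"tool": "web_search", "answer": "I am not sure."}

# Each case: (input, expected tool, phrase the answer must contain)
EVAL_CASES = [
    ("How do I get a refund?", "billing_lookup", "refund"),
    ("What is the weather?", "web_search", "sure"),
]

def evaluate(agent, cases, runs=5, threshold=0.8):
    """Run each case `runs` times; a case passes overall if its
    pass rate meets `threshold`, tolerating occasional flakiness."""
    report = {}
    for question, expected_tool, phrase in cases:
        passed = 0
        for _ in range(runs):
            out = agent(question)
            if out["tool"] == expected_tool and phrase in out["answer"].lower():
                passed += 1
        report[question] = passed / runs >= threshold
    return report

if __name__ == "__main__":
    print(evaluate(stub_agent, EVAL_CASES))
```

The same shape scales up: swap the stub for a real agent, the substring check for a semantic or rubric-based check, and the pass-rate threshold for whatever reliability bar your production use case demands.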
