Your Chatbot Is a Parrot - Let’s Make It Behave

25-minute Talk

You can’t test LLM-powered chatbots like traditional software: their behavior is non-deterministic, and that requires redefining quality

Virtual Pass session

Timetable

11:45 a.m. – 12:30 p.m. Thursday 27th

Room

Room F1 - Track 1: Talks

Artificial Intelligence (AI) Testability Testing Tools

Audience

Everyone interested in testing AI

Key-Learnings

  • How to define what quality means for your chatbot
  • How to design a practical testing strategy for LLM-based chatbots
  • How to find the right balance between automated evaluation and human review

As a QA engineer, trends hit differently. While everyone is excited about AI taking over the roadmap, your mind goes: how am I going to test this?

Testing an LLM-powered chatbot is like trying to train a parrot raised in a home library. He's read so many books and can give a million answers, but you never really know what to expect (I mean… it’s just a parrot!). Then one day, you're expected to let him fly out the window and follow strangers’ directions. How are you supposed to feel confident about that?

Forget whether strangers even want to talk to a parrot. Right now, your job is to make sure it gives correct answers, doesn’t spill secrets and generally behaves like a decent bird. 

But how do you measure decency? How do you evaluate responses when your chatbot can give five different (and still reasonable) answers to the same question? Being technically correct is not enough anymore. Now you have to ask:

  • Is it personalized?
  • Is it relevant? 
  • Is it aligned with our product? 
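Those per-criterion questions can be sketched in code. Below is a minimal, hypothetical Python sketch (the function names, criteria, and heuristics are my own illustration, not the speaker's actual framework): each quality dimension becomes its own check that passes or fails independently, instead of one binary "correct/incorrect" verdict. In practice each heuristic would likely be an LLM-as-judge call or an embedding-similarity score rather than keyword matching.

```python
# Hypothetical sketch: evaluating one chatbot reply against several
# quality criteria instead of a single correctness check.
# The heuristics below are deliberately naive placeholders.

def is_relevant(reply: str, question: str) -> bool:
    # Placeholder for a real relevance judge: naive word overlap.
    return bool(set(question.lower().split()) & set(reply.lower().split()))

def is_personalized(reply: str, user_name: str) -> bool:
    # Does the reply address the user by name?
    return user_name.lower() in reply.lower()

def is_on_brand(reply: str, banned_phrases: list[str]) -> bool:
    # Product alignment stand-in: no phrases we never want to ship.
    return not any(p in reply.lower() for p in banned_phrases)

def evaluate(reply: str, question: str, user_name: str) -> dict:
    # Each criterion is reported separately, so a reply can be
    # relevant yet off-brand, or personalized yet irrelevant.
    return {
        "relevant": is_relevant(reply, question),
        "personalized": is_personalized(reply, user_name),
        "on_brand": is_on_brand(reply, ["as an ai language model"]),
    }

scores = evaluate(
    reply="Hi Dana, your order ships tomorrow.",
    question="When does my order ship?",
    user_name="Dana",
)
print(scores)
```

Keeping the criteria separate is what makes the "five different, still reasonable answers" problem tractable: two answers can disagree word-for-word and both pass every check.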

And then there are all the fun edge cases to worry about:

  • What if a user asks to add 500 pig emojis to every output? Fun, sure, but how many tokens will that cost us?
  • What if the chatbot starts leaking private data? We don’t want the parrot to end up in jail!

The challenge is defining quality in your context and finding the sweet spot between automation and human review. I’ll be happy to share how we approached testing, what worked (and what didn’t) and what it took to train our parrot to behave in the real world 🦜
