Scoring Evaluations of emergent behavior and bias
GenAI is introducing new challenges in many traditional roles in software engineering, but the role of the software tester has never had a brighter future. The very same skills that allowed you to investigate further and apply statistical patterns to search for bugs that have not even presented themselves yet... these skills are desperately needed. But not in the way you approached this in the past.
In this workshop, we will engage in a rival red team session, and build up skills defining LLM and guardrail eval techniques, labeling and scoring criteria... and discover who to begin performing hands on testing for your company projects even before a technical design has been defined.
We will also get more technical exploring how to help design methods to implement LLM-as-a-judge detection systems for agentic AI systems that comply with AI Governance standards.
Prepare to be challenged... prepare to compete, and come out with a fresh perspective on how you can apply your existing testing skills to a truly complex problem space that desperately need you.