What I Learned Building Two Testing Platforms with Agents

25-minute Talk

Building with AI agents transforms you from coder to conductor. But success breeds dangerous overconfidence. Here's what experience taught me about orchestrating swarms.

Virtual Pass session

Timetable

4:00 p.m. – 4:45 p.m. Wednesday 26th

Room

Room E1 - Track 4: Talks

Artificial Intelligence (AI), Test Automation, Testing Tools

Audience

QE leads, developers building AI tools, architects designing agent systems, skeptics of AI hype.

Key Learnings

  • Orchestration patterns from production: Memory Banks over models, specialized agents over generalists, human judgment for WHAT/WHY, AI for HOW/SCALE.
  • Five hard lessons: Perfect kills good, AI exaggerates severity, type safety ≠ better code, assess impact radius, and validate incrementally.
  • When to ignore AI: frameworks for validating severity claims, protocols for architectural changes, and patterns for rejecting bad suggestions.
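The "specialized agents over generalists" pattern above can be sketched as a simple task router: a human decides WHAT to run, and the router only picks WHICH specialist handles HOW. A minimal sketch in TypeScript — all names here are hypothetical illustrations, not the actual Agentic QE Fleet API:

```typescript
// Hypothetical sketch of specialist routing (not the real Agentic QE Fleet API).
type TaskKind = "generate-tests" | "review-severity" | "refactor";

interface Agent {
  name: string;
  handles: TaskKind[]; // the narrow set of tasks this agent specializes in
}

const fleet: Agent[] = [
  { name: "test-writer", handles: ["generate-tests"] },
  { name: "triage", handles: ["review-severity"] },
  { name: "refactorer", handles: ["refactor"] },
];

// A human chooses the task (WHAT/WHY); routing to a specialist is mechanical (HOW).
function route(task: TaskKind, agents: Agent[]): Agent {
  const agent = agents.find((a) => a.handles.includes(task));
  if (!agent) throw new Error(`no specialist registered for ${task}`);
  return agent;
}

console.log(route("review-severity", fleet).name); // "triage"
```

The point of the sketch: a specialist's `handles` list is a contract you can inspect and reject, whereas a generalist agent will accept any task and fail unpredictably.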

Real lessons from building two production testing platforms with AI agent swarms - where I learned when to trust AI and when to trust working code.

When a colleague asked if we could automatically generate API tests from specifications, I challenged AI agents to build the solution. Two months later, I had two production testing platforms: Sentinel for API testing and Agentic QE Fleet for multi-agent orchestration - both built entirely with AI agents and agent swarms, and now open-source.

My approach evolved through three phases: wrestling with single agents while discovering Memory Banks matter more than model selection, co-piloting with agents that felt like junior developers with persistent context, and finally conducting parallel swarms with specialized agents working simultaneously. The breakthrough came when I stopped coding and started orchestrating.

But success bred dangerous overconfidence. While "polishing" Agentic QE Fleet's perfectly working build, I let a QE swarm convince me we had critical problems. Four hours later: 54 TypeScript errors from "improvements" that broke everything. The agents screamed, "207 ESLint errors! P0 Critical!" Reality? Three actual errors, 204 warnings, and a successful build they'd destroyed. One type definition cascaded through 48 locations. Five minutes to restore once I asked the right question.
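The gap between the swarm's "207 errors" claim and the three real errors is visible directly in ESLint's machine-readable output, which separates errors (severity 2, build-blocking) from warnings (severity 1, hygiene). A minimal severity-validation sketch in TypeScript, using only ESLint's documented JSON result fields — the file names and counts below are illustrative, chosen to mirror the numbers in the story:

```typescript
// Shape of one entry in ESLint's `--format json` output (documented fields only).
interface LintResult {
  filePath: string;
  errorCount: number;   // severity-2 findings: these actually block the build
  warningCount: number; // severity-1 findings: style and hygiene, not P0
}

// Tally real errors vs. warnings before accepting an agent's severity claim.
function summarize(results: LintResult[]): { errors: number; warnings: number } {
  return results.reduce(
    (acc, r) => ({
      errors: acc.errors + r.errorCount,
      warnings: acc.warnings + r.warningCount,
    }),
    { errors: 0, warnings: 0 }
  );
}

// Gate: a report is only "critical" if severity-2 errors exist.
function isCritical(results: LintResult[]): boolean {
  return summarize(results).errors > 0;
}

// Illustrative report mirroring the talk's numbers: 3 errors, 204 warnings.
const report: LintResult[] = [
  { filePath: "src/agents/fleet.ts", errorCount: 3, warningCount: 12 },
  { filePath: "src/memory/bank.ts", errorCount: 0, warningCount: 192 },
];

const { errors, warnings } = summarize(report);
console.log(`${errors} errors, ${warnings} warnings, critical: ${isCritical(report)}`);
// → "3 errors, 204 warnings, critical: true"
```

Asking the right question here means asking for `errorCount`, not for the agent's headline number: 3 errors and 204 warnings is a very different P-level than "207 errors".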

I'll share orchestration patterns that work in production, frameworks for validating AI severity claims, and protocols for architectural changes with AI assistance. Most critically, you'll see when to trust AI and when to trust working code.
