
How Quality Fails in Complex Software Systems

25-minute Talk

A small defect caused global disruption, but the bug alone does not explain the scale. This talk shows how quality emerges from the whole socio-technical system, not just the code.

Virtual Pass session

Timetable

3:45 p.m. – 4:30 p.m., Tuesday 17 November

Room

Room E1 - Track 4: Talks

DevOps, Leadership, Testability

Audience

Testers, developers, quality engineers, engineering leaders, and anyone shaping software delivery

Key Learnings

  • Use micro and macro lenses to understand why small technical failures can become large-scale organisational incidents.
  • Learn practical ways to reduce blast radius through safer validation, broader test coverage, staged rollout, and stronger recovery paths.
  • Move beyond root cause and blame to see how quality is shaped by systems, incentives, dependencies, and operational readiness.

Using the CrowdStrike outage as a case study, this talk explores how quality is created, maintained, and lost across code, release practices, organisational decisions, and wider ecosystems.

On 19 July 2024, what first looked like isolated IT issues at airports quickly became a global outage affecting hospitals, banks, broadcasters, supermarkets, and more. The trigger was small. The impact was enormous.

In this talk, I use the CrowdStrike outage to explore how quality is created, maintained, and lost in complex software systems.

First, I walk through what happened and why recovery was so difficult at scale. This was not just an application crash. Because the Falcon sensor ran at the kernel level, affected Windows machines crashed, often repeatedly during startup, before a remote fix could reach them. That turned a bad update into a slow, manual repair effort.

From there, I examine the incident through two lenses.

The micro view looks at the engineering and release decisions that allowed a simple defect to escape, including validation choices, limited negative and regression testing, and release practices that increased blast radius.

The macro view looks at why the consequences were so widespread, including market concentration, ecosystem dependencies, customer rollout control, operational readiness, and the limits of single root-cause thinking.

Attendees will leave with practical ideas for safer change, stronger resilience, and a clearer understanding of how quality engineering helps teams improve the system in which quality emerges.
