
How Quality Fails in Complex Software Systems

25-minute Talk

A small defect caused global disruption, but the bug alone does not explain the scale. This talk shows how quality emerges from the whole socio-technical system, not just the code.

Virtual Pass session

Timetable

3:45 p.m. – 4:30 p.m. Tuesday 17th

Room

Room E1 - Track 4: Talks

DevOps, Leadership, Testability

Audience

Testers, developers, quality engineers, engineering leaders, and anyone shaping software delivery

Key Learnings

  • Use micro and macro lenses to understand why small technical failures can become large-scale organisational incidents.
  • Learn practical ways to reduce blast radius through safer validation, broader test coverage, staged rollout, and stronger recovery paths.
  • Move beyond root cause and blame to see how quality is shaped by systems, incentives, dependencies, and operational readiness.

Using the CrowdStrike outage as a case study, this talk explores how quality is created, maintained, and lost across code, release practices, organisational decisions, and wider ecosystems.

On 19 July 2024, what first looked like isolated IT issues at airports quickly became a global outage affecting hospitals, banks, broadcasters, supermarkets, and more. The trigger was small: a single faulty update. The impact was enormous.

In this talk, I use the CrowdStrike outage to explore how quality is created, maintained, and lost in complex software systems.

First, I walk through what happened and why recovery was so difficult at scale. This was not just an application crash. Because the Falcon sensor ran at the kernel level, affected Windows machines crashed during boot, often repeatedly, and could not stay up long enough to receive a fix remotely. That turned a bad update into a slow, manual, machine-by-machine repair effort.

From there, I examine the incident through two lenses.

The micro view looks at the engineering and release decisions that allowed a simple defect to escape, including validation choices, limited negative and regression testing, and release practices that increased blast radius.

The macro view looks at why the consequences were so widespread, including market concentration, ecosystem dependencies, how much control customers had over when updates rolled out, operational readiness, and the limits of single root-cause thinking.

Attendees will leave with practical ideas for safer change, stronger resilience, and a clearer understanding of how quality engineering helps teams improve the system in which quality emerges.
