Letting AI Decide Testing Chronology

25-minute Talk

Letting an AI predict software test results allows us to save time in several ways - at commit time, when assigning defects, and at test-suite execution time.


10:45 a.m. – 11:30 a.m. Wednesday 17th


Room F3 - Track 3: Talks


Testers, Developers, Product Owners, ...


  • Predicting test results leads to quicker feedback for developers
  • The underlying data can be used to speed up "defect ping-pong"
  • The predictions can be used to find failing test cases quicker

A Tale of Test Cases, Defects, and Machine Learning

Sometimes, tests run for a long time. Waiting minutes, hours, or even days for a test result can negatively affect developers' work. This is why we built a machine learning system that predicts the results of test cases at commit time, using metadata about the code changes a developer just made, without actually executing the tests.
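To make the idea of predicting a test result from commit metadata concrete, here is a minimal sketch. It is not how Scryer works: it assumes the only metadata available is the list of changed file paths, and it swaps the machine learning model for a Laplace-smoothed failure frequency. The record layout and the names `train` and `failure_probability` are hypothetical.

```python
from collections import defaultdict


def train(history):
    """Count runs and failures per (changed file, test) pair.

    `history` is an assumed record format: a list of
    (changed_files, test_name, failed) tuples from past commits.
    """
    runs = defaultdict(int)
    failures = defaultdict(int)
    for changed_files, test_name, failed in history:
        for path in changed_files:
            runs[(path, test_name)] += 1
            if failed:
                failures[(path, test_name)] += 1
    return runs, failures


def failure_probability(model, changed_files, test_name):
    """Estimate the chance that `test_name` fails, without running it.

    Pools the historical counts of all changed files and applies
    Laplace smoothing, so an unseen combination yields 0.5.
    """
    runs, failures = model
    total_runs = sum(runs[(path, test_name)] for path in changed_files)
    total_failures = sum(failures[(path, test_name)] for path in changed_files)
    return (total_failures + 1) / (total_runs + 2)


# Illustrative, made-up history: which files changed and whether the test failed.
history = [
    (["parser.py"], "test_import", True),
    (["parser.py"], "test_import", True),
    (["parser.py"], "test_import", False),
    (["theme.py"], "test_import", False),
    (["theme.py"], "test_import", False),
]
model = train(history)
likely_to_fail = failure_probability(model, ["parser.py"], "test_import") > 0.5
```

A real system would use richer features and a trained classifier; the point of the sketch is only that the estimate is available at commit time, before any test has run.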

Our system “Scryer” predicts test results with a mean accuracy of 78%. This lets us reorder tasks and save time in several ways. First, developers get quicker feedback (“These tests seem likely to fail if run - please double-check your code changes with these test cases in mind”). They can act on it either by manually running the tests most likely to fail or by treating the predictions as they would actual test results. Either way, they spend less time re-familiarizing themselves with their own code after a long wait for test executions.

Second, when new defects are discovered in software, it is sometimes unclear which team should fix them. The data we collected to predict test results also includes information about software defects and their relation to test cases. This enables us to construct a measure of which team is most likely to fix a given defect, which reduces the time spent passing defects along until they are in the right hands.

Third, at the time of actual test execution, we can optimize long-running test suites to make the best use of available time by prioritizing critical test cases. Taking into account both the average execution times of tests and their likelihood of failure, test suites can be reordered in several ways, for example by moving test cases with shorter runtimes and a high likelihood of failure to an earlier position in the order. This is especially useful if the goal is to find defects as quickly as possible, e.g. in the case of pre-merge tests.

In our talk, we describe these use cases from a domain perspective. We then describe our respective approaches to tackling them using data and machine learning, and present our results on how well these methods perform in real-life situations.
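The reordering of test suites by runtime and likelihood of failure can be illustrated with a small sketch. It assumes one possible ranking rule, predicted failure probability divided by average runtime, which is a common heuristic and not necessarily the one used in the talk. The class `SuiteEntry`, the function `prioritize`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteEntry:
    name: str
    avg_runtime_s: float  # average execution time in seconds
    p_fail: float         # predicted probability of failure, 0.0 to 1.0


def prioritize(entries, min_runtime_s=0.001):
    """Order tests so that short, failure-prone tests run first.

    The score is the expected number of failures found per second of
    runtime. `min_runtime_s` guards against division by zero.
    """
    def score(entry):
        return entry.p_fail / max(entry.avg_runtime_s, min_runtime_s)

    return sorted(entries, key=score, reverse=True)


# Made-up example suite.
suite = [
    SuiteEntry("slow_integration", avg_runtime_s=600.0, p_fail=0.30),
    SuiteEntry("quick_unit", avg_runtime_s=2.0, p_fail=0.05),
    SuiteEntry("ui_smoke", avg_runtime_s=30.0, p_fail=0.60),
]
ordered_names = [entry.name for entry in prioritize(suite)]
```

With this rule, a long-running test is moved forward only if its failure probability is high enough to justify the wait. Other goals, such as covering critical functionality first, would call for a different score.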

The talk is delivered in a 'tag-team' manner: As a software and test architect at Siemens Healthineers, Marco is well versed in the pain points of large-scale software projects. He is an expert on the domain, its emergent data, and its use cases. As a data scientist at codemanufaktur, Gregor is happiest when thrown into a data lake and told to sink or swim. His expertise is in finding new algorithmic approaches to difficult problems and evaluating the solutions with scientific rigor. Thus, the talk alternates between a practical and a theoretical perspective.
