A Tale of Test Cases, Defects, and Machine Learning

Sometimes tests run for a long time, and waiting minutes, hours, or even days for a test result can disrupt developers' work. This is why we built a machine learning system that predicts the results of test cases at commit time, without actually executing them, using metadata about the code changes a developer has just made.
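The abstract does not describe Scryer's model or its features. As a minimal sketch of the general idea only, assuming a plain logistic regression and three hypothetical features per (commit, test case) pair, a predictor trained on past test verdicts might look like this. The training history is toy data for illustration, not data or results from the talk.

```python
import math

# Hypothetical features for one (commit, test case) pair; the abstract does not
# say which metadata Scryer actually uses:
# (lines changed / 100, files changed / 10, 1 if the change touches a file the test covers)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_failure(weights, bias, x):
    """Predicted probability that the test fails for a change with features x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent on the log-loss."""
    n, d = len(samples), len(samples[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(samples, labels):
            error = predict_failure(weights, bias, x) - y  # d(log-loss)/d(score)
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy history: 1 = the test failed after that commit, 0 = it passed.
history = [(0.1, 0.1, 0), (0.2, 0.1, 0), (0.3, 0.2, 0),
           (1.0, 0.6, 1), (1.5, 0.8, 1), (2.0, 1.0, 1)]
outcomes = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(history, outcomes)
print(f"P(fail) for a new change: {predict_failure(weights, bias, (1.2, 0.7, 1)):.2f}")
```

The predicted probability can then be thresholded into a "likely fails" / "likely passes" verdict, or kept as a score for ranking tests.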

Our system “Scryer” predicts test results with a mean accuracy of 78%. This lets us reorder tasks in ways that save time. First, developers get quicker feedback (“These tests seem likely to fail if run; please double-check your code changes with these test cases in mind”). They can act on it either by manually running the tests most likely to fail or by treating the predictions as they would actual test results. Either way, they spend less time re-familiarizing themselves with their own code after a long wait for test executions. Second, when new defects are discovered in software, it is sometimes not clear which team should fix them.

The data we collected to predict test results also includes information about software defects and their relation to test cases. From it, we construct a measure of which team is most likely to fix a given defect, which reduces the time spent passing defects along until they reach the right hands. Finally, at the time of actual test execution, we can optimize long-running test suites to make the best use of the available time by prioritizing critical test cases.
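The abstract does not define the team measure itself. A minimal sketch, assuming each historical defect record lists the test cases that failed and the team that eventually fixed it: count how often each team fixed defects involving the currently failing tests, then normalize the counts into shares. The record format, test names, and team names below are hypothetical.

```python
from collections import Counter, defaultdict


def build_fix_counts(past_defects):
    """Map each test case to a Counter of teams that fixed defects it exposed.

    past_defects: iterable of (failing_tests, fixing_team) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for failing_tests, team in past_defects:
        for test in failing_tests:
            counts[test][team] += 1
    return counts


def rank_teams(counts, failing_tests):
    """Return (team, share) pairs for a new defect, most likely fixing team first."""
    votes = Counter()
    for test in failing_tests:
        votes.update(counts.get(test, Counter()))
    total = sum(votes.values())
    if total == 0:
        return []  # no history for these tests: no recommendation
    return [(team, n / total) for team, n in votes.most_common()]


past_defects = [
    ({"test_login", "test_session"}, "auth"),
    ({"test_login"}, "auth"),
    ({"test_export"}, "reporting"),
    ({"test_session", "test_export"}, "platform"),
]
counts = build_fix_counts(past_defects)
print(rank_teams(counts, {"test_login", "test_session"}))
# → [('auth', 0.75), ('platform', 0.25)]
```

Counting per failing test means a past defect that broke many tests carries more weight; dividing each defect's contribution by the number of its failing tests would be a simple alternative.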

Taking into account both the average execution time of each test and its likelihood of failure, test suites can be reordered in several ways, for example by moving test cases with short runtimes and a high likelihood of failure earlier in the order. This is especially useful when the goal is to find defects as quickly as possible, e.g. in pre-merge tests. In our talk, we describe these use cases from a domain perspective. We then describe our approaches to tackling them with data and machine learning, and present results on how well these methods perform in real-life situations.
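As one possible ordering (not necessarily the one Scryer uses), tests can be sorted by predicted failure probability per second of average runtime. If failures are assumed independent, this order minimizes the expected time until the first failure is observed: the difference in expected time between running adjacent tests i before j and j before i is proportional to p_j*t_i - p_i*t_j, so i should run first exactly when p_i/t_i > p_j/t_j. The test names, runtimes, and probabilities below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CaseStats:
    name: str
    avg_runtime_s: float  # average execution time in seconds (must be > 0)
    p_fail: float         # predicted probability that the test fails


def prioritize(tests):
    """Sort by predicted failure probability per second of runtime, highest first."""
    return sorted(tests, key=lambda t: t.p_fail / t.avg_runtime_s, reverse=True)


def expected_time_to_first_failure(ordered):
    """Expected seconds until the first failing test completes, assuming independent
    failures; if every test passes, the full suite runtime is counted."""
    expected, elapsed, p_all_passed = 0.0, 0.0, 1.0
    for t in ordered:
        elapsed += t.avg_runtime_s
        expected += p_all_passed * t.p_fail * elapsed
        p_all_passed *= 1.0 - t.p_fail
    return expected + p_all_passed * elapsed


suite = [
    CaseStats("ui_smoke", 600.0, 0.05),
    CaseStats("api_contract", 30.0, 0.20),
    CaseStats("db_migration", 120.0, 0.30),
]
ordered = prioritize(suite)
print([t.name for t in ordered])  # → ['api_contract', 'db_migration', 'ui_smoke']
print(round(expected_time_to_first_failure(suite)), round(expected_time_to_first_failure(ordered)))
# → 720 462
```

In this toy suite, reordering cuts the expected wait for the first failure from about 720 s to 462 s. Real predictions are neither independent nor perfectly calibrated, so the ratio is best treated as a starting heuristic.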

The talk is delivered in a 'tag-team' manner. As a software and test architect at Siemens Healthineers, Marco is well versed in the pain points of large-scale software projects and is an expert on the domain, its emergent data, and its use cases. As a data scientist at codemanufaktur, Gregor is happiest when thrown into a data lake and told to sink or swim; his expertise lies in finding new algorithmic approaches to difficult problems and evaluating the solutions with scientific rigor. Thus, the talk alternates between a theoretical and a practical perspective.

Letting AI Decide Testing Chronology

25-minute Talk
