November 10 – 12, 2020

Online Edition!

Hunting for Bad Data

Emanuil Slavov

Working with bad data causes trillions of dollars of losses each year.

Bad data is highly context-specific. You need to know your domain inside out to define it.

Implement data validity checks before write operations.

Learn how to build a system that will recover from bad data.

Simple algorithms can go a long way, and most of the time they do a good enough job compared to machine learning.

A Practitioner’s Guide to Self-Healing Systems

Defects in our data are as important as the defects in our code.

In 2013, the total amount of data in the world was 4.4 zettabytes. In 2020 it is estimated to be 10 times more. With advances in speed and the miniaturization of computers, it is now easier than ever to collect all sorts of data in vast amounts. But quantity does not equal quality. Some of the most expensive software defects were caused by handling incorrect data. The most recent example is the crash of the Schiaparelli Mars Lander.

Back on Earth, at Falcon.io, we investigate every exception generated by our production environment. We were surprised to find that 19% of them are caused by bad data. This includes missing data, data with the wrong type, truncated data, duplicated data, and data in an inconsistent format. As our core business is to collect, process, and analyze data from the biggest social networks, this finding was a major wake-up call.
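To make these categories concrete, here is a minimal sketch of a validity check that could run before a write operation. The record schema, the field names, the length limit, and the function name `find_problems` are assumptions made for illustration; they are not taken from the Falcon.io system.

```python
from datetime import datetime

# Hypothetical schema for one social media post (an assumption, not from the talk).
REQUIRED_FIELDS = {"id": str, "author": str, "text": str, "created_at": str}
MAX_TEXT_LENGTH = 280  # assumed storage limit; text that reaches it may be cut off


def find_problems(record, seen_ids):
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []

    # Missing data and data with the wrong type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if record.get(field) is None:
            problems.append(f"missing: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type: {field}")

    # Truncated data: text that hits the limit is suspicious.
    text = record.get("text")
    if isinstance(text, str) and len(text) >= MAX_TEXT_LENGTH:
        problems.append("possibly truncated: text")

    # Inconsistent format: timestamps must match one agreed format.
    created_at = record.get("created_at")
    if isinstance(created_at, str):
        try:
            datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            problems.append("inconsistent format: created_at")

    # Duplicated data: the same id was already seen in this batch.
    record_id = record.get("id")
    if isinstance(record_id, str):
        if record_id in seen_ids:
            problems.append("duplicate: id")
        seen_ids.add(record_id)

    return problems
```

A caller would keep one `seen_ids` set per batch and write a record only when the returned list is empty. What counts as a problem depends on the domain, so the rules above are placeholders.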

“Jidoka” is a term that comes from lean manufacturing. It means giving machines “the ability to detect when an abnormal condition has occurred”. We wanted to go one step further and create a simple yet robust system that can not only recognize bad data but also fix it automatically, so that it heals itself. As it turns out, in some cases it is faster and easier to fix bad data than to locate the code that produced it. This talk is a practical one, containing ideas and tips that you can quickly implement when dealing with data quality.
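The detect-then-fix idea can be sketched in a few lines. This is only an illustration under assumed rules, not the system described in the talk: the function name `heal_record`, the field names, and the list of accepted date formats are all hypothetical.

```python
from datetime import datetime

# Date formats assumed to appear in incoming data (illustrative only).
KNOWN_DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y")


def heal_record(record):
    """Return (healed_record, fixes), or (None, fixes) if it cannot be repaired."""
    healed = dict(record)  # work on a copy so the original stays untouched
    fixes = []

    # Wrong type: coerce an integer id to the expected string form.
    if isinstance(healed.get("id"), int) and not isinstance(healed["id"], bool):
        healed["id"] = str(healed["id"])
        fixes.append("id coerced to str")

    # Inconsistent format: accept any known date format, store ISO 8601.
    raw_date = healed.get("created_at")
    parsed = None
    if isinstance(raw_date, str):
        for fmt in KNOWN_DATE_FORMATS:
            try:
                parsed = datetime.strptime(raw_date.strip(), fmt)
                break
            except ValueError:
                continue
    if parsed is None:
        # Nothing sensible to repair: report it so the record can be quarantined.
        fixes.append("created_at unrecoverable")
        return None, fixes

    iso = parsed.isoformat()
    if iso != raw_date:
        healed["created_at"] = iso
        fixes.append("created_at normalised")

    return healed, fixes
```

Returning the list of applied fixes alongside the healed record keeps the repair auditable, which helps when deciding later whether the code that produced the bad data is worth tracking down.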

