How intentionally doing harm to your systems can improve their resilience
Year 1955. Dr. Jonas Salk's newly developed polio vaccine had just been licensed in the United States, and one of the first members of the public was about to receive it. No one could say for certain how her body would respond. Would she develop complications, or would she come through unharmed? She had been reassured many times that she would receive a small dose of inactivated (killed) polio virus by injection. She would not get sick. Instead, her body would develop immunity to the virus, protecting her from polio now and in the future.
The patient was nervous; after all, the vaccine was still new. She weighed the potential health risks against the possible benefits and finally decided to be vaccinated. The vaccine worked, and she gained long-term immunity that those who fearfully refused the new treatment never had.
How does this relate to the topic I am proposing? Vaccination is a fitting metaphor for a technique we commonly know today as chaos testing. Every software system has limits and multiple potential points of failure. By deliberately injecting a system with conditions that are likely to cause disruption (for example, a crashed server, a slow network link, or a failing dependency), disaster recovery teams can expose its weak and vulnerable areas. They can then implement step-by-step fixes and protocols that eventually make the software even more resilient and fault tolerant.
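To make the idea of injecting disruptive conditions concrete, here is a minimal Python sketch of fault injection at the unit level. Everything in it is hypothetical and chosen only for illustration: the `inject_faults` decorator, the `fetch_price` dependency, the `get_price_with_fallback` caller, and the 30% failure rate. Real chaos tooling usually injects faults at the network, process, or infrastructure level rather than through a decorator, so treat this as a sketch of the principle, not a production technique.

```python
import functools
import random


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate a dependency failure."""


def inject_faults(failure_rate, rng):
    """Hypothetical decorator: the wrapped call fails with probability failure_rate."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # rng.random() is in [0, 1), so 0.0 never fails and 1.0 always fails.
            if rng.random() < failure_rate:
                raise InjectedFault(f"chaos: {func.__name__} failed")
            return func(*args, **kwargs)
        return wrapper
    return decorator


rng = random.Random(42)  # fixed seed so the experiment is reproducible


@inject_faults(failure_rate=0.3, rng=rng)
def fetch_price(item):
    # Hypothetical downstream dependency, e.g. a pricing service.
    return {"apple": 1.25, "pear": 0.80}[item]


def get_price_with_fallback(item, retries=2, default=None):
    """Caller under test: retries a few times, then falls back to a default."""
    for _ in range(retries + 1):
        try:
            return fetch_price(item)
        except InjectedFault:
            continue  # absorb the injected fault and try again
    return default


def run_experiment(retries, calls=1000):
    """Return the fraction of requests that still received the real price."""
    served = 0
    for _ in range(calls):
        if get_price_with_fallback("apple", retries=retries) == 1.25:
            served += 1
    return served / calls


print(f"no retries:  {run_experiment(retries=0):.1%} served")
print(f"two retries: {run_experiment(retries=2):.1%} served")
```

Running the same caller with and without retries shows what the injected faults expose: a single unguarded call fails roughly as often as the dependency does, while the retrying caller fails only when every attempt fails (about 0.3³, or under 3% of requests, in this sketch). The seeded generator keeps each run repeatable, which matters when a team wants to confirm that a fix actually closed the weakness.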
Many companies still do not understand how to adapt their existing disaster recovery plans (DRPs) to accommodate the cause-and-effect insights that chaos experiments uncover. In fact, they may be surprised to learn that well-designed, well-tested chaos engineering practices can streamline the entire DRP exercise process. As a direct result, customers receive more reliable service, and the business benefits from it.
In my presentation, I will begin by explaining how chaos testing differs from regular testing. The next few slides will walk through a well-defined chaos testing strategy that serves everything from small in-house systems to large distributed systems. I will explain how to chaos test at the unit, integration, and system levels. This will also help teams convince upper management of the value of deliberately "inserting" chaos into a system and seeing "what happens". I will end with a real-life use case so the audience can see chaos testing in practice.