As organisations around the world continue to recover from what some have described as the biggest IT outage in history, the CrowdStrike software glitch serves as a wake-up call for businesses to protect themselves against unforeseen IT failures that could bring services across the globe to a grinding halt.
It is estimated that 8.5 million Windows devices¹ across 674,620 direct customers in 1,200 unique industries were affected² by a flaw in a routine update issued for a piece of cybersecurity software.
It was not a cyberattack or breach. However, the outage has triggered warnings from cybersecurity experts about a surge in hacking attempts exploiting the IT disruption.
In terms of the number of devices affected, the disruption on 19 July 2024 dwarfs the WannaCry ransomware attack of 2017, which infected around 230,000 computers across 150 countries before a kill switch was identified.
The widespread impact of the global IT outage was quite alarming for those directly affected. People were not able to withdraw money from bank accounts, supermarkets were forced to close, airline fleets were grounded, and congestion built up at major ports across the world.
Global IT outage exposes critical fault lines
The outage brings organisations such as major software vendors and IT infrastructure providers into the realm of critical infrastructure, underscoring their importance to our daily lives as well as their broader socio-economic significance. It also brings into focus the question of trust. Just as people turn on the tap in their homes to get clean water that they don’t need to test before consuming, they turn on their computers with the same level of trust, not expecting to get a “blue screen of death” as a result of a routine update from a trusted provider.
There is a significant element of concentration risk at play. The vast majority of the world's IT systems run on a handful of providers. Should any one of them experience an outage, the results could be catastrophic, extending far beyond mere inconvenience. Such an event could compromise public health and safety, and even put lives at risk. In this light, the recent global IT outage might seem relatively minor.
The outage has brought the issue of quality control in software updates into the spotlight, drawing attention to the urgent need for more rigorous testing before deployment. It raises the question of whether fundamental changes are necessary in the operations of essential technology service providers. For instance, should new quality assurance protocols be implemented to govern the rollout of updates and new software releases?
How can risks be minimised?
One way to reduce concentration risk is to diversify. But the interconnectedness of the technology provider ecosystem means that this may not be very practical.
The question of trust will arise for many of the organisations affected by the recent outage. At least some of them may be considering switching providers. This is not necessarily a wise course of action, though. It would risk further disruption with no guarantee that the new solution will be as effective. The fact remains that the likely cause of the outage was human error, and this does happen from time to time, even in the very best organisations.
This puts the focus back on the affected organisations.