In cyber security there is a perception that we need to hold ourselves to a higher level of scrutiny than others. We are expected to be the gold standard, which demands a level of perfection that is unattainable. So, what happens when a cyber security company falls foul of a simple mistake?
CrowdStrike can be considered a case study of this. I remember reading the technical readout, and the cause was as ‘simple’ as one extra field being added to a template; that’s what crashed it all. However, I expect this was not simply a matter of one test case failing. It was probably a series of events that together resulted in a significant global issue. This is sometimes called the Swiss cheese model: each layer of defence has holes, and when several faults or missed tests coincide, the holes line up and allow an event to occur.
But we must accept that it did happen, and that is because we can never truly eliminate risk in technology. The sooner we change our perception of this, the sooner we can be prepared to handle future incidents effectively and, just as importantly, understand the risks involved, however improbable they may be.
Acknowledge the systemic nature of risks
The CrowdStrike outage really highlighted the question – have we become too reliant on technology companies that are all critically dependent on each other in one big system?
The reason why we use all these centralised cloud and SaaS providers is that the benefits often outweigh the risks. But if one of these large providers experiences an incident, it could have widespread impact across many organisations that rely on their services.
This can create a “too big to fail” dynamic, as in the financial sector, where the failure of a major player could have cascading effects.
I’ve found that, in general, people are good at understanding risk that is personal to them. We all know that crossing a busy road at rush hour is risky, and we mitigate that risk by using designated crossing areas. But, as humans, we are much worse at understanding big systemic problems in the same way, including the fact that we are potentially concentrating a great deal of risk in a handful of organisations. Is it time to start diversifying our technology stacks rather than putting all our eggs in one basket?
Zero risk is not achievable
Let’s be honest with ourselves! As much as we would like to think we can eliminate all risks, we can’t.
We need to be realistic about risk, otherwise organisations will spend unlimited money and time on security controls to mitigate it, and that’s neither practical nor pragmatic. If you end up coding until the cows come home, nothing will be released.
The focus should be on reducing risk to a reasonable, manageable level, rather than striving for absolute zero risk. There will always be some level that needs to be managed. I worked in the UK rail sector, where there was a concept called As Low As Reasonably Practicable (ALARP): you keep reducing a risk until the cost of reducing it further would be grossly disproportionate to the benefit. I use this approach today and it has served me well.
Be transparent about residual risks
Being upfront about the fact that some risks will remain, even after mitigation efforts, is important for setting realistic expectations with stakeholders and senior managers.
Don’t try to pull the wool over anyone’s eyes and say that your organisation’s risk will be zero. You need to be transparent with your stakeholders about how the security function, and the systems you’re working with, are actually performing. You can’t sit there and say that everything is fine when it isn’t, and then give someone a bad surprise when things go wrong. Transparency isn’t just important when you have an incident; in many cases it’s even more important in preventing the incident in the first place.
Personally, I feel CrowdStrike did as much as they could to respond well to the incident. They were open and honest, communicated clearly with customers and stakeholders, and put a lot of resources and effort into PR, relationship management and, crucially, technical help. You can see this in the constant updates and remediation advice posted online. But no matter what an organisation does, it can never truly eliminate risk in its systems, and it cannot promise the world that it has.
The key is finding the right balance. Keeping security measures and incident response simple and easy to implement is crucial, otherwise they are likely to be neglected. And, at the same time, organisations need to be transparent enough to maintain trust, manage risks to an acceptable level, and implement practical solutions that can be consistently followed.