Disaster Recovery – Focus on the Disaster First!
When organizations first take on the challenge of setting up a disaster recovery plan, it’s almost always based on the premise that a complete failure will occur. With that in mind, we take the approach of planning for a complete recovery. We replicate our services and VMs to some sort of secondary site and go through the process of documenting how to bring them all up again. While this may be the basis of the technical recovery portion of a DR plan, it’s important to take a step back before jumping right to the assumption that we will have to recover from a complete failure. Disasters come in all shapes, forms, and sizes, and a great disaster recovery plan will accommodate as many types of disasters as possible. For example, we wouldn’t use the same “runbook” to recover from simple data loss that we would use to recover from the total devastation of a hurricane. This just wouldn’t make sense. So even before beginning the recovery portions of our disaster recovery plans, we really should focus on the disaster portion.
As mentioned above, the human mind always seems to jump to planning for the worst-case scenario when hearing the words disaster recovery: a building burning down, flooding, etc. What we fail to plan for are the smaller, less dramatic disasters, such as a temporary loss of power or loss of entry to the building due to quarantine. So, with that said, let’s begin to classify disasters. For the most part, we can lump a disaster into one of two main categories:
Natural Disasters – These are the most recognized types of disasters. Think of events such as a hurricane, flooding, fire, earthquake, lightning, water damage, etc. When planning for a natural disaster, we can normally work under the assumption that we will be performing a complete recovery or avoidance scenario to a secondary location.
Man-made Disasters – These are the types of disasters that organizations tend to overlook when looking at DR. Think about things such as temporary loss of power, cyberattacks, ransomware, protests, etc. While these intentional and unintentional acts are not as commonly planned for, a good disaster recovery plan will address some of them, as the recovery from them is often much different from that of a natural disaster.
Once we have classified our disaster into one of these two categories, we can then drill down further. Performing a risk and impact assessment of the disaster scenarios themselves is a great next step. Questions like the ones listed below should be considered when performing our risk assessment because their answers allow us to further classify our disasters and, in turn, define expectations and appropriate responses accordingly.
- Do we still have access to our main premises?
- Have we lost any data?
- Has any IT function been depleted or lost?
- Have we lost any skill sets?
How these questions are answered for a given disaster can completely change our recovery scenarios. For example, if we have had a fire in the data center and lost data, we would most likely be failing over to another building within a designated amount of time. However, if we had also lost employees in that fire, more specifically IT employees, then the time to recover will certainly be extended, as we most likely would have lost the skill sets and talent needed to execute the disaster recovery plan. Another great example comes in the form of ransomware. While we would still have physical access to our main premises, the data loss could be much greater due to widespread encryption from the ransomware itself. If our backups were not air-gapped or separate from our infrastructure, then we may also have encrypted backups, meaning we have lost an IT function, thus provoking a possible failover scenario even with physical access to the building. On the flip side, our risks may not even be technical in nature. What is the impact of losing physical access to our building as a result of protests or chemical spills? Some disasters like this may not even require a recovery process at all, but still pose a threat due to the loss of access to the hardware.
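To make the idea concrete, here is a minimal Python sketch of how the answers to the four questions might map to a response. Everything in it is an assumption made for illustration: the names `Assessment`, `Response`, and `recommend` are hypothetical, and the decision rules are a simplified reading of the examples above, not a standard or a complete plan.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    # Hypothetical response tiers, named for this sketch only.
    MONITOR = "no recovery needed; wait for access to return"
    RESTORE_IN_PLACE = "restore data at the main site"
    FAIL_OVER = "fail over to the secondary site"


@dataclass
class Assessment:
    """Answers to the four risk-assessment questions."""
    premises_accessible: bool
    data_lost: bool
    it_function_lost: bool
    skills_lost: bool


def recommend(a: Assessment) -> tuple[Response, bool]:
    """Return (response, extended_timeline) for one disaster scenario.

    Assumed rules: a lost IT function always forces a failover; lost data
    is restored in place only if we can still reach the main site; lost
    skill sets do not change the response but extend the time to recover.
    """
    if a.it_function_lost:
        response = Response.FAIL_OVER
    elif a.data_lost:
        response = (Response.RESTORE_IN_PLACE if a.premises_accessible
                    else Response.FAIL_OVER)
    else:
        response = Response.MONITOR
    return response, a.skills_lost


# Ransomware that also encrypted the backups: building is fine, but an
# IT function is gone, so the sketch recommends failing over.
ransomware = Assessment(premises_accessible=True, data_lost=True,
                        it_function_lost=True, skills_lost=False)

# Protest blocking the entrance: nothing to recover, just no access.
protest = Assessment(premises_accessible=False, data_lost=False,
                     it_function_lost=False, skills_lost=False)

print(recommend(ransomware))
print(recommend(protest))
```

A real plan would carry far more nuance than four booleans, but even a table this small forces us to write down a distinct response for each kind of disaster instead of defaulting to a full failover every time.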
Disaster recovery is a major undertaking, no matter the size of the company or its IT infrastructure, and it can take copious amounts of time and resources to get off the ground. With that said, don’t make the mistake of only planning for those big natural disasters. While they may be a great starting point, it’s best to also list out some of the more common, more probable types of disasters and document the risks and recovery steps for each in turn. In the end, you are more likely to be battling cyberattacks, power loss, and data corruption than you are to be fighting off a hurricane. The key takeaway: classify many different disaster types, document them, and in the end, you will have a more robust, more holistic plan you can use when the time comes.