A few weeks after I started at an insurance company as a rookie COBOL developer, Hurricane Katrina made landfall on the Gulf Coast and devastated New Orleans. Because we were an insurance company, helping families and businesses recover from disaster was paramount; it was our business.
Later that year, the company held one of its first Disaster Recovery (DR) exercises. During that exercise, working long hours and living off Diet Dr Pepper and Sam's Club snacks, I discovered how much I wanted to help my business recover from a disaster. I have been focused on DR ever since.
Recovering from Natural Disasters
We originally thought we would never actually have to use our plan, but sixteen months later a debilitating ice storm hit our community and knocked out power to our data center for three days. Although the organization did not declare a disaster or fail over to our DR partner facility, we used a significant portion of the plan we had spent fifteen months developing. Our practice paid off, and we recovered much faster than many in the company expected. Then, in 2008, Hurricane Ike hit a data center that our parent company was using, knocking out power and flooding the home office building. Luckily, we did not have to invoke our DR plan.
Technology Disasters
Natural disasters are not the only events that can take down a data center. Several years ago, a few disks failed in a SAN at an organization I worked for, as disks in SAN arrays tend to do. We called our SAN vendor to replace the drives. After the new drives were swapped in, the vendor tech walked our SAN admin through initializing them. Unfortunately, the tech provided the wrong command, and our SAN admin formatted the header of our ENTIRE ARRAY at 15:00 on a Friday. All of our logs for that day were on the array that had just been formatted. That was a bad day for everyone.
That incident forced us to change our DR plan. Many more issues like it have happened during my career, ranging from a developer's delete gone wrong, to a DBA dropping a table or database (I HAVE NEVER DONE THAT), to databases coming back suspect after patches.
Disaster Recovery Options
Through all the different types of disasters I've encountered in my career, I have learned how important it is to have multiple DR options up your sleeve, choosing among them based on the service-level agreement (SLA), the Recovery Point Objective (RPO, how much data you can afford to lose), and the Recovery Time Objective (RTO, how long you can afford to be down). In future blog posts, I will share examples of how I have built DR plans with the technologies at my disposal, starting with good old backup and restore.
I encourage you to share feedback: where I am right and, more importantly, where I am wrong. Stay tuned for more about disaster recovery and how to do it.