It’s not the first task when I start a new job, but as a DBA or developer, I usually ask about Disaster Recovery (DR) plans sometime within the first six months. If I’m a DBA, of course I need a plan. If I’m a developer, however, I still need to understand how this might work, as it can affect how I build the software and prepare for networking, machine changes, etc. Even if I don’t concern myself with production DR, I usually do want to make sure the VCS repos are being protected, which is something I’ve found isn’t always being handled.
I have had to build and test DR plans as a member of an Operations team in the past. While my plans and practice are nothing like Google’s large exercises, they often reveal some issues, even when we duplicate the service without touching production. I’ve usually found the brainstorming and debating of the various ways to build a plan to be fun. Arguing for money and then actually implementing plans is less interesting, but the testing is a great challenge. I’ve had some fun days offsite where we tried to recover systems and found all the little things that we take for granted in our production environment.
These days there are companies offering DR as a Service (DRaaS), which is an interesting concept. I found an article from Michael Otey that talks about the features you might want to look for if you contract with a vendor. In the past, I would never have considered this, but the more we advance in the world with cloud infrastructures and even full-service co-location vendors, the more I think DRaaS makes sense.
I wouldn’t necessarily take anyone’s word that their service meets my needs, so thinking about the requirements and then working through a few proofs of concept (PoCs) is likely very important. We do a lot of PoC work at Redgate to help customers evaluate whether Compliant Database DevOps is a good fit. I think this is important for software development, but even more important for DR plans. After all, downtime is expensive, and the last thing you want to find out when troubles arise is that some critical piece of infrastructure can’t be easily duplicated.
I’ve used DR companies in the past, with their own physical facilities. They have impressive capabilities and marketing, but the mixing of their skills with my systems has often been rocky and has led to changes in our plans, contracts for new or fewer services, and often updated documentation for junior staff. After all, I usually expect DR situations to occur when I’m on vacation, so I plan for that.
You never know when you’ll need to execute DR plans. It pays to think about this ahead of time and periodically test yourself and your staff.