A quick post, written last week as I’m relaxing in the mountains.
This month the topic is lessons learned the hard way, from Raul Gonzalez. It’s interesting, and certainly one where I have experience. I make mistakes regularly, though I tend to fix them.
You can read more about T-SQL Tuesday and learn how to participate, or even respond to old prompts at tsqltuesday.com.
A Hard Lesson in Preparation
I worked for a financial services company at one point, managing the live systems as well as performing database development. I had a few people working for me, some in each area, and we had contracts with a number of clients. Quite a few of these contracts specified a yearly DR test at a remote facility.
The first year I was there, four or five of us packed up tapes and documentation, carpooled to the facility as if this were a disaster, and started rebuilding systems. Our contractor supplied xx servers as per our contract, and we had to start from scratch.
We didn’t do well, and couldn’t get our web app or client-server apps running completely. The database restored, but the system and docs were out of date. I knew this, and we updated some things, but we didn’t take the process seriously.
A few months later our largest client was unhappy. Since we couldn’t get the website running, they wanted another test. At their site.
Another person and I were chosen to fly to their site, shipping out spare servers and software. We had to wipe the drives and rebuild the system from scratch. Since we had both been at the DR test, we assumed we’d learned enough to do this in two days, and we were sure we could rectify all the mistakes the test team had made.
We couldn’t. After two days, we didn’t have everything running, with too many dependencies and issues from various web components. Our developers had been running wild, with admin access to the web server and database server, resulting in wonderfully unique snowflake servers that were hard to reproduce.
Inadequate preparation, as well as poor security for years, contributed to the failure. I did get the database working, but that didn’t matter since this client used web access only. I had to take a large portion of the blame since I managed systems, and our company paid a penalty.
It was embarrassing for me, personally and professionally. I hadn’t done a good enough job being ready for a disaster. This did serve to motivate me to start cleaning up the systems and ensuring we could rebuild a new web server if needed, but it was a hard and expensive lesson to learn.