I’ve had this post by Wayne Hale in my queue for a while. I’ve wanted to comment on it, but until lately I’ve been too busy to do so.
One of my current contract clients is required to do an annual DR (disaster recovery) test. Since the end of the year is approaching, they’re trying to get the test in. Part of the requirement is an “official” test observed by an outside auditor.
So, being smart, and since a lot has changed in the past year, we decided to schedule a dry-run or two beforehand.
Well, let’s just say those have not gone as expected.
Some might consider the dry-runs failures.
I don’t. I consider them successes. We are finding out now, in a controlled environment with no real time pressures, where we are weak and need to fix things.
It’s far better to do this now than during the audited test, let alone during an actual disaster event! So the dry-runs are serving their purpose: they’re helping us find the holes before it’s too late.
That said, I have to claim that the part I’m most involved with, the SQL Log-Shipping, has been working well. The only issue with it this week was a human error made by another DBA, completely unrelated to the DR test. Within minutes of discovering his error, he executed the proper procedure to begin fixing it. The total fix on his end took no more than 5 minutes, and other than monitoring, the effort on my end took no more than 5 minutes either. That’s an excellent example of a robust design and set of procedures.
Today’s moral: don’t just have a DR plan, practice it. And not every failure is really a failure.