Testing

This ties in with the concept of experimentation. Thomas Grohser related a story the other night of a case of “yeah, the database failed and we tried to do a restore and found out we couldn’t.”

Apparently their system could somehow make backups, but couldn’t restore them. BIG OOPS. (They did manage to create an empty database, replay 4.5 years of transaction logs, and recover their data. That’s impressive in its own right.)

This is not the first time I’ve worked with a client or heard of a company whose disaster recovery plans failed the first time they were actually needed. It may sound obvious, but companies need to test their DR plans. In fact, I’m working with a partner on a new business to help companies think about their DR plans. Note that we’re NOT writing or creating DR plans for companies; we’re going to focus on how companies go about actually implementing and testing their DR plans.

Fortunately, right now I’m working with a client that had an uncommon use case. They wanted a restore of the previous night’s backup to a different server every day.

They also wanted to log-ship the database in question to another location.

This wasn’t hard to implement.

But what is very nice about this setup is that every 15 minutes we have a built-in automatic test of their log backups. If for any reason log backups stop working or a log gets corrupted, we’ll know fairly quickly.

And, with the database copy, we’ll know within a day if their backups fail.  They’re in a position where they’ll never find out 4.5 years later that their backups don’t work.
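
A minimal sketch of what the nightly restore step might look like, assuming SQL Server and a full backup written to a file share. Every name and path here (ClientDB_Copy, the share, the logical file names, the drive letters) is a hypothetical placeholder, not the client’s actual setup. The point is only that if the RESTORE fails, the scheduled job fails, and somebody finds out the next morning.

```sql
-- All database names, logical file names, and paths are hypothetical placeholders.
RESTORE DATABASE ClientDB_Copy
FROM DISK = N'\\backupshare\ClientDB\ClientDB_full.bak'
WITH
    MOVE N'ClientDB'     TO N'D:\Data\ClientDB_Copy.mdf',  -- relocate the data file
    MOVE N'ClientDB_log' TO N'L:\Logs\ClientDB_Copy.ldf',  -- relocate the log file
    REPLACE,   -- overwrite yesterday's copy
    RECOVERY;  -- bring the copy online so it can be queried
```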

This client’s DR plan needs a lot of work; they actually have nothing formal written down. However, they know for a fact their data is safe. This is a huge improvement over companies that have a DR plan, but have no idea if their data is safe.

Moral of the story: I’d rather know my data is safe and my DR plan needs work than have a DR plan but not have safe data.

Who’s Flying the Plane

I mentioned in an earlier post my interest in plane crashes. I had been toying with a presentation based on this concept for quite a while.

A little over a month ago, at the local SQL Server User group here in Albany I offered to present for the February meeting. I gave them a choice of topics: A talk on Entity Framework and how its defaults can be bad for performance and a talk on plane crashes and what IT can learn from them.  They chose the latter. I guess plane crashes are more exciting than a dry talk on EF.

In any event, the core of the presentation is based on the two plane crashes mentioned in the earlier post: Eastern Air Lines Flight 401, the L-1011 crash in Florida in 1972, and US Airways Flight 1549, the Miracle on the Hudson in 2009.

I don’t want to reproduce the entire talk here (in part because I’m hoping to present it elsewhere) but I want to highlight one slide:

Flight 401 vs 1549

  • Flight 401 – Perfectly good aircraft
  • Flight 1549 – About as bad as it gets
  • Flight 401 – 101 Fatalities/75 Survivors
  • Flight 1549 – 0 Fatalities

Flight 401 had a burned-out nose gear indicator light and crashed.

Flight 1549 had two non-functional engines and everyone got off safely.

The difference was good communications, planning, and a focus at all times on who was actually flying the airplane.

Think about this the next time you’re in a crisis. Are you communicating well? How is your planning, and is someone actually focused on making sure things don’t get worse because you’re focusing on the wrong problem? I touch upon that here when I talk about driving.

The moral: always make sure someone is “flying the plane”.

Post hoc ergo propter hoc

One of my favorite shows is The West Wing and there is an episode of the same name as this post. Unfortunately for you, Aaron Sorkin is a better writer than I.

That said, this concept, “After it, therefore because of it” is a common mistake many of us make when forming theories. It’s related to the concept that correlation is not causation.

I was reminded of this the other night when another phrase entered my mind: “Rain Follows The Plow”. This was a hopeful theory in the 19th century that as settlers settled past the 100th Meridian, the rain would follow where they plowed. Simply put, by farming the land, rainfall would increase.

The theory sounds a bit perverse until one considers that for a while, rainfall did seem to increase as more land came under the plow. So, there was some basis for the idea at first. The correlation seemed to match. However, this just ended up being a short-term climate change.

Unfortunately, the theory was also a product of the idea that humans were the center of creation. As the subsequent Dust Bowl and other issues showed, however, this theory was (excuse the bad pun) all wet.

Sometimes correlation is not causation and we should not let our all too human biases influence our theories.

Fortunately, properly done, science is eventually self-correcting. Scientists make mistakes, but over time, the winnowing process eliminates them.  The idea of scientific racism was once extremely popular, but over time has clearly been shown to be false.  The idea of an ether was shown to be false.

Meanwhile, other theories have continued to hold up to intense scrutiny. As weird as quantum mechanics appears to be, evidence continues to mount that much of the current theory is in fact correct. When scientists report particles that appear to travel faster than light, the default assumption continues to be (and so far correctly) that there is an error in the experiment.

Not much of a moral here other than just because the rooster crows when the sun rises, don’t mistake the crowing for the cause of the sunrise.

 

Survivor Bias

I’ve been so busy lately I haven’t had a chance to write anything.

Of course, part of the problem isn’t having ideas to write about, but finding time to write about them.

I think perhaps I should focus more on writing SOMETHING, even if it’s just a short post, than trying to write the Great American Blog post.

In this case I’m going to actually post a link to a great article on survivorship bias. This is the sort of article I wish I had written myself. As I’ve mentioned, part of my point here is to get one to think about HOW we think.

The story of the bomber survivors was first related to me by a good friend in college, but without a source. Now at least I have a source for it.

In a similar vein, and the article touches upon it, people will talk about how well the older homes in weather-prone areas were built because they’re standing decades after they were built despite hurricanes or floods or blizzards. These folks completely miss the other 90% of homes from those eras that didn’t survive.

Years ago, my father bought and rehabilitated what we believe to have been the oldest house in town (in fact, technically it was older than the town and probably where the town charter was signed).

There really wasn’t anything about the construction that stood out that made it survive. Just luck at this point.  A single fire at any point in time could have made the second oldest house in town the oldest.

In closing, this article doesn’t represent most of my thoughts over the past 6 months, only the ones that survived to the publishing stage.

The Hunger Games

I’ll admit it, I’m a sucker for “fantasy” worlds. I don’t necessarily mean fantasy in terms of fairies and elves and goblins, but in the sense of wholly created “worlds” that feel complete. One of the first I recall reading was the Earthsea trilogy, which took place on a planet very unlike our own.

Anyway, the latest series I’ve been sucked into, like many, is “The Hunger Games”. For those of you who live under a rock and have missed all the hoopla, it is set in a future dystopia where “Districts” are required to send a male and female “tribute” to the “Capitol” to participate in gladiatorial combat to the death. The opening scenes remind me very much of Shirley Jackson’s “The Lottery.”

In any case, at a couple of points, the story’s hero, Katniss Everdeen, is told by her drunken mentor, Haymitch Abernathy, “Stay Alive”. On the face of it, when heading into near certain death, this advice from a drunk seems rather pointless. After all, isn’t everyone in the “Games” trying to Stay Alive? She at first dismisses him as a drunken fool.

However, as time goes on, despite not being able to communicate directly with him, she starts to understand him better.

And while never fully stated, I believe she finally realizes his advice wasn’t nearly as obvious as it sounds. It is rather more like a Zen koan. By staying alive, she’ll live.

The first time she applies this lesson, without fully realizing it, is right when the games begin. Unlike many of the tributes who go for the cache of items the Gamemakers provide at the start of the game, she grabs one or two items and flees for the woods, barely staying alive in the process. However, she learns that night that 11 of the 24 people who started the game died in the initial moments, most of them trying to grab items from the cache. They had failed to stay alive. More importantly, they had failed to follow the advice of “Stay Alive”. Rather, while planning for the future (“If I can get these weapons now, I can use them later on”) they failed to take into account the present. In the present there were 23 other tributes intent on killing them.

Soon Katniss realizes that by focusing on “staying alive” she can actually win the games. She makes some mistakes, but also does many things right. Only once she’s assured that she can stay alive does she actually go on the offensive. As a result, at the end, she is still alive.

Ultimately, I realized this is similar to the point I often try to drill into people when I say, “fly the plane”. This reflects lessons learned by the NTSB and others that there are airplane crashes that result from the pilot failing to do the most important job, flying the plane. They may get distracted, or worse, focused on the wrong issue and end up flying the plane into the ground.

If you don’t believe me, think back to your early days as a driver and how you might have been easily distracted adjusting the radio, or picking up something off the floor.  If you’re like most drivers, you probably had a few near misses where the distraction from driving almost caused an accident, or worse, did cause an accident.

In my most memorable incident, I was in a vehicle with my father, approaching a merge and was trying to downshift. I was still intent on learning to drive stick and this truck was a bit tricky at times. But I was determined to properly downshift. So determined in fact that I ignored the big red octagonal sign with the bright white letters instructing me as to what I really should have been doing. I also ignored (well, honestly I think I may have snapped at least one reply) my father’s increasing admonishment to do as that sign instructed lest something bad happen.

I can’t recall if I successfully downshifted or not, but I do know that once I returned to the actual task at hand, DRIVING, I was about 40′ beyond said sign and was lucky I hadn’t been hit by a car from the other leg of the merge. I had been distracted by something that I thought was very important, “downshifting and not stalling out”, and missed the real goal at the time, STOPPING. Or in other words, staying alive. Sure, downshifting and not stalling out was an admirable goal. And had the “stopping” part been successfully managed, it would have been the proper goal to focus on. But the “stopping” part really trumped all else.

 

So: Stay Alive first, and then focus on winning.

 

 

 

White Ford Taurus

So, I’m listening to the 24 Hours of PASS webinars. The current topic is “I Was Young and Didn’t Know Any Better” and the panelists are sharing war stories of mistakes they’ve made.

So far they all sound familiar.  So I thought I’d share one of mine.  Well technically not my mistake, but one that I adopted.

Many moons ago, I was advising a company that was involved in building websites for car dealerships. One day they needed to do an update to the live data. This was back in the days when all code and updates were cowboy updates. Of course you ran the query on the live database the first time. You didn’t necessarily have a staging database or even, as was later discovered, good backups.

Apparently a customer needed to update a car in their inventory.

UPDATE AUTO SET cartype = 'White Ford Taurus'

Nice, syntactically valid… and a disaster.  Ayup.  Suddenly every car in the database at every dealership was now a White Ford Taurus.

Ever since then we called that the “White Ford Taurus” problem.

Now, I might mock doing updates on live data, but sometimes it’s necessary. I’m curious how others prevent their own “White Ford Taurus” problems.

Personally, I just now make EXTRA effort to make sure I have a WHERE clause.

But I also tend to almost always do it as:

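BEGIN TRAN
-- Same statement as before, with the WHERE clause still missing
UPDATE AUTO SET cartype = 'White Ford Taurus'
-- Commit only if exactly one row changed; otherwise undo everything
IF @@ROWCOUNT <> 1 ROLLBACK TRAN ELSE COMMIT TRAN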

Or sometimes I’ll simply hardcode the rollback tran, run it once, see what happens and then rerun it with a commit tran.

So, if rather than updating the 1 row I want, I find myself updating 1000s of rows, I’ll catch myself and be safe.
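
Putting both safeguards together, a version with a WHERE clause and the row-count check might look like the sketch below. The AutoID column and the value 42 are hypothetical; the original table’s key isn’t something I’m quoting here, so substitute whatever uniquely identifies the one car you mean to change.

```sql
-- Hypothetical key column and value: AutoID = 42 is for illustration only.
BEGIN TRAN;

UPDATE AUTO
SET cartype = 'White Ford Taurus'
WHERE AutoID = 42;

-- @@ROWCOUNT is reset by nearly every statement, so check it immediately.
IF @@ROWCOUNT <> 1
BEGIN
    ROLLBACK TRAN;
    RAISERROR('Expected exactly 1 row; update rolled back.', 16, 1);
END
ELSE
    COMMIT TRAN;
```

If the WHERE clause matches zero rows or many rows, nothing is changed and the error message says why, rather than the rollback happening silently.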

Sure, it’s not perfect; both it and using the WHERE clause require me to make sure I don’t forget them. But the more ways to catch it, the better.

Obviously avoiding ad-hoc updates on live data is preferable, but when you can’t, be extra careful.  And of course make sure you have good backups. But that goes without saying.