A Lost Sked

Not much time to write this week. I’m off in Alabama crawling around in the bowels of the Earth teaching cave rescue to a bunch of enthusiastic students. The level I teach focuses on teamwork. And sometimes you find teams forming in the most interesting ways.

Yesterday our focus was on some activities in a cave (this one known as Pettyjohn’s) that included a type of litter known as a Sked. When packaged it’s about 9″ in diameter and 4′ tall, and it comes in a bright orange carrier. It’s hard to miss.

And yet, at dinner, the students were a bit frantic; they could not account for the Sked. After some discussion they determined it had most likely been left in the cave.

As an instructor, I wasn’t overly concerned. I figured it would be found, and if not, that’s part of the reason our organization has a budget for lost or broken equipment, even if it’s expensive.

That said, what was quite reassuring was that the students completely gelled as a team. There was no finger pointing, no casting blame. Instead, they figured out a plan, determined who would go back to look for it and when. In the end, the Sked was found and everyone was happy.

The moral is that an incident like this can turn a group into a bunch of individuals blaming everyone else, or it can turn that group into a team where everyone shares responsibility. In this case it was the latter, and I’m quite pleased.

Mistakes were made

I generally avoid Reddit; I figure I have enough things in my life sucking up my time. But from time to time a link comes across my screen that I find interesting. This is one of them.

The user accidentally deleted a production database. Now, I think we can all agree that deleting a database in production is a “bad thing”. But whose fault is this, really?

Yes, one could argue the employee should have been more careful, but let’s back up.

The respondents in the thread raise several good points.

  • Why did the values in the documentation point to a LIVE, production database? Why not point to a dev copy, or better yet, to one that doesn’t exist at all? The person is expected to update the parameters anyway, so the worst case, if they leave the fake values in, is that nothing happens. (See the sketch after this list.)
  • Why didn’t Production have backups? This is a completely separate question, but a very important one!
  • Why fire him? As many pointed out, he had just learned a VERY valuable lesson, and taught the company a very valuable lesson too!
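To make that first point concrete, here’s a minimal sketch of what documentation defaults could look like. The names and hosts are mine, not anything from the thread; the idea is simply that the example values point at a reserved, non-resolvable host, so someone who copies them verbatim gets a connection error instead of a live production database.

    # example_settings.py -- sample values for onboarding documentation.
    # These are deliberately non-routable placeholders; replace them with
    # the values for YOUR dev environment before running anything.
    DB_HOST = "db.example.invalid"   # the .invalid TLD never resolves, by design
    DB_PORT = 5432
    DB_NAME = "sample_app_dev"
    DB_USER = "changeme"
    DB_PASSWORD = "changeme"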

I’ll admit, I did something similar in my career at one of my employers. Except I wasn’t an intern, I was the Director of IT, and my goal in fact WAS to do something on the live database. The mistake I made was a minor one in execution (I reversed the direction of an arrow in the GUI before hitting the Execute button) but disastrous in terms of impact. And of course there wasn’t a recent enough backup.

I wasn’t fired for that. I did develop and enforce our change control documents after that and always ensured, as much as possible, that we had adequate backups. (Later in my career, a bigger muckup did get me… “given the opportunities to apply my skills elsewhere”, but there were other factors involved and I arguably wasn’t fired JUST for the initial issue.)

As the Director of IT, I made a point of telling my employees that story. And I explained to them that I expected them to make mistakes. If they didn’t, they probably weren’t trying hard enough. But I told them the two things I wouldn’t accept were lying about a mistake (trying to cover it up, blaming others, etc.) and repeatedly making the same mistake.

I wrote in an earlier post that mistakes were avoidable. But as I pointed out, it’s actually more complex than that. Some mistakes are avoidable. Or, at least, they can be managed. For example, it is unfortunately likely that at some point, someone, somewhere, will munge production data. Perhaps they won’t delete it all, or perhaps they’ll make a White Ford Taurus type mistake, but it will happen. So you have safeguards in place. First, limit the number of people in a position to make such a mistake. Second, have adequate backups. There are probably other steps you can take to reduce the chance of error and mitigate it when it does eventually happen. Work on those.
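As a rough illustration of the “limit who can make such a mistake” idea (again, my own sketch, with made-up hostnames and an environment variable I chose for the example, not anything from the thread), a maintenance script can refuse to run destructive operations against anything that looks like production unless the operator explicitly confirms it:

    import os
    import sys

    # Hostnames we treat as production; adjust for your environment.
    PRODUCTION_HOSTS = {"db-prod-01.example.com", "db-prod-02.example.com"}

    def run_destructive(host: str, statement: str) -> None:
        """Run a destructive statement, but only after a production safety check."""
        if host in PRODUCTION_HOSTS and os.environ.get("I_REALLY_MEAN_PROD") != "yes":
            sys.exit(
                f"Refusing to run against production host {host!r}. "
                "Set I_REALLY_MEAN_PROD=yes if you truly intend this."
            )
        # The actual database call would go here; for the sketch, just report it.
        print(f"Would execute on {host}: {statement}")

    if __name__ == "__main__":
        run_destructive("db-prod-01.example.com", "DROP TABLE customers;")

It won’t stop a determined person, but it does turn “I pasted the wrong host from the docs” into an error message instead of an outage.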

But don’t fire the person who just learned a valuable lesson. They’re the one least likely to make that mistake again. Me, I’d probably fire the CTO for not having backups, for having production values in documentation like that, AND for firing the new guy.