Shouldn’t that be plugged in?

That was the question a friend of mine in 6th grade asked. As a result I developed what I call the Charlie M. rule after my friend. It was sort of Show and Tell day in 6th grade and we were supposed to talk about our hobbies. I brought in a circle of HO scale track (18″ radius for those interested) and my locomotive (a model GP-38) and some cars and of course the transformer to power it all.

I set it all up in front of the class and dutifully tried to demonstrate it. Nothing moved. I checked to make sure the engine was properly on the tracks: check. I made sure the wires were connected to the transformer: check. I made sure the wires were connected to the track: check.  I was stumped: check. Finally Charlie raised his hand and asked, “Shouldn’t that be plugged in?”  Ayup, in all my nervousness and being hurried, I had forgot the most basic step, of plugging in the transformer.

I try to keep this in mind when troubleshooting: check the obvious. I ran into this again over the weekend when trying to get my BMW Z3 running again. (Side note: no, consulting does not pay that well. This is one of the few tangible items I have left from my dad’s estate). It had stopped running late last fall and at the time I spent a little time trying to make it run, without much success. Finally, with the family’s help I pushed and pulled it into a shelter for the winter and then left it for the winter.

I wasn’t planning on worrying about it until later this month, but then… well let’s just say when I put the large box with metal corners into the rear of the Subaru, I forgot to check the obvious and slammed the rear hatch down on the box. Well, the box, realizing it didn’t have enough room, decided to take advantage of the metal corner and proceeded to make more room by punching out the rear window of the Subaru. Oops.  Such a simple mistake, but a large one.

So, while waiting for the Subaru to get fixed, I decided it was time to get the BMW on the road.

Now due to the symptoms, I knew it wasn’t a dead battery or bad gas. So taking advantage of what I call my extended brain, I asked others for help.  We had narrowed the problem down to either the clutch interlock switch or the starter. Neither looked like it would be an easy self-service and I was getting frustrated. I finally decided that perhaps checking the ODB-II codes might yield more information. Strangely though, the reader didn’t power up; there were no codes to read. That struck me as a strange. So here I did check the obvious: I took the reader to the Subaru and made sure the reader worked. And it worked fine on the Subaru. I went back to my extended brain and mentioned that.

“Oh, have you checked the fuses?”

“Nah I thought about it, but everything seems to have power.”

“You sure, sounds like the onboard computer fuse might be blown.”

So, I trudged out and took off the fuse cover.  Now, I don’t really believe in fate or signs from God, but it was weird, in the list of about 40 fuses, the first one my eyes fell on was Computer. “Nah, can’t be.”

I pulled it, and sure enough, it was burned out. I pulled it and replaced it. Got in the car and thought, “it can’t be that easy, can it?” A turn of the key and the next thing I knew, the 6 cylinders were purring.

All that work and frustration because I had overlooked the basics.

This is far from the first time I’ve overlooked the basics. And I bet you’ve done the same thing. I have a theory about why we do this, and it is in part because the basics ARE so fundamental that we assume it has to be something else. In my model train example, dirty track and loose wires, especially in an ad-hoc setup are arguably a more common issue than forgetting to plug in the transformer. In my BMW case, because literally everything else worked, I assumed the power was getting to the computer. And honestly, even now, thinking about it, I’m surprised the dash light startup didn’t change at all because of a lack of computer.

I’ve seen this in databases and elsewhere. I was recently trying to do a quick restore of a database from one machine to another and the obvious wasn’t working. It took me a bit to remember the client’s new security setup prevented this specific case for these two machines. Once I remembered that, the problem and subsequent solution were obvious.

This in part goes back to why I like using a rubber-duck at times. It can force you to review your assumptions and check the basics.

Having a problem? Employ the Charlie M. rule and check the basics.


Failure is Required

Last week one of my readers, Derek Lyons correctly called me out on some details on my post about Lock outs. Derek and I go back a long ways with a mutual interest in the space program. His background is in nuclear submarines and some of the details of operations and procedures he’s shared with me over the years have been of interest.  The US nuclear submarine program is built around “procedures” and since the adoption of their SUBSAFE program, has only suffered one hull-loss and that was with the non-SUBSAFE-certified USS Scorpion.

The space program is also well known for its heavy reliance on procedures and attention to detail and safety. Out of the Apollo 13 incident, we have the famous quote, “Failure is not an option” attributed to Gene Kranz in the movie (but there’s no record of him saying it at the time.)

Anyway, his comments got me thinking about failures in general.

And I’d argue that with certain activities and at a certain level, this is true. When it comes to bringing a crew home from the Moon, or launching nuclear missiles, or performing critical surgeries, failure is not an option.

But sometimes, not only is it an option I’d say it’s almost a requirement. I was reminded of this at a small event I was asked to help be a panelist at last week.  It turned out there were 3 of us panelists and just 2 students from a local program to help folks learn to code: AlbanyCanCode. The concept of agile development was brought up and the fact that agile development basically relies on failing fast and early.  For software development, the concept of failing fast really only costs you time. And agile proponents will argue that in fact it saves you time and money since you find your failures much earlier meaning you spend less time going down the wrong path.

But I’m going to shift gears here to an area that’s even more near and dear to my heart: cave rescue.  At an overarching, one might say strategic level, failure is not an option. We teach in the NCRC that our goal is to get the patient(s) out in as good or better shape than we found them as quickly and safely as possible.  In other words, if we end up killing a patient, but get them out really quickly, that’s considered a failure; whereas if we take twice as long, but get them out alive, that’s considered a success.

But how do we do that?  Where does failure come into play?

One of the first lessons I was taught by one of my mentors was to avoid “the mother of all discussions.” This lesson hit home during an incident in my Level 1 training here in New York. We had a mock patient in a Sked. Up to this point it had been walking passage through a stream with about 1″ of water. But we had hit a choke point where the main part of the ceiling came down to about 12″ above the floor passage.  There was alternative route that would involve lifting the patient up several feet and then over some boulders and through some narrow and low (but not 12″ low passage) and then we’d be back to walking passage.  I and two others were near the head of the litter.  At this point we had placed the litter on the ground (out of the water).  We scouted ahead to see how far the low passage went and noticed it went about a body length.  A very short distance.

Meanwhile the rest of our party were back in the larger passage having the mother of all discussions. They were discussing whether we should could drag the litter along the floor, lift it up to go high, or perhaps even for this part, remove the patient from the litter and have them drag themselves a bit.  There may have been other ideas too.

My two partners and I looked at each other, looked at the low passage, looked at the patient, shrugged our shoulders and dragged the patient through the low passage to the other side.

About 10 seconds later someone from the group having the mother of all discussions exclaimed, “where’s the patient?”

“Over here, we got him through, now can we move on?”

They crawled through and we completed the exercise.

So, our decision was a success. But what if it had been a failure. What if we realized that the patient’s nose was really 13″ higher than the floor in the 12″ passage. Simple, we’d have pulled the patient back out. Then we could have shut down the mother of all discussions and said, “we have to go high, we know for a fact the low passage won’t work.”

Failure here WAS an option and by actually TRYING something, we were able to quickly succeed or fail and move on to the next option.

Now obviously one has to use judgement here. What if the water filled passage was 14″ deep. Then no, my partners and I certainly would NOT have tried to move the patient with just the three of us. But perhaps we might have convinced the group to try.

The point is, sometimes it can often be faster and easier to actually attempt a concept than it is to discuss it to death and consider every possibility.

Time and time again I’ve seen students in our classes fall into the mother of all discussions rather than actually attempt something. If they actually attempt something they can learn very quickly if it will work or not. If it works, great, the discussion can now end and they can move on to the next challenge. If it doesn’t work, great, they’ve narrowed down their options and can discuss more intelligently about the remaining options (and then perhaps quickly iterate through those too.)

So today’s take away, is don’t be afraid of failure. Embrace it. Enjoy it. Experience it. It will lead to learning.  Just make sure you understand the price of failure.  Failure may be an option and is sometimes mandatory, but in other cases, the old saw is true, failure is not an option, especially if failure means the loss of life.


Locked Out

As I’ve mentioned, not only am I fascinated by disasters and their root causes and how we react, I’m also fascinated by how we take steps to prevent them.  In my book IT Disaster Response: Lessons Learned in the Field I discuss the idea of blue-flagging on railroads.  The important concepts were two-fold: 1) a method of indicating that the train should not be moved and 2) controls on who could remove that indication.

During my recent power outage, I came across something similar.  It should be the featured image for this article.  It’s basically an orange flag locked to a utility pole.  Note the key word there, locked.

The photo doesn’t show the fact that this utility pole contained circuit breakers (I believe that’s the proper term in this context) for the overhead power lines. They had been tripped as a result of a tree further down the road taking out all three supply lines.

Close up of orange flag on utility pole, along with tag with lock-out information

Close-up with tag and flag.

So let’s analyze this a bit:

The orange flag itself was VERY visible. This ensures any other crews that might be in the area that there is something they need to notice.

There is a tag with detailed information. It’s hard to see in the above photo, but it includes who tagged it, the location, and date and some other information.

What’s not clear, is it’s padlocked to the pole.

Now, to be clear, this is NOT a physical lock-out like you see on some power panels (i.e. where the padlock physically prevents the circuit-breaker from being opened or closed).

In this case, a physical lock-out would most likely have to be placed 30′ in the air at the top of the pole where it wouldn’t be easily noticed.

But that said, this served its purpose. It alerted other crews to a danger in the area and presumably can only be removed by the person who put it there. And it contains information on that person so they can be reached if there are questions.

Since power was restored within 1 hour and I didn’t hear of any reports of line worker getting electrocuted, this appears to have worked.

Today’s take-away: when you have a change from the normal state of operation, what steps can you take to ensure that others don’t try to return items to a normal state of operations without confirming things first? By the way, for a good read-up on how bad things can go when intentions about a non-standard mode of operation don’t get properly communicated, I recommend reading up on the events leading up to the Chernobyl disaster.

Procedures are important. Deviating from them can have serious consequences. Do what you can to minimize the possibility of deviations.


Oil Change Time and a Rubber Ducky

Sometimes, the inspiration for this blog comes from the strangest places. This time… it was an oil change.

I had been putting off changing my oil for far too long and finally took advantage of some free time last Friday to get it changed. I used to change it myself, but for some reason, in this new car (well new used car, but that’s a story for another day) I’ve always paid to get it changed. (And actually why I stopped changing it myself is also a blog post for another day.)

Anyway, I’ve twice now gone to the local Valvoline. This isn’t really an add for Valvoline specifically but more a comment on what I found interesting there.

So, most places where I’ve had my oil changed, you park, go in, give them your name and car keys and wait. Not here, they actually have you drive the car into the bay itself and you sit in the car the entire time. I think this is a bit more efficient, but since, instead of lifting the car, they have a pit under the car, I suppose they do risk someone driving their car into the pit (yes, it’s guarded by a low rail on either side, but you know there are drivers just that bad out there).

So, while sitting there I observed them doing two things I’m a huge fan of: using a checklist and calling out.

As I’ve talked about in my book and here in my own blogs, I love checklists. I recommend the book The Checklist Manifesto. They help reduce errors.  And while changing oil is fairly simple, mistakes do happen; the wrong oil gets put in, the drain plug isn’t properly tightened, too much gets put in, etc.

So hearing them call out and seeing them check off on the computer what they were doing, helps instill confidence. Now, I’m sure most, if not all oil change places do this, but if you’re sitting in the waiting room, you don’t get to see it.

But they also did something else which I found particularly interesting: they did a version of Pointing and Calling.  This is a very common practice in the Japanese railway system. One study showed it reduced accidents by almost 85%. So while changing my oil, the guy above would call out what he was doing. It was tough to hear everything he was calling out, but I know at one point the call was “4.5 Maxlife”  He then proceeded to put in what I presume was 4.5 quarts of the semi-synthetic oil into my engine (I know it was the right oil because I could see which nozzle he selected). I didn’t count the clicks, but I believe there was 9.  Now, other than the feedback of the 9 clicks, the guy in the pit couldn’t know for sure that it was the right oil and amount, but, I’m going to guess he had a computer terminal of his own and had his screen said “4 quarts standard” he’d have spoken up.  But even if he didn’t have a way of confirming the call, by speaking it out loud the guy above was engaging more of his brain in his task, which was more likely to reduce the chances of him making a mistake.

I left the oil change with a high confidence that they had done it right. And I was glad to know they actually were taking active steps to ensure that.

So, what about the rubber duck?

Well, a while back I started to pick up the habit of rubber duck debugging. Working at home, alone, it’s often hard to show another developer my code and ask, “Why isn’t this working?”  But, if I encounter a problem and I can’t seem to figure out why it’s not working. I now pull out a rubber duck and start working through the code line by line. It’s amazing how well this works.  I suspect that by taking the time to slow down to process the information and by engaging more of my brain (now the verbal and auditory portions), like pointing and calling, it helps bring more of my limited brain power to bear on the problem.  And if that doesn’t work, I still have my extended brain.

PS As a reminder, this coming Saturday I’m speaking at SQL Saturday Philadelphia. Don’t miss it!