Safety Third

This is actually the name of an episode of Dirty Jobs, but it’s a title that has stuck with me because it’s near and dear to the sort of things I like to think about. Mike Rowe has a good follow-up article here. The title and show ruffled feathers, but he’s right: it’s an important concept to discuss.

You’ll often hear the mantra “Safety First.” In the workplace, this often means things like wearing fall protection when working at height, wearing a life vest when working in water, wearing ear protection, or taking other safety measures. The idea is that, above all else, we have to be safe.

I got thinking about this while reading Rand Simberg’s book, Safe is Not an Option. He argues that trying to make safety the highest priority of spaceflight is holding us back. I tend to agree. And I’d like to argue that despite NASA talking about safety in its public announcements, NASA hasn’t always been upfront about it, and it has made decisions where safety wasn’t first (and I would argue that in some cases those decisions were justified).

Now I know at least a few of my readers have read the Rogers Commission Report on the Challenger shuttle disaster. It’s worth the read, especially Dr. Feynman’s appendix. One of the issues that came up during the investigation was exactly how safe the Shuttle was. (Here I’m referring to the entire system: the orbiter, SRBs and ET.) Some at NASA were claiming that the Shuttle had a 1 in 100,000 chance of losing an orbiter. (The loss of an ET or SRB wasn’t really a concern as long as it didn’t impact the orbiter, since every ET was lost at the end of each mission and at least 2 SRBs were lost due to other issues.) As Feynman pointed out, this meant you could fly the Shuttle every day for 300 years and expect only one accident. What was the reasoning behind such an argument? Honestly, nothing more than wishful thinking. As we know, the Shuttle was far less safe: 2 orbiters were lost in 135 flights, or 1 in 67.5. That’s a hugely different number.
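To make that gap concrete, here is a small Python sketch of the arithmetic. It assumes “every day” means 365 flights a year, and it uses the Shuttle’s final record of 135 missions with 2 orbiters lost. Feynman’s “300 years” is a rounded figure; the straight division comes out closer to 274.

```python
# Back-of-the-envelope check of the Shuttle reliability numbers.
# Assumption: "one flight per day" means 365 flights per year.

claimed_odds = 100_000      # claimed: 1 orbiter loss in 100,000 flights
flights_per_year = 365      # Feynman's hypothetical: one launch every day

years_per_expected_loss = claimed_odds / flights_per_year
print(f"Claimed: one loss every {years_per_expected_loss:.0f} years of daily launches")

# The actual record: 135 missions, 2 orbiters lost (Challenger and Columbia).
missions = 135
losses = 2
actual_odds = missions / losses
print(f"Actual: about 1 loss in {actual_odds} flights")

# How optimistic was the claim?
print(f"The claim was optimistic by a factor of about {claimed_odds / actual_odds:.0f}")
```

The last line is the real point: the claimed odds were off by more than three orders of magnitude.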

There were many reasons that led to each accident, and I won’t delve into them here, though I would highly recommend The Challenger Launch Decision by Diane Vaughan as a comprehensive analysis of the decision making that helped lead up to the Challenger disaster.

But let’s talk a bit about ways things could have been made safer where NASA correctly decided NOT to go down that route. One early iteration of the shuttle design had additional SRBs mounted to the orbiter that would have allowed an abort during an additional 30 seconds of the flight envelope[1]. I can’t determine whether these 30 seconds would have overlapped with the critical 30 seconds of Challenger’s final mission, but let’s assume they did. This would have added $300 million to the development cost of the program and reduced the payload capacity of the orbiter[2].

In a system already beset with cost and payload concerns, this might have meant the program never got off the ground, literally. Or, if it did, it would have failed to meet its payload guidelines. All this for 30 more seconds of added safety. Would that have been worth it? Arguably not.

Another design decision was to eliminate thrust termination for the SRBs. Keeping it would arguably have made the ascent portion of the flight safer, at least in theory. The reasoning is that since you can’t normally shut down the SRBs, you can’t perform an orbiter separation while they are burning, which means the orbiter can’t detach during the first 2 minutes of the flight to perform a return to launch site abort.

But again, adding that safety feature wouldn’t necessarily have made things better. For one thing, it would only have been useful above a certain altitude, since below that altitude all the orbiter could have done was detach from the stack and fall into the sea, with too little time to get into a glide position and make it back to a runway.

But there was a bigger issue: thrust termination was determined to be violent enough that it would probably have damaged the orbiter if used. This could have been mitigated by beefing up the orbiter structure, but that would have imposed an 8,000 lb payload penalty. Since the shuttle was already having trouble reaching its 65,000 lb payload goal, giving up roughly 12% of it was determined to be unacceptable[3].

So, NASA could have made the decision of “safety first” and ended up with a shuttle system that never would have flown. And given the political calculus at the time, it’s unlikely NASA could have come up with a better solution or gotten Congress to fund it. The shuttle was an unfortunate compromise brought about by a host of factors. But it did fly.

I like to tie these posts back to some of my other interests, so what about caving and cave rescue? I mentioned in a previous post how we’ve moved away from treating one line in the system strictly as a belay line. But what if I told you we often use only one line? There are many places in caving and cave rescue where we do not have a belay line. A good example is a caver ascending or descending a rope. This is called Single Rope Technique, or SRT. Some people who come to caving from other activities ask, “Where’s your belay? You have to have a belay!”

But a belay line (here used in the sense of a line that catches a caver in a potentially dangerous fall if their mainline fails) is actually far less safe. I’ll give an example, starting with some possible failure modes:

  1. Main rope being cut or damaged to the point of failure
  2. The point the rope is rigged to (the anchor point) failing
  3. Your ascent or descent system failing

So the idea is, if one of those 3 things happens, the belay line will catch you. But there are issues with that theory. One major issue is that large drops in caves are often accompanied by air movement and waterfalls. The air movement, or even simple movements by the caver (and influenced by the rope in some cases), can cause a twisting motion. This means that before you know it, your belay line has been twisted around your mainline and you can no longer ascend or descend. You’re stuck. Now combine this with being in a waterfall and you’ve become a high-risk candidate for hypothermia, drowning, and harness hang syndrome. In other words, your belay line has now increased your chances of dying. So much for “safety first.”

Even if you avoid those issues, you haven’t really solved the failure modes I listed. If you think about it, anything that can damage your mainline can also damage your belay line. There are some differences. Because it’s moving, your belay line, for example, is far less likely to wear through in a single spot the way a mainline might from being bounced on during an ascent. On the other hand, it’s more likely to suffer a shock load over a sharp edge if it’s not attended well.

If your mainline anchor point fails, you’re relying on your belay anchor point to be stronger. If it’s stronger, why not use it for your mainline? (There are reasons not to, but this is a question that should cross your mind.)

Finally, for equipment failure: catastrophic failure is rare (honestly, only seen in movies), and other failures are better mitigated by proper inspection of your equipment and close attention to proper technique.

Of course, if we were really putting safety first, the safest thing to do would be to never go caving. But where’s the fun in that?

We can insist on safety first in much of what we do, but if we do, we inhibit ourselves from actually accomplishing the activity, and in some cases we can actually make things LESS safe by trying to add more safety. And safety is more than simply adding more pieces to a system. It’s often about proper procedures: for example, focusing on better rigging and climbing technique rather than adding a belay line. Or even simply accepting that sometimes things can go sideways and people may be injured or die. We live in a dangerous world, and while we can make things safer and often should, we should be willing to balance our desire for safety with practicality and the desirability of the goal.

I’m going to end with two quotes from an engineer I respected greatly, Mary Shafer, who formerly worked for NASA at what was then Dryden Flight Research Center and is now the Armstrong Flight Research Center at Edwards Air Force Base.

Insisting on absolute safety is for people who don’t have the balls to live in the real world.

and

There’s no way to make life perfectly safe; you can’t get out of it alive.

For a more complete record of Mary’s thoughts, I direct you to this post.

Footnotes

    1. Space Shuttle – The First Hundred Missions. Dennis Jenkins, 2001. Page 192
    2. Ibid.
    3. Ibid.

Train as you Fight

There’s a military aphorism: “Train as you fight, fight as you train.” I was recently reminded of this by a friend of mine who is also a reader of my blog. We’ve shared a mutual interest in the space program for decades. He mentioned this last week (though I can’t seem to find the post) in response to something I wrote, and it got me thinking.

When we teach cave rescue, we almost always use a real patient in the litter. There are a couple of reasons for this. For one, it ipso facto recreates the actual mass and weight distribution of a real patient. Now, there are training dummies that are similar in weight and mass, but they can be a pain in the neck. Ever try to move an inert body? That’s what a training dummy can be like. Sure, it’s great once it’s IN the litter, but getting it into position deep inside a cave can be almost impossible.

For another, it gives our students a chance to experience being a patient. This gives them a deeper appreciation for what it feels like to be moved through a cave. For example, you quickly realize that being dragged over the floor is less than ideal. Or you learn what it feels like when your rescuers become nameless and faceless behind the glare of a dozen headlamps; the next time you’re a rescuer, you tend to remember there’s an actual patient there, talk to them, and treat them like an actual person, not a lump you’re moving through the cave.

And this leads to one of the biggest reasons: we don’t want our students to get in the habit of treating a patient like a lump in a litter. We want them to realize there’s an actual person in there.

I once did a practice rescue with a local sheriff’s department. Since it was their exercise, they set the rules. They elected to use a straw dummy as the patient.  They congratulated themselves on a successful rescue at the end of the exercise. I saw a disaster. For one thing, the litter was so light, they could have probably had one person pick it up and carry it out of the cave. This may sound like a minor or even funny nit to pick. But, it can lead the Incident Commander to misjudge the crew size that may be necessary in a real rescue. (We had a cave rescue here in New York State about 20 years ago where the patient was only 300 feet into the cave. It was so arduous that we ended up having to fly in cavers from West Virginia; all the local cavers who could fit were completely exhausted.)

Because of the lightness, they were practically bouncing the litter off the ceiling and walls, because straw dummies don’t scream in pain when they hit rock. If they had tried to move an actual patient in that manner, they might have been surprised by the patient’s expressive vocabulary.

Training as one fights, or training as one rescues, doesn’t necessarily mean that every scenario exactly recreates what you expect to happen. As another adage says, “No battle plan survives first contact with the enemy.” So you might train with a mock patient who weighs 180 lb and has a broken leg. And then in a real event, the patient weighs 240 lb, is diabetic, and has a broken pelvis, a twisted ankle and a dislocated elbow. So no, you’re not going to practice every scenario. But you’re going to practice the general concepts and understand the ideas behind them. If you want an effective fighting force, you put them in the field. You have explosions, gunfire, smoke, rain, mud, etc. You don’t simply sit them in a classroom and discuss these points.

The flip side, fight as you train, is important too. When the fighting or rescuing begins, you can draw upon your experience in training and will be far less panicked. At the few rescues I’ve been involved in, I’ve found that once I’m on site, I become very calm. The training clicks. You can usually tell the untrained folks at an accident because they’re either panicking or have no idea what to do. The trained folks tend to react much more calmly. Also, trained people can act with a sense of urgency that doesn’t look like panic. Untrained people often move quickly, but without a sense of purpose. Don’t confuse moving quickly with moving urgently.

And all this applies to IT. I’ve said again and again that IT departments need to exercise their disaster recovery plans. It’s great to discuss them in a meeting and have a senior manager sign off on them. It’s another thing to actually practice mock disasters. This is when you realize that “oh, Shelly is out on Wednesday afternoons and only her computer has the phone numbers of the building manager.” Or “oh, we were sure that the batteries were in good shape, but it turns out they’re getting old and we only had half the runtime we expected.” Or, as has happened too many times, “oh, we thought we had good backups, until we went to restore them.”
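The backup case is the easiest one to drill. Below is a minimal Python sketch of a restore test: it treats a backup as unproven until it has been restored and its SHA-256 checksum matches the original. The file names, the hypothetical `verify_restore` helper, and the idea that a “restore” is a plain file copy are all simplifying assumptions; a real drill would restore through whatever backup tooling you actually use.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original: Path, backup: Path, restore_dir: Path) -> bool:
    """Hypothetical drill step: restore a backup and prove it matches the original.

    Here "restoring" is simply copying the backup file into restore_dir.
    """
    restored = Path(shutil.copy2(backup, restore_dir / original.name))
    return sha256_of(restored) == sha256_of(original)


# Usage: simulate one good backup and one that was silently truncated.
with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)
    for name in ("source", "backups", "restore"):
        (tmp / name).mkdir()

    data = tmp / "source" / "payroll.db"          # hypothetical data file
    data.write_bytes(b"important records\n" * 1000)

    good_backup = tmp / "backups" / "payroll.db.good"
    shutil.copy2(data, good_backup)

    bad_backup = tmp / "backups" / "payroll.db.bad"
    bad_backup.write_bytes(data.read_bytes()[:500])  # incomplete copy

    good_ok = verify_restore(data, good_backup, tmp / "restore")
    bad_ok = verify_restore(data, bad_backup, tmp / "restore")

print(good_ok, bad_ok)
```

Note that the truncated copy “restores” without any error at all; only the checksum comparison catches it, which is exactly why the restore has to be practiced rather than assumed.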

And practicing your DR plans means you’ll be far less pressured when you execute them and as a result will make far fewer mistakes.

Today’s take-away: practice until acting becomes second nature, so that when you need to act for real, you already know how.

Failure is Required

Last week one of my readers, Derek Lyons, correctly called me out on some details in my post about lockouts. Derek and I go back a long way, with a mutual interest in the space program. His background is in nuclear submarines, and some of the details of operations and procedures he’s shared with me over the years have been of interest. The US nuclear submarine program is built around “procedures,” and since the adoption of its SUBSAFE program, it has suffered only one hull loss: the non-SUBSAFE-certified USS Scorpion.

The space program is also well known for its heavy reliance on procedures and attention to detail and safety. Out of the Apollo 13 incident, we have the famous quote, “Failure is not an option,” attributed to Gene Kranz in the movie (though there’s no record of him saying it at the time).

Anyway, his comments got me thinking about failures in general.

And I’d argue that for certain activities, and at a certain level, “failure is not an option” is true. When it comes to bringing a crew home from the Moon, launching nuclear missiles, or performing critical surgery, failure is not an option.

But sometimes, not only is failure an option, I’d say it’s almost a requirement. I was reminded of this last week at a small event where I was asked to be a panelist. It turned out there were 3 of us panelists and just 2 students from a local program that helps folks learn to code: AlbanyCanCode. The concept of agile development came up, along with the fact that agile development basically relies on failing fast and early. In software development, failing fast really only costs you time. And agile proponents will argue that it in fact saves you time and money, since you find your failures much earlier and spend less time going down the wrong path.

But I’m going to shift gears here to an area that’s even nearer and dearer to my heart: cave rescue. At an overarching, one might say strategic, level, failure is not an option. We teach in the NCRC that our goal is to get the patient(s) out in as good or better shape than we found them as quickly and safely as possible. In other words, if we end up killing a patient, but get them out really quickly, that’s considered a failure; whereas if we take twice as long, but get them out alive, that’s considered a success.

But how do we do that?  Where does failure come into play?

One of the first lessons I was taught by one of my mentors was to avoid “the mother of all discussions.” This lesson hit home during an incident in my Level 1 training here in New York. We had a mock patient in a Sked. Up to this point it had been walking passage through a stream with about 1″ of water. But we had hit a choke point where the ceiling came down to about 12″ above the floor. There was an alternative route that would involve lifting the patient up several feet, then over some boulders and through some narrow and low (but not 12″ low) passage, and then we’d be back to walking passage. Two others and I were near the head of the litter. At this point we had placed the litter on the ground (out of the water). We scouted ahead to see how far the low passage went and saw that it went only about a body length. A very short distance.

Meanwhile, the rest of our party was back in the larger passage having the mother of all discussions. They were discussing whether we could drag the litter along the floor, lift it up to go high, or perhaps, for this part, even remove the patient from the litter and have them drag themselves a bit. There may have been other ideas too.

My two partners and I looked at each other, looked at the low passage, looked at the patient, shrugged our shoulders and dragged the patient through the low passage to the other side.

About 10 seconds later someone from the group having the mother of all discussions exclaimed, “where’s the patient?”

“Over here, we got him through, now can we move on?”

They crawled through and we completed the exercise.

So our decision was a success. But what if it had been a failure? What if we had realized that the patient’s nose was really 13″ above the floor in the 12″ passage? Simple: we’d have pulled the patient back out. Then we could have shut down the mother of all discussions and said, “We have to go high; we know for a fact the low passage won’t work.”

Failure here WAS an option and by actually TRYING something, we were able to quickly succeed or fail and move on to the next option.

Now obviously one has to use judgement here. What if the water-filled passage had been 14″ deep? Then no, my partners and I certainly would NOT have tried to move the patient with just the three of us. But perhaps we might have convinced the group to try.

The point is, it can often be faster and easier to actually attempt a concept than to discuss it to death and consider every possibility.

Time and time again I’ve seen students in our classes fall into the mother of all discussions rather than actually attempt something. If they actually attempt something they can learn very quickly if it will work or not. If it works, great, the discussion can now end and they can move on to the next challenge. If it doesn’t work, great, they’ve narrowed down their options and can discuss more intelligently about the remaining options (and then perhaps quickly iterate through those too.)

So today’s take-away: don’t be afraid of failure. Embrace it. Enjoy it. Experience it. It will lead to learning. Just make sure you understand the price of failure. Failure may be an option, and is sometimes mandatory, but in other cases the old saw is true: failure is not an option, especially if failure means the loss of life.


White (K)nights

I apologize for skipping two weeks of blog posts, but I was a bit busy: for about 11 days my family and I were visiting Europe for the first time. It was a wonderful trip. It started in Manchester, UK, for a SQL Saturday event.

I had sort of forgotten exactly how much farther north we were until it dawned on me how early dawn was. Actually, we had noticed the night before, as we walked back from the amazingly wonderful speakers’ dinner, how light it was despite how late it was. When I woke up at around 4:30 AM (a bit of jetlag there), I noticed how bright it was around the edges of the blackout curtains. I later looked it up, and it appears that technically it never reached “night” there, only astronomical twilight.
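For the curious, that claim is easy to sanity-check. At local solar midnight the sun’s elevation is approximately latitude + declination − 90° (northern hemisphere, ignoring atmospheric refraction), and astronomical twilight is the band where the sun is between 12° and 18° below the horizon. The sketch below assumes Manchester sits at about 53.5°N and tries a declination of 23.44° (the June solstice) and about 19° (an assumed value for late July); for contrast it also checks about 42.65°N, roughly Albany, NY. All of these inputs are approximations.

```python
# Rough check of "it never got fully dark" using the lower-culmination formula:
#   minimum solar elevation = latitude + declination - 90   (degrees)
# Assumptions: northern-hemisphere observer, no atmospheric refraction,
# declination treated as constant over the night.


def min_sun_elevation(latitude_deg: float, declination_deg: float) -> float:
    """Sun's elevation at local solar midnight, in degrees."""
    return latitude_deg + declination_deg - 90.0


def sky_at(elevation_deg: float) -> str:
    """Name the standard twilight band for a given solar elevation."""
    if elevation_deg >= -6:
        return "civil twilight or brighter"
    if elevation_deg >= -12:
        return "nautical twilight"
    if elevation_deg >= -18:
        return "astronomical twilight"
    return "night"


MANCHESTER_LAT = 53.5   # assumed approximate latitude
ALBANY_LAT = 42.65      # assumed approximate latitude, for contrast

for label, lat, dec in [
    ("Manchester, June solstice", MANCHESTER_LAT, 23.44),
    ("Manchester, late July (assumed declination)", MANCHESTER_LAT, 19.0),
    ("Albany, June solstice", ALBANY_LAT, 23.44),
]:
    elev = min_sun_elevation(lat, dec)
    print(f"{label}: {elev:.1f} deg -> {sky_at(elev)}")
```

On these rough numbers, the sun at Manchester never sinks more than 18° below the horizon in summer, so the sky never gets past astronomical twilight, while at Albany’s latitude it does reach full night.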

Ever since seeing the movie “White Nights” my wife has always wanted to experience the white nights of Russia. This wasn’t that, but it was close.

This trip followed on the heels of the amazingly successful Thai Cave Rescue that I had previously commented on. As long-term readers know, I’m a caver who also teaches cave rescue and serves as the Northeast Coordinator of the National Cave Rescue Commission. During the 18-day saga, I and others were called upon by various media outlets to give our insight and perspective. I was fortunate: I only did a little under a dozen media events. Our National Coordinator, Anmar Mirza, did well over 100, most of those in about a 5-day period. A link to one of my media events is here: The Takeaway.

I don’t want to talk about the operation itself; I want to talk about White Knights. We love our White Knights: the term often refers to a character who rides into town and single-handedly solves the town’s problems. The truth is, white knights rarely if ever exist, and most problems require a lot more effort to solve.

We’ve seen this in politics, and we saw this with this cave rescue. Let me start by saying I think the work Elon Musk has done with SpaceX is amazing. SpaceX has in fact single-handedly revolutionized the space launch market.

It was perhaps inevitable that Musk’s name would show up in relation to this cave rescue. Musk has previously gotten attention for attempting to help with the power outage crisis in Puerto Rico and, more recently, for his vow to help the people of Flint (both, by the way, worthy causes in my opinion, and I wish him, and more importantly the people he’s trying to help, well).

But here’s the thing: a cave rescue isn’t solved by a white knight. It’s solved by a lot of effort and planning by a lot of people with a variety of skills and experience. There’s rarely a breakthrough that magically makes things easier.

And I’ll be blunt: his “submarine” idea, while interesting, was at best a PR distraction and at worst, possibly caused problems.

“But Greg, he was trying to help, how could this make things worse?”  I actually disengaged from an online debate with some Musk fanbois who couldn’t see why Musk’s offer was problematic. To them, he was the white knight that could never do wrong.

Here’s the thing: I know for a fact that several of us, myself included, had to take part of our allotted airtime or written coverage to address why Musk’s idea probably wouldn’t work. This meant less time or room for useful information to be passed on to the audience. Part of my role as regional coordinator is to educate people about cave rescue, and I can’t do this effectively when I’m asked to discuss distractions.

“But so what? That didn’t impact the rescue.” No, it didn’t. But it appears from the Twitter fights I’ve seen, and other information, that at least some resources on the ground were tasked to deal with Musk. That means people had to spend time dealing with both Musk and the publicity, and those resources couldn’t be used elsewhere. At least one report from Musk (which honestly I question) suggests he actually entered the cave during the rescue operations. This means that resources had to be spent on ensuring his safety, and his presence possibly prevented another person, who could have provided help in other ways (even if simply acting as a sherpa), from entering.

And apparently, there’s now a useless “submarine” sitting outside the cave.  I’ll leave discussion of why I had problems with the submarine itself for another post.

But here’s one final reason I have a problem with Musk bringing so much attention to himself and his idea: it could have led to second-guessing.

Let’s be clear: even the cave divers themselves felt that they would most likely lose some of the kids; this was exactly how dangerous the rescue was. This is coming from the folks who best knew the cave and best understand the risks and issues.  Some of the best cave divers in the world, with rescue experience, who were on-site, thought that some kids would die in the attempt to rescue them. And, if reports are true, they were aware of Musk’s offer and obviously rejected it (and in fact one suggested later that Musk do something anatomically impossible with it.)

Had the rescuers’ worst fears come true, Musk fanbois would have second-guessed every decision. In other words, people would have put more faith in their favorite white knight, who had zero practical experience in the ongoing operations, than they would have in the very people who were there and actively involved. I saw the comments from his fans before and during the operations, and all of them were upset that their favorite white knight wasn’t being called in to save the day. I can only imagine how bad it would have been had something tragic occurred.

This is why I’m against white knights. They rarely if ever solve the problem, and worse, when they do ride into town, they take time and energy away from those who are actually working on the problems. Leave the white knights on a chess board.

“Today is D-Day”

As I’m writing this, word has rocketed around the world that the 12 soccer players and their coach have been safely rescued from Tham Luang cave. We are still awaiting word on the rescuers themselves, including one of the doctors who had spent time with the boys since they were found, who are still on their way out.

Unfortunately, one former Thai SEAL diver, Saman Kunan, who had rejoined his former teammates to help in the rescue, lost his life. This tragic outcome should not be forgotten, nor should it cast too large a shadow on the amazing success.

What I want to talk about, though, is not the cave or the rescue operations but the decision-making process. The title for this post comes from Narongsak Osottanakorn’s statement several days ago when they began the evacuation operations.


The term D-Day actually predates the famous Normandy landings that everyone associates it with. However, the success of the Normandy landings and their importance to the ultimate outcome of WWII have forever cemented the phrase in history.

One of the hardest parts of any large-scale operation like this is deciding whether to act. During the Apollo Program, these were called GO/NO GO decisions. Famously, you can see this in the movie Apollo 13, where Gene Kranz goes around the room asking for a Go/No Go for launch. (Before the Apollo 11 landing, it was pointed out in a Tindallgram that the call after the Eagle landed should be changed to Stay/No Stay, so there would be no confusion over whether they were “go to stay” or “go to leave.”)

While I’ve never been Flight Director for a lunar mission, nor Supreme Allied Commander for a European invasion, I have had to make life or death decisions on much smaller operations. A huge issue is not knowing the outcome. It’s like walking into a casino. If you knew you were always going to win, it would be an easy decision on how to bet. But obviously that’s not possible. The best you can do is gather as much information as you can, gather the best people you can around you, trust them and then make the decision.

What compounds the decision-making process in many cases, and especially in cave rescue, is the lack of communication and lack of information. It can be very frustrating to send rescuers into the cave and not know, sometimes for hours, what is going on. Compound this with what is sometimes intense media scrutiny (which was certainly present here, with the entire world watching), and one can feel compelled to rush the decision-making process. It is hard, but generally necessary, to resist this. In an incident I’m familiar with, I recall a photograph of the cave rescue expert advising rescue operations standing in the rain near the cave entrance, waiting for the waters to come down so they could send search teams in. Social media was blowing up with comments like, “They need to get divers in there now!” and “Why aren’t the authorities doing anything?” The fact is, the authorities were doing exactly what the cave rescue expert recommended: waiting for it to be safe enough to act. Once the waters came down, they could send people in and find the trapped cavers.

The incident in Thailand is a perfect example of the confluence of these factors:

  • There was media pressure from around the world, with people asking why they were taking so long to begin rescuing the boys and, once they did start, why it took them three days. Offers and suggestions flowed in from around the world and varied from the absurd (one suggestion we received at the NCRC was the use of dolphins) to the unfortunately impractical (let’s just say Mr. Musk wasn’t the only one, nor the first, to suggest some sort of submarine or sealed bag).
  • There was always a lack of enough information. Even after the boys had been found, it could take hours to get information to the surface, or from the surface back to the players. This hinders the decision making process.
  • Finally, of course, there are the unknowns:
    • When is the rain coming?
    • How much rain?
    • How will the boys react to being submerged?
    • What can they eat in their condition?

And in the back of the minds of those making the decisions is the knowledge that if the outcome turned tragic, everyone would second-guess them.

Narongsak Osottanakorn and others had to weigh all of the above against the facts they had, knowing they couldn’t have as much information as they might want, and still make life-impacting decisions. I have a great deal of respect for them, and I don’t envy them.

Fortunately, in this case, the decisions led to a successful outcome which is a huge relief to the families and the world.

For any operation, especially complex ones such as this rescue, a moon landing or an invasion of the beaches of Normandy, the planning and decision-making process is critically important and often overshadowed by the folks executing the operation. As important as Neil Armstrong, Buzz Aldrin and Michael Collins (who all too often gets overlooked, despite writing one of the better autobiographies of the Apollo program) were to Apollo 11, without the support of Gene Kranz, Steve Bales, and hundreds of others on the ground, they would very likely have had to abort their landing.

So, let’s not forget the people behind the scenes making the decisions.


The Thai Cave Rescue

“When does a cave rescue become a recovery?” That was the question a friend of mine asked me online about a week ago. This was before the boys and their coach had been found in the Thai cave.

Before I continue, let me add a huge caveat: this is an ongoing, dynamic situation, and many of the details I mention here may already be based on inaccurate or outdated information. But that’s also part of the point I ultimately hope to make: plans have to evolve as more data is gathered.

My somewhat flippant answer was “when they’re dead.” This is a bit of a dark-humor answer, but there was actually some reasoning behind it. Before I go on, let me say that at that point I still had a lot of hope and reason to believe they were alive. I’m very glad that they were in fact found alive and relatively safe.

There’s a truth about cave rescue: caves are literally a black hole of information. Until you find the people you’re searching for, you have very little information. Sometimes it may be as little as, “They went into this cave and haven’t come out yet.” (Actually, sometimes it can be even less than that: “We think they went into one of these caves, but we’re not even sure about that.”)

So when it comes to rescue, two of the things we try to teach students in cave rescue are to look for clues and to try to establish communications. A clue might be a footprint or a food wrapper. It might be the smell of a sweaty caver wafting in a certain direction. A clue might be the sound of someone calling for help. And the ultimate clue, of course, is the caver themselves. But there are other clues we might look for: What equipment do we think they have? What experience do they have? What are the characteristics of the cave? These can all drive how we search and what decisions we make.

Going back to the Thai cave situation, based on the media reports (which should always be taken with a huge grain of salt), it appeared that the coach and boys probably knew enough to get above the flood level and that the cave temps were in the 80s (Fahrenheit). These are two reasons I was hopeful. Honestly, had they not gotten above the flood zone, we’d almost certainly be talking about a tragedy instead. Had the cave been a typical Northeast cave, where the temps are in the 40s (F), I would have had a lot less hope.

Given the above details, then, it was reasonable to believe the boys were still alive and to continue to treat the situation as a search and, eventually, a rescue. And fortunately, that’s the way it has turned out. What happens next is still open to speculation, but don’t be surprised if they bring in gear and people and bivouac in place for weeks or even months until the water levels come down.

During the search process, apparently a lot of phone lines were laid into parts of the cave so that communication with the surface would be easier. Now that they have found the cavers, I’d be shocked if some sort of real-time communications weren’t set up in short order. This will allow the incident commander to make better-informed decisions and to get the most accurate and up-to-date data.

So, let me relate this to IT and disasters. Typically a disaster will start with “the server has crashed” or something similar. We have an idea of the problem, but again, at that moment we’re really in a black hole of information. Did the server crash because a hard drive failed, because someone kicked the power cord, or because of something else?

The first thing we need to do is to get more information. And we may need to establish communications. We often take that for granted, but the truth is, when a major disaster occurs, the first thing to go is often good communications. Imagine that the crashed server is in a datacenter across the country. How can you find out what’s going on? Perhaps you call for hands-on support. But what if the reason the server has crashed is that the datacenter is on fire? You may not be able to reach anyone! You might need to call a friend in the same city and have them go over there. Or you might even turn on the news to see if there’s anything worth noting.

But the point is, you can’t react until you have more information. Once you start to have information, you can start to develop a reaction plan. But let’s take the above situation and imagine that you find your datacenter has in fact burned down. You might start to panic and think you need to order a new server. You start to call your CFO to ask her to let you buy some new hardware when suddenly you get a call from your tech at the remote site. They tell you, “Yeah, the building burned down, but we got real lucky and our server was in an area that was undamaged. I’ve got it in the trunk of my car. What do you want me to do with it?”

Now your previous data has been invalidated and you have new information and have to develop a new plan.

This is the situation in Thailand right now. They’re continually getting new information and updating their plans as they go. And this is the way you need to handle your disasters: establish communications, gather data, create a plan, and update your plan as the data changes. And don’t give up hope until you absolutely have to.

Swiss Cheese

This blog post will try to tie together several of my favorite things: Cheese, caving, and accidents.

I was making lunch the other day and I was looking at the stack of sliced Swiss cheese I had. I should note, I love Swiss cheese, especially with a good roast beef sandwich.

But first, an existential question.  “What is a cave?”

Oh, that’s easy: it’s a passage through rock in the ground. In other words, it’s the area where there’s no rock. Great. Let’s start simple. I think we can agree that if it’s dark and I can walk through it, it’s a cave. What if I have to crawl? Yeah, that’s still a cave. What if I have to shimmy through and can barely fit? Yeah, that’s still a cave. What if I can’t fit, but one of my much smaller friends can? Yeah, that’s a cave. But what if the entire thing is too small for any human to crawl through, but small animals can? What if two rooms large enough for humans are connected by a passage too tight for a human, but you can shine a light through it or make a “voice connection” and hear people at the other end? Is that still part of the cave?

As an aside, humans have mapped over 190 miles of Jewel Cave (and more all the time; big shout out to my friends who are mapping it!). But airflow studies estimate that we’ve only mapped about 3-5% of it. Let that sink in. Now, what if the remaining 95% or so is too small for a human to fit in? I don’t think anyone would say it isn’t part of the cave.

But here’s the real question. Say we’ve mapped the cave. We know where the passages (i.e., the lack of rock) are. We find a plug of mud and remove it. We’ve made more cave! Yeah! But what if we remove ALL the rock around the existing passage? When does the cave disappear? After all, we now just have a lot more “absence of rock.” But I think we’d agree that at some point we no longer have a cave!

So, back to Swiss cheese. One of the distinguishing features of such cheese is the holes, or more properly, the eyes. Did you know there are actual federal guidelines on what can be called Swiss cheese? Ayup, you can’t simply take any cheese with eyes in it and call it Swiss. So I guess Swiss cheese is sort of like a cave: we actually have to think about it to give it a definition we can agree on. Take away all the cheese, eyes and all, and you have no more cheese, and I’m quite sad.

But what about accidents? Well, there’s a model of risk analysis called the Swiss cheese model. Basically, very few accidents occur out of the blue or entirely unrelated to other factors. The idea is that you have multiple slices of Swiss cheese, and all the holes have to line up for the accident to occur. For example, years ago I came close to having all the “pieces” of cheese line up: while driving through New Jersey, I came fairly close to hydroplaning off an exit ramp into the woods. Let’s look at some of the slices of cheese that came into play.

  • I was tired. Had I been more awake I’d have been paying a bit more attention.
  • It was dark. I might have noticed exactly how wet the exit ramp was during daylight.
  • I was travelling too fast.
  • I had nearly missed the ramp. Had I noticed it sooner, I might have been travelling slower (see above).

The instant I hit the ramp, I knew I was in trouble. I think the ONE slice that didn’t line up was experience. Had I been 20 years younger, with less driving experience, I suspect I’d have ended up off the road. I was at the very edge of being able to brake and maneuver, and I called upon all my years of experience to stay on the correct side of that edge. One thin slice of “cheese” saved me that night.
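For readers who like numbers, the idea can be put into rough arithmetic. This is a minimal Python sketch under a big, loudly stated assumption: each slice’s “hole” shows up independently of the others. The slice names and probabilities are made up purely for illustration and don’t come from any accident data. Under that assumption, the chance that every hole lines up is the product of the per-slice probabilities, so adding one more slice (like experience) multiplies the overall risk by a number less than one.

```python
from math import prod


def accident_probability(hole_probabilities):
    """Chance that the holes in every slice line up at once.

    Assumes (unrealistically) that each slice fails independently,
    so the combined probability is just the product.
    """
    return prod(hole_probabilities)


# Hypothetical per-slice probabilities, for illustration only.
slices = {
    "alertness": 0.3,   # tired driver
    "visibility": 0.5,  # driving in the dark
    "speed": 0.2,       # travelling too fast
    "awareness": 0.1,   # nearly missing the ramp
}

without_experience = accident_probability(slices.values())

# Add one more hypothetical slice: an experienced driver who only
# "has a hole" 10% of the time.
with_experience = accident_probability([*slices.values(), 0.1])

print(f"{without_experience:.4f}")
print(f"{with_experience:.5f}")
```

In real life the slices are rarely independent (being tired and driving in the dark often go together), which is one reason the model works better as a way of thinking about accidents than as a calculator.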

When one looks through accident reports from almost any industry or activity, one can start to see where the slices lined up and how any one of them could have been moved. One reason I read the American Cave Accidents report when I receive it is to learn where the slices could have been moved, so I can make sure my own slices of cheese don’t line up.

So, the question for you is where do your slices of cheese line up?

And another question: what sort of cheese do you put on YOUR roast beef sandwich? And do you make sure your Swiss cheese eyes don’t line up, so every bite is guaranteed a bit of cheese?