“Today is D-Day”

As I’m writing this, word has rocketed around the world that the 12 soccer players and their coach have been safely rescued from Tham Luang cave. We are still awaiting word on the rescuers themselves, including one of the doctors who had spent time with the boys since they were found; they are still on their way out.

Unfortunately, one former Thai SEAL diver, Saman Kunan, who had rejoined his former teammates to help in the rescue, lost his life. This tragic outcome should not be forgotten, nor should it cast too large a shadow on the amazing success.

What I want to talk about, though, is not the cave or the rescue operations, but the decision-making process. The title for this post comes from Narongsak Osottanakorn’s statement several days ago when they began the evacuation operations.

 

The term D-Day actually predates the famous Normandy landings that everyone associates it with. However, the success of the Normandy landings and their importance in the ultimate outcome of WWII have forever cemented that phrase in history.

One of the hardest parts of any large-scale operation like this is making the decision on whether to act. During the Apollo Program, they called them GO/NO GO decisions. Famously, you can see this in the movie Apollo 13, where Gene Kranz goes around the room asking for a Go/No Go for launch. (It was pointed out in a Tindallgram before the Apollo 11 landing that the call after the Eagle landed should be changed to Stay/No Stay, so there was no confusion over whether they were “go to stay” or “go to leave”.)

While I’ve never been a Flight Director for a lunar mission, nor a Supreme Allied Commander for a European invasion, I have had to make life-or-death decisions on much smaller operations. A huge issue is not knowing the outcome. It’s like walking into a casino. If you knew you were always going to win, it would be an easy decision on how to bet. But obviously that’s not possible. The best you can do is gather as much information as you can, gather the best people you can around you, trust them and then make the decision.

What compounds the decision-making process in many cases, and especially in cave rescue, is the lack of communication and lack of information. It can be very frustrating to send rescuers into the cave and not know, sometimes for hours, what is going on. Compound this with what is sometimes intense media scrutiny (which was certainly present here with the entire world watching), and one can feel compelled to rush the decision-making process. It is hard, but generally necessary, to resist this. In an incident I’m familiar with, I recall a photograph of the cave rescue expert advising rescue operations, standing in the rain near the cave entrance, waiting for the waters to come down so they could send search teams in. Social media was blowing up with comments like, “They need to get divers in there now!” and “Why aren’t the authorities doing anything?” The fact is, the authorities were doing exactly what the cave rescue expert recommended: waiting for it to be safe enough to act. Once the waters came down, they could send people in and find the trapped cavers.

The incident in Thailand is a perfect example of the confluence of these factors:

  • There was media pressure from around the world, with people asking why they were taking so long to begin rescuing the boys and, once they did start to rescue them, why it took three days. Offers and suggestions flowed in from around the world and varied from the absurd (one suggestion we received at the NCRC was the use of dolphins) to the unfortunately impractical (let’s just say Mr. Musk wasn’t the only one, nor the first, to suggest some sort of submarine or sealed bag).
  • There was always a lack of enough information. Even after the boys had been found, it could take hours to get information to the surface, or from the surface back to the players. This hinders the decision making process.
  • Finally of course are the unknowns:
    • When is the rain coming?
    • How much rain?
    • How will the boys react to being submerged?
    • What can they eat in their condition?

And finally, in the back of the minds of the folks making the decisions is the fact that if the outcome turns tragic, everyone will second-guess them.

Narongsak Osottanakorn and others had to weigh all of the above against the facts they had, knowing that they couldn’t have as much information as they might want, and still make life-impacting decisions. For this I have a great deal of respect for them, and I don’t envy them.

Fortunately, in this case, the decisions led to a successful outcome which is a huge relief to the families and the world.

For any operation, especially a complex one such as this rescue, a moon landing or an invasion of the beaches of Normandy, the planning and decision-making process is critically important and often overshadowed by the folks executing the operation. As important as Neil Armstrong, Buzz Aldrin and Michael Collins (who all too often gets overlooked, despite writing one of the better autobiographies of the Apollo program) were to Apollo 11, without the support of Gene Kranz, Steve Bales, and hundreds of others on the ground, they would have very likely had to abort their landing.

So, let’s not forget the people behind the scenes making the decisions.

 

The Thai Cave Rescue

“When does a cave rescue become a recovery?” That was the question a friend of mine asked me online about a week ago. This was before the boys and their coach had been found in the Thai cave.

Before I continue, let me add a huge caveat: this is an ongoing dynamic situation and many of the details I mention here may already be based on inaccurate or outdated information. But that’s also part of the point I ultimately hope to make: plans have to evolve as more data is gathered.

My somewhat flippant answer was “when they’re dead.” This is a bit of a dark-humor answer, but there was actually some reasoning behind it. Before I go on, let me say that at that point I actually still had a lot of hope and reason to believe they were still alive. I’m very glad that they were in fact found alive and relatively safe.

There’s a truth about cave rescue: caves are literally a black-hole of information. Until you find the people you’re searching for, you have very little information.  Sometimes it may be as little as, “They went into this cave and haven’t come out yet.” (Actually sometimes it can be even less than that, “We think they went into one of these caves but we’re not even sure about that.”)

So when it comes to rescue, two of the things we try to teach students in cave rescue classes are to look for clues and to try to establish communications. A clue might be a footprint or a food wrapper. It might be the smell of a sweaty caver wafting in a certain direction. A clue might be the sound of someone calling for help. And the ultimate clue of course is the caver themselves. But there are other clues we might look for: what equipment do we think they have? What experience do they have? What are the characteristics of the cave? These can all drive how we search and what decisions we make.

Going back to the Thai cave situation, based on the media reports (which should always be taken with a huge grain of salt) it appeared that the coach and boys probably knew enough to get above the flood level and that the cave temps were in the 80s (Fahrenheit).  These are two reasons I was hopeful. Honestly, had they not gotten above the flood zone, almost certainly we’d be talking about a tragedy instead. Had the cave been a typical northeast cave where the temps are in the 40s (F) I would have had a lot less hope.

Given the above details then, it was reasonable to believe the boys were still alive and to continue to treat the situation as a search and eventually rescue situation.  And fortunately, that’s the way it has turned out. What happens next is still open for speculation, but I’ll say don’t be surprised if they bring in gear and people and bivouac in place for weeks or even months until the water levels come down.

During the search process, apparently a lot of phone lines were laid into parts of the cave so that communications with the surface would be easier. Now that they have found the cavers, I’d be shocked if some sort of real-time communications is not set up in short order. This will allow the incident commander to make better-informed decisions and to get the most accurate and up-to-date data.

So, let me relate this to IT and disasters. Typically a disaster will start with, “the server has crashed” or something similar. We have an idea of the problem, but again, we’re really in a black-hole of information at that moment. Did the server crash because a hard drive failed, or because someone kicked the power cord or something else?

The first thing we need to do is to get more information. And we may need to establish communications. We often take that for granted, but the truth is, often when a major disaster occurs, the first thing to go is good communications. Imagine that the crashed server is in a datacenter across the country. How can you find out what’s going on? Perhaps you call for hands on support. But what if the reason the server has crashed is because the datacenter is on fire? You may not be able to reach anyone!  You might need to call a friend in the same city and have them go over there.  Or you might even turn on the news to see if there’s anything on worth noting.

But the point is, you can’t react until you have more information. Once you start to have information, you can start to develop a reaction plan. But let’s take the above situation and imagine that you find your datacenter has in fact burned down. You might start to panic and think you need to order a new server. You start to call up your CFO to ask her to let you buy some new hardware when suddenly you get a call from your tech at the remote site. They tell you, “Yeah, the building burned down, but we got real lucky and our server was in an area that was undamaged and I’ve got it in the trunk of my car. What do you want me to do with it?”

Now your previous data has been invalidated and you have new information and have to develop a new plan.

This is the situation in Thailand right now. They’re continually getting new information and updating their plans as they go. And this is the way you need to handle your disasters: establish communications, gather data, create a plan, and update your plan as the data changes. And don’t give up hope until you absolutely have to.

Swiss Cheese

This blog post will try to tie together several of my favorite things: Cheese, caving, and accidents.

I was making lunch the other day and I was looking at the stick of sliced Swiss cheese I had. I should note, I love Swiss cheese, especially with a good roast beef sandwich.

But first, an existential question.  “What is a cave?”

Oh, that’s easy, it’s a passage through rock in the ground. In other words it’s the area where there’s no rock. Great. Let’s start simple. I think we can agree if it’s dark and I can walk through it, it’s a cave. What if I have to crawl? Yeah, that’s still a cave. What if I have to shimmy through and can barely fit? Yeah, that’s still a cave. What if I can’t fit, but one of my much smaller friends can fit through? Yeah, that’s a cave. But what if the entire thing is too small for anyone to crawl through but small animals can? What if two rooms that are large enough for humans to be in are connected by a passage too tight for a human, but say you can shine a light through, or can make a “voice connection” and hear people at the other end? Is that still part of the cave? As an aside, humans have mapped over 190 miles of Jewel Cave (and more all the time, big shout out to my friends who are mapping it!). But airflow studies estimate that we’ve only mapped about 3-5% of it. Let that sink in. But what if the other 95% is too small for a human to fit in? I don’t think anyone would not call that part of the cave.

But here’s the real question. So we’ve mapped the cave. We know where the passages (i.e. lack of rock) are.  We find a plug of mud and remove that.  We’ve made more cave! Yeah! But what if we remove ALL the rock around the existing passage. When does the cave disappear? I mean now we just have a lot more “absence of rock”.  But I think we’d agree at some point we no longer have a cave!

So back to Swiss cheese. One of the distinguishing features of such cheese is the holes, or as they’re more properly named, the eyes. Did you know there are actual federal guidelines on what can be called Swiss cheese? Ayup, you can’t simply have a cheese with eyes in it. So I guess Swiss cheese is sort of like a cave. We actually have to think about it to give it some definition we can agree on. Take away all the cheese, eyes and all, and you have no more cheese and I’m quite sad.

But what about accidents? Well, there’s a model of risk analysis called the Swiss cheese model. Basically, very few accidents occur out of the blue or entirely without a relation to other factors. The idea is you have multiple slices of Swiss cheese and all the holes have to line up for the accident to occur. For example, in my own personal experience, years ago I came close to all the “pieces” of the cheese lining up; while driving through New Jersey, I came fairly close to hydroplaning off an exit ramp into the woods.  Let’s look at some of the slices of cheese that came into play.

  • I was tired. Had I been more awake I’d have been paying a bit more attention.
  • It was dark. I might have noticed exactly how wet the exit ramp was during daylight.
  • I was travelling too fast.
  • I had nearly missed the ramp; I might have been travelling slower (see above) had I noticed the ramp sooner.

The instant I hit the ramp, I knew I was in trouble. I think the ONE slice that didn’t line up was experience. Had I been 20 years younger with less experience driving, I suspect I’d have ended up off the road. I was at the very edge of being able to brake and maneuver and I called upon all my years of experience to stay on the correct side of that edge. One thin slice of “cheese” saved me that night.
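
For anyone who likes numbers with their cheese, here is a rough Python sketch of why each extra slice matters. It assumes the slices are independent of each other, which real accidents rarely are, and every probability in it is made up for illustration; none of it comes from actual accident data.

```python
import math


def chance_all_holes_align(hole_chances):
    """Chance that every slice's hole lines up, assuming independent slices."""
    return math.prod(hole_chances)


# Hypothetical odds that each slice "has a hole" on a given night.
slices = {
    "tired": 0.3,
    "dark": 0.5,
    "too fast": 0.4,
    "noticed the ramp late": 0.2,
}

without_experience = chance_all_holes_align(slices.values())
# Add one more slice: the (made-up) chance that experience fails to save the day.
with_experience = chance_all_holes_align([*slices.values(), 0.1])

print(f"four slices: {without_experience:.4f}")
print(f"five slices: {with_experience:.4f}")
```

With these invented numbers, the fifth slice cuts the chance of everything lining up by a factor of ten. The exact values don't matter; the point is that every additional slice multiplies the odds down.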

When one looks through accident reports, of almost any industry or activity, one can start to look for where the slices lined up and how any one could be changed. One reason I read the American Cave Accidents report when I receive it is to learn where the slices could have been moved so I can make sure I don’t line up my slices of cheese.

So, the question for you is where do your slices of cheese line up?

And another question is, what sort of cheese do you put on YOUR roast beef sandwich? And do you make sure your Swiss cheese eyes don’t line up so every bite is guaranteed a bit of cheese?

 

 

 

 

Copying a Large File

It was a pretty simple request actually. “Can you copy over the Panama database from FOO\WAS_21 to server BAR\LAX_45?”

“Sure, no problem.”

Of course it was a problem. Here’s the issue. This is at one of my clients. They have a couple of datacenters and have hundreds of servers in each. In addition, they have servers in different AD domains. This helps them partition functionality and security requirements. Normally copying files between servers within a datacenter isn’t an issue. Even copying files between the different domains in the same datacenter isn’t normally too bad. To be clear, it’s not great. Between servers in the same domain, it appears they have 1Gbps connections; between the domains, the firewall seems to throttle things down to 100Mbps.

The problem is when copying between different domains in different datacenters. This can be abysmally slow. That was my problem this week.  WAS_21 and LAX_45 were in different datacenters, and in different domains.

Now, for small files, I can use the cut and paste functionality built into RDP and simply cut and paste. This doesn’t work for large files. The file in this case was 19GB.  So this was out.

Fortunately, through the Citrix VDI they provide, I have a temp folder I can use. So, easily enough, I could copy the 19GB file from FOO\WAS_21 to that. That took just a few minutes.  Then I tried to copy it from there to BAR\LAX_45. This was slow, but looked like it would work.  It was going to take 4-5 hours, but they didn’t need the file for a week.

After about 4.5 hours, my RDP session locked up. I logged out and back in and saw the copy had failed. I tried again. This time at just under 4.5 hours I noticed an out of memory error. And then my session locked up.

So, apparently this wasn’t going to work. The obvious solution was to split the file (it was already compressed) into multiple files; except I’m not allowed to install most software on the servers. So that wasn’t a great option. I probably could have installed something like 7zip and then uninstalled it, but I didn’t want to deal with that and the paperwork that would result.

So I fell back to an old friend: Robocopy.  This appeared to be working great. Up until about 4.5 hours.  And guess what… another out of memory error.

But I LIKE challenges like this.

So I looked more closely. Robocopy has a lot of options. There are two that stuck out: /Z – restartable mode. That looked good. I figured worst case, I’d start my copy, let it fail at about 85% done and then resume it.

But then the holy grail: /J :: copy using unbuffered I/O (recommended for large files). 

Wow… unbuffered… that looks good. Might use less memory.

So I gambled and tried both. And lo and behold, 4:19 later… the file was copied!
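
To show why those two ideas help, here is a minimal Python sketch of a copy that reads in fixed-size chunks (so memory use stays flat no matter how big the file is) and picks up where a failed attempt left off. This is only an illustration of the concepts behind /Z and /J, not how Robocopy implements them. The function name and chunk size are my own inventions, and it assumes that whatever is already in the destination file is a valid prefix of the source.

```python
import os

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; an arbitrary, hypothetical value


def resumable_copy(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size chunks, resuming from a partial dst.

    Assumes any existing dst is an intact prefix of src (not verified here).
    Returns the number of bytes written during this call.
    """
    src_size = os.path.getsize(src)
    # If a previous attempt died partway through, start from where it stopped.
    already = os.path.getsize(dst) if os.path.exists(dst) else 0
    if already > src_size:
        already = 0  # dst cannot be a prefix of src; start over

    written = 0
    # "r+b" keeps the existing bytes; "wb" creates or truncates the file.
    mode = "r+b" if already else "wb"
    with open(src, "rb") as fin, open(dst, mode) as fout:
        fin.seek(already)
        fout.seek(already)
        while True:
            chunk = fin.read(chunk_size)  # never holds more than one chunk
            if not chunk:
                break
            fout.write(chunk)
            written += len(chunk)
    return written
```

If the session dies at 85%, calling `resumable_copy` again with the same two paths only moves the remaining 15%.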

So, it was an annoying problem but… I had solved it.  I like that!

So the take-away: Don’t give up. There’s always a way if you’re creative enough!

Math is Hard, Let’s Go Shopping

If I were to ask my readers to take a math test right now, approximately 1/2 would perform worse than if I had used a more neutral title such as “Math Quiz Below”. I’ll let you as the reader guess which 1/2.

This is a subtle form of priming. Multiple studies have shown that by priming people before taking tests or making decisions, we can influence their outcome. It isn’t quite subliminal advertising, but it can be close.

I’m currently reading Delusions of Gender by Cordelia Fine and it’s quite the read. I recommend it to my audience here. She goes into the studies showing how priming can impact outcomes and references them in more detail.

Overall, we know that women are less represented in STEM fields, but this lack of representation doesn’t start out this way. Studies show in grade school the interest in STEM by gender is about equal. But over time, there’s less representation of women in most STEM fields, and often when they are represented, their positions either carry less weight (not as much advancement) or are perceived to carry less weight (ignored, spoken over, etc.). And before anyone comments, “but I know a woman who is a CTO at my company” or similar, keep in mind that those are noteworthy because they are the exceptions, not the norm.

Now, no single solution will solve the problem of women’s representation in STEM. But there are things we can do. First, we need to recognize that the human brain is probably built to be primed for certain responses. But don’t confuse this with saying that we can’t change what we’re primed for or how we respond. And, we can also avoid priming.

One study that is cited by Fine appears to suggest that collecting gender-biased demographic data AFTER a test or survey doesn’t cause a gender-based result in the test. In other words, if you simply give a math test and then at the end ask questions like gender, or even to put one’s name on it (which can often have an influence on self-perception), it appears to remove the bias towards poorer performance by women. Similarly if you don’t ask at all.

But most of us aren’t giving math tests, are we?

But we are doing things like looking at resumes, deciding what conferences or seminars to attend, what blogs to read or respond to, and how we interact with our coworkers and bosses.

One technique to consider is blind recruitment. Here much if not all demographic data is removed from a resume. This sort of work goes back to the Toronto Symphony Orchestra in the 1970s. But note, there is some evidence that it’s not the panacea some make it out to be. So proceed with caution.

When attending a conference or seminar, you can do one of a few things. For one, try to read the session descriptions without seeing the name of who is presenting. This can be a bit hard to do and may not quite get the results you want. Or, and I’m going to go out on a limb here because some people find this concept a bit sexist and I don’t have a great deal of data to support it, but… go based on the names, and select sessions where women are presenting. Yes, I’m suggesting making a conscious, some would say sexist, choice.

So far I’ve been quite pleased by doing so. Over a year ago at SQL Saturday Chicago 2017 I decided to attend a session by Rie Irish called Let Her Finish: Supporting Women’s Voices from meetings to the board room. I’d like to say I was surprised to find that I was only 1 of 2 men in the room, but I wasn’t. I was a bit disappointed however, since really it was men who needed to hear the talk more than women. Oh, and the other gentleman was a friend of Rie’s she had invited to attend. And a related tip: when attending such talks, generally, KEEP YOUR MOUTH SHUT. But that’s a different blog post for a different time.

Other great talks I’ve heard include Mindy Curnutt’s talk at SQL Summit 2017 on Imposter Syndrome and Deborah Melkin’s Back to the Basics: T-SQL 101 at SQL Saturday Albany 2017. Despite her being a first-time speaker and it being a 101 class, it was great and I learned some stuff and ended up inviting her to speak at our local user group in February of this year.

Besides making your fellow DBAs, SQL professionals, IT folks etc feel valuable and appreciated, you’re also showing the event coordinators that their selections were well made. If more people attend more sessions given by women, eventually there will be more women presenting simply because more will be asked to present.

But what if you can’t go?  Encourage others. Rie and her partner Kathi Kellenberger (whom I’m indebted to for encouraging me to write my first book) are the leaders of the PASS WIT (Women in Technology) Virtual Chapter of PASS. Generally before a SQL Saturday they will retweet announcements of the various women speaking. It doesn’t hurt for you to do the same, especially for women that you know and have heard speak.

But what about when there are no women, or they’re poorly represented? Call folks out on it. Within the past year we’ve seen a “Women in Math” poster which featured no women. There was a conference in Europe recently (I’m trying to find links) where women were extremely underrepresented. When women AND men finally started to speak out and threaten not to attend or speak, the conference seemingly suddenly found more women qualified to speak.

I’ve heard sometimes that “it’s hard to find women speakers” or “women don’t apply to speak”. The first is a sign of laziness. I can tell you right now, at least in the SQL world, it’s not hard. You just have to look around. In the second case, there may be some truth to that. Sometimes you have to be more proactive in making sure that women are willing to apply and speak. For my SQL/PASS folks out there, I would suggest reaching out to Rie and Kathi and finding out what you can do to help attract speakers to your conference or user group. Also, for example, if you don’t already have women speaking or in visible public positions within your organization, this can discourage women from applying because, rightly or wrongly, you’re giving off a signal that women may not be welcome.

Math may be hard, but it should not be because of gender bias, and we shouldn’t let gender bias, primed or not, allow under-representation to occur.

PS – bonus points if anyone can recognize the mountains in the photo at the top.

PPS – Some of the links below may end up outdated but:

And that’s just a small sampling of who is out there!

 

Alarming

So a recent trip to the ER (no, nothing serious, wasn’t me, thanks for asking) reminded me of a topic near and dear to my heart: Alarms and Alerts. What prompted this thought was the number of beeps, boops, and chirps I heard while there that no one responded to.  This leads to the question: Why have them, if no one responds to them?

I have a simple rule for alarms: “Don’t put an alert on something unless you have a response pre-planned for it.”

This is actually more complex than it sounds. And it can sometimes lead to seemingly illogical conclusions if you follow it in a reductio ad absurdum fashion.

Let’s start with an example of one alert I heard while sitting waiting. It was a constant beep, about 90 times a minute. I soon tracked it down to a portable monitor attached to a patient who was soon to be moved upstairs. It was the person’s pulse. Besides a possible HIPAA violation (I was now in theory privy to private medical information), it really served no purpose other than to annoy the patient and those around them. “But Greg, perhaps they were afraid the patient would suddenly go into cardiac arrest or something else would happen.” And I agree, but then let’s alert on the sudden change in conditions, not on what was, at the time, a stable pulse for the patient. This beeping went on for over 10 minutes. And no one was monitoring it, other than the patient and us annoyed strangers.

So, there was an alert, that apparently needed no response.
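
To make the “alert on the change, not the steady state” idea concrete, here is a toy Python sketch. The function name, the baseline and the 25% tolerance are all numbers I made up for illustration; this is not how any real patient monitor decides when to beep.

```python
def pulse_alerts(readings, baseline, tolerance=0.25):
    """Yield (index, bpm) only for readings that stray from the baseline.

    `tolerance` is a fraction of the baseline; 0.25 is an arbitrary,
    hypothetical choice. Readings inside the band stay silent.
    """
    allowed = tolerance * baseline
    for i, bpm in enumerate(readings):
        if abs(bpm - baseline) > allowed:
            yield i, bpm


# A steady pulse around 90 produces no alerts; only the outliers do.
stable_then_trouble = [90, 91, 89, 140, 88, 40]
for index, bpm in pulse_alerts(stable_then_trouble, baseline=90):
    print(f"reading {index}: {bpm} bpm is outside the expected range")
```

Ten minutes of a stable 90 would produce exactly zero beeps, which is the whole point.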

But let’s go to the other extreme. What about when an alert isn’t needed? Let’s say you’re driving your car and it throws a rod. (Yes, this happened to me once. Well, I wasn’t driving, my father was. It was his sister’s Volkswagen campervan.) I can tell you there is NO alert when such an event happens. But there’s no need for it. The vehicle stops. It won’t go. So an alert in that case is pretty superfluous.

But let’s tie this to IT. I’m going to give you an absurd example of when not to have an alert: When you run out of disk space.  Again, you might disagree. You’d think this would be the perfect time to have an alert. But go back to my rule. What if you have no plan for this? You’ve never gamed out the possibility.  Now, you’re out of disk space. You don’t have a plan. Does it really matter if you had an alert or not? If you can’t respond, the alert really hasn’t added anything.

The main lesson to take away from that example is, if you’re setting up an alert, make sure you do have a plan. (The other lesson of course is perhaps to have an alert BEFORE you run out of disk space!) The plan may be as simple as, “delete as many files as I can”. But of course that only works if you have files to delete. Or it might be “add another filegroup to the database for now and then figure out the long-term solution during our next planned outage.”  Or, in the worst case it might be, “update my resume.”  But the point is, if you have an alert, have SOME plan for it.
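
One way to hold yourself to this rule is to make it impossible to define an alert without writing down the plan. Here is a minimal Python sketch of that idea. The class name, the thresholds and the plans are hypothetical examples of mine, not a real monitoring product’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float     # fraction of disk used that triggers the alert
    response_plan: str   # what the person who gets paged actually does

    def __post_init__(self):
        # Enforce the rule of thumb: no plan, no alert.
        if not self.response_plan.strip():
            raise ValueError(f"alert {self.name!r} has no response plan")
        if not 0.0 < self.threshold <= 1.0:
            raise ValueError("threshold must be in (0, 1]")


def triggered(rules, used_fraction):
    """Return the rules whose threshold has been reached, most severe first."""
    hits = [r for r in rules if used_fraction >= r.threshold]
    return sorted(hits, key=lambda r: r.threshold, reverse=True)


if __name__ == "__main__":
    import shutil

    # Hypothetical rules: warn well BEFORE the disk is actually full.
    rules = [
        AlertRule("disk 80% full", 0.80, "delete as many old files as I can"),
        AlertRule("disk 95% full", 0.95, "add another filegroup for now"),
    ]
    usage = shutil.disk_usage(".")
    for rule in triggered(rules, usage.used / usage.total):
        print(f"{rule.name}: {rule.response_plan}")
```

Trying to create `AlertRule("disk full", 1.0, "")` raises an error, which is exactly the nudge I’d want before anyone gets paged at 2:00 AM.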

On the flip side, how many times do you have an alert that you look at and say, “oh yeah, we can ignore that, that always happens.”  Sure, that’s a plan, but honestly, ask yourself, do you need an alert in that case? Probably not. I hate getting woken up at 2:00 AM for an alert I don’t need to respond to.  So in this case if there is no plan because you don’t need a plan, eliminate the alert.

I could go on (and perhaps this will be a good topic for my next book), but I’ll add one last real-world case where people all too often ignore alerts: smoke and CO detectors, especially CO detectors. If you have a CO detector and it alerts, do NOT assume it’s faulty and unplug it. Respond. Somehow. Don’t automatically assume it’s a faulty battery, especially if it’s the winter. If you have any doubt, please call the fire department. Trust me, they’d much rather respond to a call where you’re all alive and it’s a false CO alarm than show up and find the alarm going off, but everyone is now dead.

So the take away is, alerts are only useful if they generate a useful response.

Oh and because the inner child in me can’t resist: be a lert because the world needs more lerts! 🙂

 

 

Wet Paint

There’s an old saw that if you tell someone there are 1 billion people in China, they’ll believe you, but if you put up a wet paint sign, they’ll have to touch the paint to be sure. I find it interesting that we believe some things easily and others not so easily. Or even sometimes, we may think we believe something until we actually experience it. Then we somehow believe it even more.

Two incidents of this nature occurred to me within the last year.  Last year I drove to my uncle’s house in South Carolina in order to observe the total eclipse. Prior to totality, one of the phenomena that I knew would happen was seeing crescent shaped “shadows”. (Really it’s sort of the reverse since you’re seeing crescent shaped light.) I mean I had read about it, I had seen pictures, and I basically understood why they would occur.  And yet, at one point I was walking back into the house to do something when I looked down and lo and behold… I saw crescent shaped “shadows”.  My reaction was one of “Holy Cow, this really DOES happen!”  Now, I had no doubt intellectually that it should happen, or that I would most likely see it. But, actually seeing it was still amazing and while I really had no reason to have to verify the phenomenon, the fact that I personally had experienced it was incredible. I understood it at a visceral level, not just an intellectual one.

The second incident occurred just the other day. As a long-time sufferer of allergies, spring time has sometimes been a bit less than comfortable for me. And over the years, I had seen videos of pine trees releasing their pollen in a massive cloud (I’m thankfully NOT allergic to pine pollen it appears) but I had never actually experienced it myself. So again, I knew intellectually it happens.

So two days ago I was sitting outside looking at a pine tree when suddenly I saw this puff of what looked like smoke, and then a cloud of pollen waft away from the tree. Again, I had experienced that Holy Cow moment when the visceral experience matched the intellectual one. It was pretty cool.

That all said, I have stopped touching wet paint at this point in my life. But I still love these sorts of confirming experiences (though I’m not eager to start counting heads in China at this time).

What facts did you know that, when you finally experienced them first hand, had an impact upon you? I’d love to hear them.