And so it Happened…

Posted on March 27, 2018 by Greg Moore

New Faces

Last year I made a decision to try to do at least one SQL Saturday outside my “normal” geographic region; which basically encompasses down to Washington DC and out to Rochester NY and Boston. I’ve spoken at a number of SQL Saturdays in this area. I’ve enjoyed all of them. And generally I’ve drawn a decent audience, with a few exceptions.

But, one of the problems of doing that is you keep seeing the same speakers over and over. And while we’ve got a great crowd of speakers, I wanted to hear from speakers I might not normally hear from. Also, unless you’re constantly creating new content (which you should for a multitude of reasons), after awhile your possible audience has heard everything you have to say.

So last year, I put a bid in for SQL Saturday Chicago and was very pleased to be accepted. I had a great time staying with some friends in the area and also a great time at the Speakers Dinner and After Party as well as at the event itself. I met a number of speakers I had not met before and heard a few speaker that I had not previously heard. And, I had a fresh new audience who seemed to really enjoy my topic on “Tips that have saved my Bacon.”

Colorado Springs

So this year, I had a choice of places to put in bids for. I selected Colorado Springs and was pleasantly surprised to find they’d accepted me. Since I’ve got a friend in the area, that cut down on costs considerably. It was a win win.

I had a great time at the Speakers Dinner on Friday night and met more speakers that I had not previous met. A quick shout out to @toddkleinhans and Cyndi Johnson and @DBAKevlar among others. It was great. We talked a bit about using VR to navigate a query, about reprogramming our brains and more.

I was excited for the next day. Sure, it was last session of the day, but I showed up early so I could hang out in the Speakers’ Lounge, see some of the other sessions, and hang with my friends, the MidnightDBAs, Sean and Jenn McCown.

Then it happened

As a speaker you have a lot of fears; the slide deck crashing, your computer applying updates in the middle of your talk (it happens!) and more. But I think the one that perhaps you don’t necessarily dread the most, but you’re most disappointed by, is when…. no one shows up! Catherine Wilhelmsen has a great blog post about this and I have to agree with pretty much everything she says.

All I can say is… “it happens”. I know it’s happened to other speakers, many who I have a great deal of respect for and think are a tier above me in terms of their talks.

Sometimes it’s just luck of the draw. Sometimes, as I suspect played a role here, it’s the end of the day, a number of folks have gone home already and ALL the sessions have lower numbers than ones earlier in the day. It could be the organizers misjudged the topics the audience wanted. It could be my title or description just didn’t entice folks (I suspect this is part of the issue with a different talk I gave, where I got too cutesy with the title. I’ve changed the title and updated the description and I’m scheduled to present it again at another SQL Saturday. So at least the organizers there think it’ll draw folks.)

But overall, yeah, it’s frustrating, but a single talk doesn’t make or break me as a speaker. It happens and we move on.

Conclusion

It was still worth coming out to SQL Saturday Colorado Springs and I don’t regret it. I’m grateful to the organizers that gave me the opportunity. So thanks.

Oh and one more thing I noticed while going back through notes for this blog entry: SQL Saturday Chicago 2017 was event 600, Colorado Springs 2018 was 700. That’s 100 in a year, almost 2 a week. And I was asked to speak at (including Chicago) 6 of them I believe. That’s a pretty good percentage.

I’m content.

That said come see me next month at SQL Saturday Philadelphia! I’m not sure what time I’m scheduled for yet, but I’ll be speaking on “So you want to Present: Tips and Tricks of the Trade”. And yes, I will talk about when people don’t show up. That’s assuming I have an audience 🙂

Choices

Posted on March 20, 2018 by Greg Moore

“If you choose not to decide You still have made a choice” – Rush Freewill

One of the things that we believe makes us uniquely human is the concept of freewill; that we can rise above our base instincts and make choices based on things other than pure instinct. While there’s some question if that’s unique to humans, let’s stick with it for now.

Overall, we think choice is good. I can choose to eat cake for breakfast, or I can choose to eat a healthy breakfast. I can choose if I want get up early and exercise, or sleep in.

Sometimes we may think it’s hard to decide between two such things as in the examples above, but the truth is, it’s not that hard.

But, what happens when the choices aren’t nearly as simple. What happens when we sit down with a menu with 3 items versus 30 or even 100? We can become paralyzed. With 3 options, our odds of making a “wrong” decision is only 66%. I say “wrong”because it’s often purely subjective and may not necessarily have much impact. But when we have 100 different things to choose from, the odds of a “wrong” decision goes up to 99%. In other words, we’re faced with the concept that no matter what we do, we’re virtually guaranteed to make a “wrong” decision.

The Jam Experiment

One example of this effect was seen in what is often called the jam experiment. Simply put, when given the choice of 6 varieties of jam, consumers showed a bit less interest, but sales were higher. When the choice of 24 jams were presented, there was more interest, but sales actually dropped, significantly. People were apparently paralyzed by having too many choices.

Locally there’s an outdoor hamburger/hot-dog stand I like to frequent called Jack’s Drive In. People will stand in long lines, in all sorts of weather (especially on opening day, like this year when the line was 20 people deep and with the windchill it was probably about 20F!) One can quibble over the quality of the burgers and fries, but there’s no doubt they do a booming business. And part of the reason is because they have few choices and keep the line moving. This makes it far faster for people to order and faster to cook. With only a few choices, patrons don’t spend 5 minutes dithering over a menu.

Hint: If you’re ever in the area, simply tell them you want “Two burgers and a small french”. Second hint: No matter how hungry you are, don’t as a former co-worker once did, try “6 Burgers and a large french”. You will regret that particular choice.

Choices to Europe

What brought this particular post on was all the choices I’m facing in trying to plan our family vacation. It’s rather simple really, “we want to visit Europe”. But, I also am hoping to speak at SQL Saturday in Manchester, UK. And we want to visit London (where my cousin lives) and Paris. And we can fly out of the NYC area. Or Boston. Or possibly other areas if the price was cheaper enough. So suddenly what one would hope is a simple thing becomes very complicated. And of course every airline has their own website design, which complicates things.

Of course the simple choice would be not to fly. The second simplest would be not to care about cost. Of course neither of those work. So, I’m stuck in deciding between 24 types of jam. Wish me luck!

Getting Unlost

There’s a concept I teach people when I teach outdoor skills. If you’re going to be wrong, be confidently wrong. There’s two reasons for this. For one, people are more likely to follow a leader who appears to be confident and knows what they’re doing. This can lead to better group dynamics and a better outcome.

But the second, for example, if you’re lost is, if for whatever reason you choose NOT to stay in one place (which by the way is often the best choice, especially for children) is that if you make a plan and stick to it, you’re far more likely to get unlost. This isn’t just wishful thinking.

Imagine you’re lost and you decide, “I’m going to hike North!” And you start to hike north, and after 15 minutes you decide, “eh maybe that was the wrong decision. I should hike East!” And you do this for another 15 minutes, and then you decide, “Nah, now that I think about it, South is much better.” 15 minutes later you decide you’re going to the wrong way and West was the right way all along. An hour later, you’re back where you started. But, if you had decided to stick with North the entire time, an hour later, depending on your pace, terrain and other factors, you could be 2-4 miles further north. “So what?” you might ask. Well, take a look at a map of almost any part of the country. In most cases you’re less than 10 miles from some sort of road. If you’ve spent 3 hours hiking, in a single direction, you’ve probably hit a road, or a powerline or some other sign of civilization. (note this is NOT advice to wander in the woods if you get loss or a promise this will work anyplace. There are definitely places in the US this advice is bad advice). Also obviously, if you hit a gorge or other impassible geologic feature, you may have to change directions. Or you might get another clue (like hearing a chainsaw or engine or something human-caused in a specific direction).

Final Thoughts

So, if you’re going to make a choice, make it confidently. And don’t second-guess yourself until new, solid reasons come along.

So, keep your choices simple and stick to them.

And with that, I choose to stop typing now.

Fail-safes

Posted on March 13, 2018 by Greg Moore

Dam it Jim, I’m a Doctor, not a civil engineer

I grew up near a small hydro-electric dam in CT. I was fascinated by it (and still am in many ways). One of the details I found interesting was that on top of this concrete structure they had what I later found are often called flashboards. These were 2x8s (perhaps a bit wider) running the length of the top of the dam, held in place by wooden supports. The general idea was they increased the pooling depth by 8″ or so, but in the advent of a very heavy water flow or flood, they could be easily removed (in many cases removed simply by the force of the water itself). They safely provided more water, but were designed in fact to fail (i.e. give away) in a safe and predictable manner.

This is an important detail that some designers of systems often don’t think about; how to fail. They spend so much time trying to PREVENT a failure, they don’t think about how the system will react in the EVENT of a failure. Properly designed systems assume that at some point failure IS an not only an option, it’s inevitable.

When I was first taught rigging for cave rescue, we were always taught “Have a mainline and a belay”. The assumption is, that the system may fail. So, we spent a lot of time learning how to design a good belay system. The thinking has changed a bit these days, often we’re as likely to have TWO “mainlines” and switch between them, but the general concept is still the same, in the event of a failure EITHER line should be able to catch the load safely and be able to recover. (i.e. simply catching the fall but not being able to resume operations is insufficient.)

So, your systems. Do you think about failures and recovery?

Let me tell you about the one that prompted this post. Years ago, for a client I built a log-shipping backup system for them. It uses SSH and other tools to get the files from their office to the corporate datacenter. Because of the network setup, I can’t use the built-in SQL Server log-shipping copy commands.

But that’s not the real point. The real point is… “stuff happens”. Sometimes the network connection dies. Sometimes the copy hangs, or they reboot the server in the office in the middle of a copy, etc. Basically “things break”.

And, there’s another problem I have NOT been able to fix, that only started about 2 years ago (so for about 5 years it was not a problem.) Basically the SQL Server in the datacenter starts to have a memory leak and applying the log-files fails and I start to get errors.

Now, I HATE error emails. When this system fails, I can easily get like 60 an hour (every database, 4 times an hour plus a few other error emails). That’s annoying.

AND it was costing the customer every time I had to go in and fix things.

So, on the receiving side I setup a job to restart SQL Server and Agent every 12 hours (if we ever go into production we’ll have to solve the memory leak, but at this time we’ve decided it’s such a low priority as to not bother, and since it’s related to the log-shipping and if we failed over we’d be turning off log-shipping, it’s considered even less of an issue). This job comes in handy a bit later in the story.

Now, on the SENDING side, as I’ve said, sometimes the network would fail, they’d reboot in the middle of a copy or something random would make the copy job get “stuck”. This meant rather than simply failing, it would keep running, but not doing anything.

So, I eventually enabled a “deadman’s switch” in this job. If it runs for more than 12 hours, it will kill itself so that it can run normally again at the next scheduled time.

Now, here’s what often happens. The job will get stuck. I’ll start to get email alerts from the datacenter that it has been too long since logfiles have been applied. I’ll go in to the office server, kill the job and then manually run it. Then I’ll go into the datacenter, and make sure the jobs there are running. It works and doesn’t take long. But, it takes time and I have to charge the customer.

So, this weekend…

the job on the office server got stuck. So I decided to test my failsafes/deadman switches.

I turned off SQL Agent in the datacenter, knowing that later that night my “cycle” job would turn it back on. This was simply so I wouldn’t get flooded with emails.

And, I left the stuck job in the office as is. I wanted to confirm the deadman’s switch would kick in and kill it and then restart it.

Sure enough later that day, the log files started flowing to the datacenter as expected.

Then a few hours later the SQL Agent in the datacenter started up again and log-shipping picked up where it left off.

So, basically I had an end to end test that when something breaks, on either end, the system can recover without human intervention. That’s pretty reassuring. I like knowing it’s that robust.

Failures Happen

And in this case… I’ve tested the system and it can handle them. That lets me sleep better at night.

Can your systems handle failure robustly?

Things Left Unsaid

Posted on March 6, 2018 by Greg Moore

Pop Quiz

You show up at an accident scene and see two patients. One is screaming in pain about a broken arm. The other is propped up against the wall seemingly fine, not saying a word. Which one do you check out first?

Many will answer, “the one screaming in pain about the broken arm, the other person is fine.” The experienced responder will most likely check out the 2nd person. Why? Because they’re NOT saying a word.

Here’s the thing. You know the 1st person has a pulse and an airway. They’re breathing just fine. Perhaps a bit too fine. A broken arm, by itself isn’t going to kill them. But what about that 2nd person? Are they breathing? You don’t know. Perhaps they’re not saying a word because they’ve stopped breathing. If you take the time to splint the broken arm and then get to the 2nd person, they may have died. So, check out the 2nd patient first, then determine your course of action.

We’re Safe! Really, we are. Trust us, because we keep repeating it!

I saw this because in problem solving, I often find what’s NOT said is often far more important than what is said. Several years ago my son received a letter saying he had been nominated for a program that took children to other countries on basically extended field trips. It actually sounded really interesting. We went to the presentation. I sat through it thinking, “this is really cool.” But, two things struck me. First, they kept emphasizing how safe it was. At first pass, and the first time they mentioned it, I wasn’t bothered. I mean as a parent, you want to know your kid is going to be safe if you put them in the hands of strangers for an extended period of time. But, they kept emphasizing it. It got to the point that all three of us (my wife, my son and I) started to wonder, “why the hell are the dwelling on this point?”

The other thing that was bothersome was once we got out of the lecture hall and tried to speak to some of the individuals, we asked them “How did our son get nominated?” “Oh it must have been a teacher at his school.” Which sounded great until we thought about it and thought it strange that no teacher had mentioned this to us or our son.

So, when we got home, we did some digging and found out there had been several incidents of accidents happening to students while overseas with this group. On one hand, nothing struck me as too statistically terrible, but the reports of the handling and the fact that we were only reading about the ones reported made me even more paranoid about how unsafe the program really was. I mean why emphasize safety unless you really feel like you have to?

The other detail we uncovered was most parents had the same experience about “your child has been nominated” without any word of by whom. The most troubling was at least one or two parents who chimed in who said that their child had been killed in an accident or otherwise died after their name had appeared in the newspaper for being on the honor roll. i.e. a fact that a teacher who might be in a position to nominate the said child would be well aware of. As far as we and other parents could determine, the “nomination” process was solely a matter of the group scanning the newspapers for honor roll students and the like.

So, relating this back to IT

As a person who loves troubleshooting, one of the things I’ve learned is NOT to trust what the user initially reports to me. “I haven’t changed a thing and this stopped working!” That generally means, they changed something. 🙂

I once had a client, that had a problem that took at least two winters to diagnose. Why so long you might ask? Because the problem only happened in the winter. The first year it was complaints of “ever since you networked our computers, they reboot without warning.” Now, I had networked them several months previously and they only started to report the problem come the late fall/early winter. I tried several things, but nothing really fixed the problem. I had an idea of what it was, but they wouldn’t listen. So, among other things, I ended up rewiring their entire network (sounds like a lot of work, but it was a total of 4-5 computers and I moved from thinwire Ethernet to 10baseT (I did say this was a long time ago, right?)

Eventually I sort of gave up. Until the next winter rolled around and they started to call again. Again, I told them what I thought the problem was. Again, they dismissed it. I’m not sure what finally convinced them, but they finally took me up on my suggestion and put in a humidifier and had their office carpet treated with anti-static spray. Yes, despite all their instance that “I was just sitting there typing and it rebooted” what was really happening and they weren’t saying was, “I just walked from one office to the other, across the carpet, in the drier than normal air and as soon as I touched my computer it rebooted.” It was the static build-up all the time.

So this week’s moral of the story: Look beyond what’s being said and pay attention to what’s NOT being said. It might shock you.

Green Mountain Software

Formerly thoughts on SQL, Disasters and thinking. Now musings on PA School, and more.

Monthly Archives: March 2018