Choices

“If you choose not to decide, you still have made a choice.” – Rush, “Freewill”

One of the things we believe makes us uniquely human is free will: the idea that we can rise above our base instincts and make choices on some basis other than pure instinct. While there’s some question whether that’s truly unique to humans, let’s stick with the assumption for now.

Overall, we think choice is good. I can choose to eat cake for breakfast, or I can choose to eat a healthy one. I can choose whether to get up early and exercise, or sleep in.

Sometimes deciding between two options like those may seem hard, but the truth is, it’s not.

But what happens when the choices aren’t nearly as simple? What happens when we sit down with a menu of 3 items versus 30, or even 100? We can become paralyzed. With 3 options, the odds of making a “wrong” decision are only about 67%. I say “wrong” because the choice is often purely subjective and may not have much real impact. But when we have 100 different things to choose from, the odds of a “wrong” decision climb to 99%. In other words, no matter what we do, we’re virtually guaranteed to make a “wrong” decision.
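
To put rough numbers on that, here’s a back-of-the-envelope sketch, assuming exactly one of n equally likely options counts as “right” and you pick at random:

    # Odds of a "wrong" pick, assuming exactly one of n options is "right"
    # and the pick is random; purely illustrative.
    def p_wrong(n):
        return (n - 1) / n

    for n in (3, 30, 100):
        print(f"{n:3d} options -> {p_wrong(n):.0%} chance of a 'wrong' pick")
    #   3 options -> 67% chance of a 'wrong' pick
    #  30 options -> 97% chance of a 'wrong' pick
    # 100 options -> 99% chance of a 'wrong' pick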

The Jam Experiment

One example of this effect was seen in what is often called the jam experiment. Simply put, when shoppers were offered 6 varieties of jam, the display drew a bit less interest, but sales were higher. When 24 varieties were presented, there was more interest, but sales dropped significantly. People were apparently paralyzed by having too many choices.

Locally there’s an outdoor hamburger/hot-dog stand I like to frequent called Jack’s Drive In. People will stand in long lines in all sorts of weather (especially on opening day; this year the line was 20 people deep and, with the wind chill, it was probably about 20°F!). One can quibble over the quality of the burgers and fries, but there’s no doubt they do a booming business. Part of the reason is that they offer few choices, which keeps the line moving: ordering is faster, cooking is faster, and patrons don’t spend 5 minutes dithering over a menu.

Hint: If you’re ever in the area, simply tell them you want “two burgers and a small french.” Second hint: No matter how hungry you are, don’t, as a former co-worker once did, try “6 burgers and a large french.” You will regret that particular choice.

Choices to Europe

What brought this particular post on was all the choices I’m facing in trying to plan our family vacation. It’s rather simple, really: “we want to visit Europe.” But I’m also hoping to speak at SQL Saturday in Manchester, UK. And we want to visit London (where my cousin lives) and Paris. And we can fly out of the NYC area. Or Boston. Or possibly other airports, if the price were enough cheaper. So suddenly what one would hope is a simple thing becomes very complicated. And of course every airline has its own website design, which complicates things further.

Of course the simplest choice would be not to fly. The second simplest would be not to care about cost. Naturally, neither of those works. So I’m stuck deciding among 24 types of jam. Wish me luck!

Getting Unlost

There’s a concept I teach when I teach outdoor skills: if you’re going to be wrong, be confidently wrong. There are two reasons for this. First, people are more likely to follow a leader who appears confident and seems to know what they’re doing. This can lead to better group dynamics and a better outcome.

The second applies if you’re lost and, for whatever reason, you choose NOT to stay in one place (which, by the way, is often the best choice, especially for children): if you make a plan and stick to it, you’re far more likely to get unlost. This isn’t just wishful thinking.

Imagine you’re lost and you decide, “I’m going to hike north!” You start hiking north, and after 15 minutes you decide, “Eh, maybe that was the wrong decision. I should hike east!” You do that for another 15 minutes, and then decide, “Nah, now that I think about it, south is much better.” 15 minutes later you decide you’re going the wrong way and west was the right way all along. An hour later, you’re back where you started. But if you had stuck with north the entire time, an hour later, depending on your pace, terrain and other factors, you could be 2-4 miles further north.

“So what?” you might ask. Well, take a look at a map of almost any part of the country. In most cases you’re less than 10 miles from some sort of road. If you’ve spent 3 hours hiking in a single direction, you’ve probably hit a road, a powerline, or some other sign of civilization. (Note: this is NOT advice to wander in the woods if you get lost, or a promise this will work anyplace. There are definitely places in the US where this is bad advice.) Also, obviously, if you hit a gorge or other impassable geologic feature, you may have to change direction. Or you might get another clue (like hearing a chainsaw, an engine, or something else human-caused in a specific direction).
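
A quick back-of-the-envelope check of that claim, purely for illustration and assuming a steady 3 mph pace on flat ground:

    # Net displacement after four 15-minute legs at 3 mph, changing
    # direction each time (N, E, S, W), versus holding north for the hour.
    LEG = 3 * 0.25                      # miles covered in 15 minutes at 3 mph
    headings = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}

    x = y = 0.0
    for h in "NESW":                    # the ditherer's hour
        dx, dy = headings[h]
        x, y = x + dx * LEG, y + dy * LEG

    print(f"Changing direction: {(x**2 + y**2) ** 0.5:.1f} miles from start")
    print(f"Holding north:      {4 * LEG:.1f} miles from start")
    # Changing direction: 0.0 miles from start
    # Holding north:      3.0 miles from start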

Final Thoughts

So, if you’re going to make a choice, make it confidently. And don’t second-guess yourself until new, solid reasons come along.

Keep your choices simple, and stick to them.

And with that, I choose to stop typing now.


Fail-safes

Dam it Jim, I’m a Doctor, not a civil engineer

I grew up near a small hydroelectric dam in CT. I was fascinated by it (and still am in many ways). One of the details I found interesting was that on top of this concrete structure they had what I later learned are often called flashboards. These were 2x8s (perhaps a bit wider) running the length of the top of the dam, held in place by wooden supports. The general idea was that they increased the pooling depth by 8″ or so, but in the event of very heavy water flow or a flood, they could be easily removed (in many cases removed simply by the force of the water itself). They safely provided more water, but were in fact designed to fail (i.e. give way) in a safe and predictable manner.

This is an important detail that some designers of systems often don’t think about: how to fail. They spend so much time trying to PREVENT a failure that they don’t think about how the system will react in the EVENT of a failure. Properly designed systems assume that at some point failure is not only an option, it’s inevitable.

When I was first taught rigging for cave rescue, we were always taught “have a mainline and a belay.” The assumption is that the system may fail, so we spent a lot of time learning how to design a good belay system. The thinking has changed a bit these days; now we’re as likely to have TWO “mainlines” and switch between them, but the general concept is the same: in the event of a failure, EITHER line should be able to catch the load safely and allow the operation to recover. (Simply catching the fall but not being able to resume operations is insufficient.)

So, your systems: do you think about failures and recovery?

Let me tell you about the one that prompted this post. Years ago I built a log-shipping backup system for a client. It uses SSH and other tools to get the files from their office to the corporate datacenter. Because of the network setup, I can’t use the built-in SQL Server log-shipping copy commands.

But that’s not the real point. The real point is… “stuff happens”. Sometimes the network connection dies. Sometimes the copy hangs, or they reboot the server in the office in the middle of a copy, etc. Basically “things break”.

And there’s another problem I have NOT been able to fix, one that only started about 2 years ago (so for about 5 years it wasn’t a problem). Basically, the SQL Server in the datacenter develops a memory leak, applying the log files starts to fail, and I start to get errors.

Now, I HATE error emails. When this system fails, I can easily get 60 an hour (every database, 4 times an hour, plus a few other error emails). That’s annoying.

AND it was costing the customer every time I had to go in and fix things.

So, on the receiving side I set up a job to restart SQL Server and Agent every 12 hours. (If we ever go into production we’ll have to solve the memory leak, but for now we’ve decided it’s such a low priority as to not bother; and since it’s tied to the log-shipping, which we’d be turning off anyway if we ever failed over, it’s considered even less of an issue.) This job comes in handy a bit later in the story.

Now, on the SENDING side, as I’ve said, sometimes the network would fail, or they’d reboot in the middle of a copy, or something random would make the copy job get “stuck.” That meant rather than simply failing, it would keep running but not actually do anything.

So, I eventually added a “deadman’s switch” to this job: if it runs for more than 12 hours, it kills itself so that it can run normally again at the next scheduled time.
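
A minimal sketch of the idea in Python (the real job is built on SSH and other tools, so treat the command and paths here as invented placeholders):

    # Deadman's switch: run the copy, but kill it if it exceeds 12 hours so
    # the next scheduled run can start clean. The scp command and paths are
    # made up for illustration.
    import subprocess

    MAX_RUNTIME = 12 * 60 * 60   # 12 hours, in seconds

    try:
        subprocess.run("scp backups/*.trn datacenter:/logship/",
                       shell=True, timeout=MAX_RUNTIME, check=True)
    except subprocess.TimeoutExpired:
        # The copy hung; give up so the next scheduled run isn't blocked.
        print("Deadman's switch tripped: copy exceeded 12 hours, killed.")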

Now, here’s what often happens. The job will get stuck. I’ll start to get email alerts from the datacenter that it has been too long since log files were applied. I’ll go into the office server, kill the job, and then manually run it. Then I’ll go into the datacenter and make sure the jobs there are running. It works and doesn’t take long. But it takes time, and I have to charge the customer.

So, this weekend…

the job on the office server got stuck. So I decided to test my fail-safes/deadman’s switches.

I turned off SQL Agent in the datacenter, knowing that later that night my “cycle” job would turn it back on. This was simply so I wouldn’t get flooded with emails.

And I left the stuck job on the office server as it was. I wanted to confirm the deadman’s switch would kick in, kill it, and then let it restart on schedule.

Sure enough later that day, the log files started flowing to the datacenter as expected.

Then a few hours later the SQL Agent in the datacenter started up again and log-shipping picked up where it left off.

So, basically, I had an end-to-end test showing that when something breaks, on either end, the system can recover without human intervention. That’s pretty reassuring. I like knowing it’s that robust.

Failures Happen

And in this case… I’ve tested the system and it can handle them. That lets me sleep better at night.

Can your systems handle failure robustly?


SQL Data Partners Podcast

I’ve been keeping mum about this for a few weeks, but I’ve been excited about it. A couple of months ago, Carlos L Chacon from SQL Data Partners reached out to me about the possibility of being interviewed for their podcast. I immediately said yes. I mean, hey, it’s free marketing, right?  More seriously, I said yes because when a member of my #SQLFamily asks for help or to help, my immediate response is to say yes.  And of course it sounded like fun.  And boy was I right!

What had apparently caught Carlos’s attention was my book, IT Disaster Response: Lessons Learned in the Field. (Quick, go order a copy now… that’s what Amazon Prime is for, right? I’ll wait.)

Ok, back? Great. Anyway, the book is sort of a mash-up (to use the common lingo these days) of my interests in IT, cave rescue, and plane crashes. I take the skills, lessons learned, and tools from one area and apply them to the others. I’ve been told it’s a good read. I like to think so, but I’ll let you judge for yourself. Anyway, back to the podcast.

So we recorded the podcast back in January, Carlos and his partner Steve Stedman on their end and I on mine. And I can tell you, it was a LOT of fun. You can (and should) listen to it here. I just re-listened to it myself to remind myself of what we covered. What I found remarkable was that, as much as I kept trying to tie things back to databases, Carlos and Steve seemed as interested, if not more so, in cave rescue itself. I was ok with that. I personally think we covered a lot of ground in the 30 or so minutes we talked. And it was great, because this is exactly the sort of presentation, combined with my airplane-crash one and others, that I’m looking to build into a full-day onsite consult.

One detail I had forgotten about in the podcast was the #SQLFamily questions at the end. I still think I’d love to fly because it’s cool, but teleportation would be useful too.

So, Carlos and Steve, a huge thank you for asking me to participate and for letting me ramble on about one of my interests. As I understand it, my friend Ray Kim has a similar podcast with them coming up in the near future.

So, the thought for the day is: think about how skills you learn elsewhere can be applied to your current responsibilities. It might surprise you, and you might do a better job.


What a Lucky Man He Was….

Being a child of the 60s, my musical tastes run the gamut from The Doors through Rachel Platten. In this case, the title of course comes from the ELP song.

Anyway, today’s post is a bit more reflective than some. Since yesterday I’ve been fighting what should be simple code. Years back I wrote a simple website to handle student information for the National Cave Rescue Commission (NCRC). The previous database manager had started with a database designed back in the 80s; it was certainly NOT web friendly. So after some work I decided it was time to make it a bit more accessible to other folks. Fortunately, ASP.NET made much of the work fairly easy, and it did what I wanted. But now I’m struggling to figure out how to get and save profile information along with membership info. Long story short, due to a design decision years back, this isn’t as automatic and easy as I’d like. So I’ve been banging my head against the keyboard quite a bit over the last 24 hours. It’s quite frustrating, actually.

So, why do I consider myself lucky? Because I can take the time to work on this. Through years of hard work, education and honestly a bit of luck, I’m at the point where my wife and I can provide for our family to live a comfortable life and I can get away with working less than a full 40 hours a week. This is important to me as I get older. Quality of life becomes important.

I’ve talked about my involvement in cave rescue in the past; part of that involves wearing multiple hats, some of which take more work than others.

I am, for example:

  • Co-captain of the Albany-Schoharie Cave Rescue Team – This is VERY sporadic and really sort of unofficial; some years we have no local rescues at all.
  • Instructor with the NCRC – This generally means a week plus a few days every year when I take time out to travel, at my own expense, to a different part of the country and teach students the skills required to be effective in a cave rescue. For this I get satisfaction; I don’t get paid. Locally, I also generally take a weekend or two a year to teach a weekend course.
  • Regional Coordinator with the NCRC – Among other things, this means I travel, again at my own expense, once a year, generally to Georgia, to meet with my fellow coordinators so we can conduct the business of the NCRC. This may include approving curriculum created by others, reviewing budgets, and other business.
  • Finally, Database Coordinator – This is really more of an IT Coordinator role, but the title is what it is. It means not only do I develop the database and the front end, I’m also responsible for inputting data and running reports.

As you can see, this time adds up quickly. In terms of total time, I’d say I easily dedicate a minimum of two weeks a year to the NCRC. But it’s worth it. I can literally point at rescues and say, “those people are alive because of the work I and others do.” Sometimes it’s direct, like when I’m actually on a rescue; sometimes it’s indirect, when I know it’s a result of the training I and others have provided. But it’s worth it. I honestly can claim I work with some of the best people in the world. There are people here I would literally put my life on the line for, in part because I know they’d do the same.

So, I’m lucky. I’m lucky that I can invest so much of my time in something I enjoy and love so much.  I’ll figure out this code and I’ll keep contributing, because it’s worth it, and because I’m lucky enough that I can.

How are you lucky?


Crane Operators

Talking online with friends the other day, someone brought up that crane operators in NYC can make $400-$500K a year. Yes, a year. I figured I’d confirm that before writing this post and it appears to be accurate.

At first glance one may think this is outrageous, or perhaps that they chose the wrong field. I mean, I enjoy being a DBA and a disaster geek, but I can’t say I’ve ever made $400K in one year! And for what? I mean, you lift things up and put them down, right?

Let me come back to that.

So, last night I got paid quite a tidy bundle (though not nearly that much) for literally logging into a client computer, opening up VisualCron, clicking on a task, and saying “disable task.” On one hand it seemed ridiculous, not just because of what they were paying me, but because this process was the result of several meetings, more than one email, and a review process. All to say: “stop copying this file.”

But this file was part of a key backup process for a core part of the client’s business. I had initially set up an entire process to ensure that a backup was copied from an AIX server in one datacenter to a local NAS and then to the remote datacenter. It’s a bit more complex than it sounds. But it worked. And the loss of a timely backup would impact their ability to recover by hours if not days. This could potentially cost them hundreds of thousands of dollars, if not millions.

So the meetings and phone calls and emails weren’t just “which button should Greg click,” but covered questions like: “Do we have the backups we think we have?” “Are they getting to the right place(s)?” “Are they getting there in a timely fashion?” And even, “When we uncheck this, we need to make sure the day’s processing is complete so we don’t break it.”

So, me unchecking that button after hours, as much as it cost the company, was really the end of a complex chain of events designed to make sure they didn’t risk losing a LOT of money if things went wrong. Call it an insurance payment if you will.

Those crane operators in NYC? They’re not simply lifting a beam here and there and randomly placing it someplace. They’re maneuvering complex systems in tight spaces with heavy loads, where a sudden gust can set things swaying or spinning and a single mistake can do thousands of dollars in damage or even kill people.

It’s not so much what they’re being paid to do as how much they’re being paid to avoid the cost of a mistake. I wasn’t paid just to unclick a button. I was paid (as were the others in the meetings) to make sure it was the right button, at the right time, and that it wouldn’t cost even more.

Sometimes we’re not paid for what we do, as much as we’re paid for what we’re not doing.


Starting off the New Year… wrong

Ok, I had plans for a great blog post on time and how we measure it and how it really is arbitrary. But… that will have to wait. (I suppose that brings its own irony about time and waiting and all that. Hmm, now I may need a post about that!)

But, circumstances suggested a different post.

This one started with an email from my boss last year (who will remain nameless, for his sake 🙂): “Greg, I hacked together this web page so my team members can track their hours. I need you to make a few tweaks to it. And don’t worry, we’ll replace it by the end of the year.” Yes, that sound you hear is your head hitting the desk like mine, because you KNOW what’s coming next. Here we are in January of 2018 and, guess what, the web page is still in place.

And of course it’s not just HIS (now former; he’s moved on) team using it. Apparently his boss liked what it could do, so his boss (my boss^2) declared that ALL his teams would use it. So instead of just 4-5 people, there are close to 30-40 people using it.

So, remember the first rule of development: “Oh, don’t worry about this, we’ll replace it soon” never happens. His original code made a lot of assumptions (which at the time, I suppose, seemed reasonable). For example, since many schedules are built around work weeks, he originally stored hours by the week of the year (and then day of the week) they were worked in. So if someone worked 6 hours on January 5th of last year, the table stored that as “Week 1, day 5.”

Now, before anyone thinks this is completely nuts, keep in mind that many places, including my last job at GE, did everything by week of the year. There are some advantages to this (that I won’t go into here).

For the most part the code worked and I didn’t care. But at one point I was asked to do a report based not on work weeks but on an actual calendar month. So I had to figure out, for example, that February 1st fell in Work Week 5, on the 4th day, and go from there.
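
Just to make the scheme concrete, here’s a rough sketch of the conversion in Python (the actual page was .NET; the helper name and details here are invented):

    from datetime import date

    # Week 1, Day 1 is Sunday, January 1, 2017; weeks run Sunday-Saturday.
    def to_work_week(d, year_start=date(2017, 1, 1)):
        offset = (d - year_start).days          # days since Week 1, Day 1
        return offset // 7 + 1, offset % 7 + 1  # (week, day), both 1-based

    print(to_work_week(date(2017, 2, 1)))       # -> (5, 4): Week 5, Day 4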

So, I hacked that together. Then there was a request for another report, and another. Pretty soon it was getting to be a pain to keep track of all this. And I realized another problem: my boss had gotten lucky in 2017. January 1st fell on a Sunday, which meant Work Week 1, Day 1 was a natural starting point.

I started to worry about what would happen when 2018 rolled around; his code would probably break. So I finally took the plunge and started to refactor the code. I also added new features as they were requested. Things were going great.

Until yesterday. So now we get to another good rule of development: “Use version control.” So simple. Even for a one-person shop like me, it’s a good idea. I had put it off for a while, but finally downloaded the git plug-in for Visual Studio and started to use it.

So yesterday, a user reported that they couldn’t enter hours for last week. I pulled up the app and realized that yes, in fact it was coded in such a way you couldn’t go back into a previous year to enter hours. No problem, I can fix that!

Well, let me add a rule I had sort of missed: “Understand HOW to use version control.” I won’t go into details, but let’s just say I wasn’t committing like I should have been, or thought I was. So, in an attempt to get a clean base, I merged things… “poorly.” I had the branches I wanted, but had not properly committed stuff to them.

The work of the last few weeks was GONE. I know, you’re saying, “just go to your backups Greg, because of course a person who writes about DR has proper backups, right?”  Yeah, right! Let’s not go there!

Anyway, I spent 4 hours last night and 2 this morning recreating the code. Fortunately, it dawned on me that being .NET code, it was really just IL under the hood, and perhaps with a good decompiler I could get most of the code back without too much effort. I want to give props to Redgate’s .NET Reflector. Of the two decompilers I tried, it was clearly the better one. While I lost my variable names and the decompiled code isn’t quite what I’d call human-written, it was close enough that I could recreate hours and hours of work in short order.

In the meantime, I also talked to some of my programming buddies on an RPI chat server (ask me about it sometime) about git and better procedures.

And here’s where I realized my fundamental mistake. It wasn’t just misunderstanding how Visual Studio was interacting with git and the branches I was creating; it was that somewhere in the back of my head I had decided commits should be used only when major steps were completed. It was almost as if I was afraid I’d run out, or perhaps complicate the history with too many of them. But you know what? Commits aren’t scarce resources. And I’m the only one reading this history. I don’t really care if I’ve got 10 commits or 100. The more the better, in many ways. Had I been committing much more frequently, I’d have saved myself a lot of work. (And we can discuss having a remote store at some point, etc.)
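
For what it’s worth, the habit I’ve since adopted is tiny, frequent checkpoints; at the command line that’s just (the commit message and branch name here are made up):

    # Commit early, commit often: small, cheap checkpoints.
    git add -A
    git commit -m "Allow hour entry for prior years (WIP)"

    # Before an experiment, branch; branches are cheap too.
    git checkout -b fix/prior-year-entry

    # Verify what you actually committed and where the branches point.
    git log --oneline --all --graph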

So really, the point of this hastily written, sleep-deprived blog post isn’t so much dates and apps that never get replaced, but a much deeper problem I was having. It wasn’t just failing to fully understand git commands; it was failing to understand how I should be using git in the first place.

So, to riff on an old phrase, “Commit Early, Commit often”.

Oh, and I did hack together a way for that user to enter hours for last year. It’s good enough. I’m sure I won’t need it a year from now; the code will all be replaced or refactored by then, I’m sure!


Maps

This is actually about a bit more than simply maps. It’s really about what a map is: an abstraction. That may sound obvious, but it’s an important concept.

“I have a map of the United States… actual size. It says, ‘Scale: 1 mile = 1 mile.’ I spent last summer folding it. I hardly ever unroll it. People ask me where I live, and I say, ‘E6.’” – Steven Wright

Forget the impracticality of such a map; it makes a fine joke, but it’s still just a map. We don’t live on maps. Piaget and others have done a lot of research into the development of young children’s minds, and it’s fascinating. At first, one can argue, they live in a very solipsistic world: nothing exists except what they can see or touch. Soon, though, the concept of object permanence comes into being. But they’re still not quite at true abstraction. At a young enough age, if you hold up a picture of a room to a child, they’ll try to grab stuff off the picture. And why not? They can grab things in the physical world. Soon they start to realize the picture is just that: not the actual room, but a flattened, portable version of it.

Around age 5 (I’m going from memory here) we start to ask children to draw maps. It might be a map of their room, or a map of how to get to school. They’re crude and often, to our adult minds, wildly inaccurate, but they’re starting to achieve a truer concept of abstraction. Their maps aren’t even miniaturized pictures of the real world; they’re completely new, abstract representations of it.

Eventually, around 8th grade or so, they’re introduced to algebra. Up until then, much of their math might have been things like 8 * ___ = 56. They’ll dutifully fill in the 7. It’s a blank, but it’s not really abstract. Then they start to solve problems like “56/x = 8, solve for x.” Again they’ll write down 7, but now they’ve learned that a letter can represent a number and can be used that way later on.

What’s interesting is that many people never stop to realize that *, /, +, and many other mathematical symbols are really just abstractions. Yes, numbers are themselves an abstraction; there’s nothing inherent in the symbol 5 that means it MUST represent five of an object. It’s just a convenience (and shorter than writing ///// hash marks). But the operator symbols are even more of an abstraction. If I wrote 8 ♒︎ __ = 56, most of us could probably figure out what ♒︎ represents there.

Anyway, that aside, why am I writing about maps and, ultimately, abstraction? Because it’s so vitally important to my job, and perhaps the jobs of most of my readers, and it’s been on my mind lately.

I’ve been working on an SSIS package for a client. The original spec was “import these 20 or so (out of 240) CSV files from this ZIP file.” I thought about it and, after teaching myself a bit more about SSIS (I don’t usually do much in it), I realized I could create a container that did most of the work, with variables assigned to it. The variables include things like which database connection to use and where the files are located, and, most importantly, the NAME of the file/table (by giving each table the same base name as its file, I can use a single variable and line everything up nicely). Part of what I had to (re)learn was the scope of variables within SSIS packages and how to expose the ones I wanted as parameters.

Now, honestly, it probably took me more time to set up this level of abstraction than if I had simply hard-coded everything for the 20 or so tables.

BUT, I’ve got enough experience to guess at what was coming next; and I’m going to guess you know what’s coming next.

“Hey, this is great, but you know what, we think we want all 240 files in that ZIP file loaded. Can you do that?”

“Sure!” And at this point the problem isn’t the amount of work; it’s the tediousness of it: create the table schema, create a new file connector, copy the SSIS container, update the table variable, reconnect the file connector to the new table, test. It’s a LOT less work than if I had to hard-code everything by hand. The container name is even automatically updated from the table variable. This had an unintended, positive side effect: if I start to duplicate a container (say I forgot I had already imported table Customer_Val), the container name won’t update from the variable, since you can’t have duplicate container names. I caught this very problem the other day.
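
SSIS is a visual tool, but a rough Python analogue of that parameterized container might look like this (all the names, files, and the SQLite stand-in are invented for illustration; the real package targets SQL Server):

    # One parameterized "container": the table name drives both the file
    # path and the destination table, so adding a table is one list entry.
    # Assumes the target tables already exist with matching columns.
    import csv
    import sqlite3  # stand-in for the real SQL Server connection

    TABLES = ["Customer_Val", "Orders", "Inventory"]  # one entry per CSV/table pair
    DATA_DIR = "unzipped"                             # where the ZIP was extracted

    def load_table(conn, table):
        """Load <DATA_DIR>/<table>.csv into the table of the same name."""
        with open(f"{DATA_DIR}/{table}.csv", newline="") as f:
            header, *data = list(csv.reader(f))
        placeholders = ",".join("?" * len(header))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)

    conn = sqlite3.connect("target.db")
    for table in TABLES:          # the "copy the container" step, automated
        load_table(conn, table)
    conn.commit()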

And of course, once I move the final code to their UAT environment and then the Prod environment, it’s simply a matter of updating 3-4 variables in the deploy (for the database server, file location, and where to send emails) and I’m done.

By taking the time to abstract out as much as I could, I’ve made my life easier. Up front it was definitely a bit more work, but in the long run, has saved me a lot of effort.

Nothing revolutionary about the concept to us developers. BUT, stop and try to imagine an environment where variables or such layers of abstraction don’t exist. Such environments do exist, and there are some valid reasons for them, but ultimately, being able to abstract things makes our lives MUCH easier and our code that much more powerful.

And it appears that abstraction at this level is perhaps the ONE real intellectual advantage we have over the other intelligences on this planet. And I’m not even so sure about that!

So I’ll end by pointing out that even the written word is really just a level of abstraction. So this column is about itself. Or something like that.