GIGO

A fundamental tenet of programming is GIGO: Garbage In, Garbage Out.

Years ago, I was rehearsing for a play (Night of January the 16th by Ayn Rand) in which I played the bailiff. At one point in the play I'm handed a copy of a check that has been entered into evidence, and I'm supposed to "read" what's on it. Of course, since it's a play, I have my lines memorized.

But during this dress rehearsal I was handed a piece of paper with actual writing on it. Unfortunately, it was just some random writing. My brain went into segfault and I stopped. Part of my brain wanted to read what was on the piece of paper. Part of my brain wanted to say my lines, but it could no longer remember them.

It was a perfect example of how easy it is to scramble the input for our brains.  In the actual performances we made sure the piece of paper was actually blank.

I was reminded of this the other night when Steve Harvey made his gaffe on live television. I was curious how he could make such a mistake, but I had my suspicions. And I was right. The cue card apparently was VERY poorly designed and his visual input system (i.e. his eyes and brain) screwed up. Read here for more details. Bad input led to bad output.

These are humorous examples, but in the software world such input failures can be very dangerous.

At one point during the shuttle program, they found an error where the arm thought it had rotated more than 360 degrees, a physical impossibility. This link has some details (though in my recollection the issue was not a rounding error but that the code went from 0-360 instead of 0-359 or 1-360). Garbage in could have led to potentially bad garbage out.

Much more recently, however, here's an example of intentional "garbage" in. This is part of the encryption software used on many firewalls. Your bank or other financial institution, for example, may be using this code.

Ironically, true garbage, as in a purely random number, might be better. But here it seems someone poisoned the input with their specific number and then set things up to use the results in a dangerous manner. I say dangerous because the 3rd party using this code may not realize that they're completely vulnerable to having all their data seen. About the only thing worse than unencrypted data is data you think is encrypted but isn't. With the former, I'm probably going to pay far more attention to who has access. I'll add too that some of us suspect the NSA had a hand in this.

This, by the way, is why I highly recommend folks not write their own encryption. Unless you're an expert, you're liable to screw it up.

Moral: Be careful of your inputs; they definitely influence your outputs, both in code and in your brain.

Rolling in the Deep

I was at SQL Saturday in Boston this past weekend and I sat in on a session given by Paresh Motiwala: “Why do we shun using tools for DBA job?”

It's a decent question, and sometimes rolling your own (hence the title here; and yes, I'm listening to Adele right now) is the right answer.

But often, it’s not.

A case in point is DBAs who avoid using the built-in SQL Server maintenance plans, especially for simple tasks such as backups.

Now, I'll start out by saying straight up that sometimes they're not the optimal solution. (You may want to back up certain partitions on a different rotation schedule, as one example.)

Below is a recent situation I came across.

But, if you do decide to roll your own, please do NOT do the following:

  1. First write a sproc that has to be placed into each database as it’s created in order to run the backup. (If you DO decide to go this route, please make this a scripted part of your release procedure so it’s not missed.)
  2. If you DO in fact ignore suggestion #1, I will point out that it doesn't do much good to have the first step of your sproc check whether the database exists. BIG HINT: If the database doesn't exist, the sproc won't run in the first place! Yes, I know you're saying, "Obviously," but this is the situation I just came across.
  3. If you DO in fact ignore suggestion #2, you probably don’t need to call your own special “alert sproc”.
  4. But if you DO in fact ignore suggestion #3, make sure your “alert sproc” does more than call a “email sproc” and pass it a few parameters.
  5. And if you DO in fact ignore suggestion #4, please make sure your “email sproc” does a bit more than build a message and call the built-in SQL Server stored proc to send an email.

Now granted, there may be reasons to do some of the above. Perhaps your alert sproc also calls some sort of 3rd party monitoring tool. Or your email needs are very specific.

In the case I just rectified, none of that was true.  So there was a lot of additional complexity (which really didn’t work anyway) for no good reason.

Another problem with this roll-your-own backup setup: it used the same filename every time for its backups. On Monday the backup name was M:\Backups\FOO_FULL.BAK. On Tuesday it was M:\Backups\FOO_FULL.BAK again, overwriting Monday's. And so on.

In theory (and generally in practice) each of these would be backed up to a 3rd party, so it was possible to find the backup for a specific day, but that was added complexity, and probably not complexity you want in an actual DR situation. Also, if for some reason the backup to the 3rd party failed (but the local backups continued), they'd definitely lose the ability to restore specific days of backups.

In addition, the person who built this procedure set up differential backups for MOST databases to run every 15 minutes. Now, I wouldn't necessarily call that a terrible idea, but in this case it was almost certainly not the best approach in my opinion. And again, the same file name was used each time.

This means that in a DR event, the company could restore the previous night's backup and, if they wanted, the most recent diff backup, and that was it. If they wanted to restore to a point in time in between, that was impossible. And in my experience, point-in-time restores are far more common than most other restore needs.

Finally, the developer who wrote all this clearly did not understand what the CHECKPOINT command does. He had scheduled a checkpoint job to run every 30 minutes. In general that's not only unnecessary but probably a bad idea. In this case, though, not only was it unnecessary, the reason given in the job comments was completely wrong. He seemed to think it would keep the transaction logs from growing.

That is of course NOT what CHECKPOINT does, and sure enough, on the one database still using FULL recovery, the transaction log was far larger than the actual database. (Fortunately it was a lightly used database or the disk might have filled up years ago.)
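
For anyone who's hit the same confusion, here's a minimal sketch of the difference (reusing the illustrative FOO database and M:\Backups path from above). In FULL recovery it's the log backup, not CHECKPOINT, that lets log space be reused:

-- CHECKPOINT just flushes dirty pages from memory to disk.
-- In FULL recovery it does NOT free up transaction log space.
CHECKPOINT;

-- A log backup is what marks the inactive portion of the log as
-- reusable (note: it still won't shrink the physical log file).
BACKUP LOG FOO TO DISK = N'M:\Backups\FOO_LOG.TRN';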

Since discovering all this, I've gone and replaced all this complexity with a set of maintenance jobs. These will guarantee each system database is backed up weekly (for their needs this should be fine) with unique names. User databases will be backed up nightly and retained for 4 nights (perhaps to be extended once we determine how much disk space we want to set aside for this). Transaction log backups will run every 15 minutes. These too will have unique names.
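
The maintenance plans generate the unique names for you, but if you do insist on rolling your own, the heart of it is just a timestamped filename. A minimal sketch, again using the illustrative FOO database and path (FORMAT requires SQL Server 2012 or later):

-- Build a unique, timestamped backup name instead of overwriting one file
DECLARE @file NVARCHAR(260) =
    N'M:\Backups\FOO_FULL_' + FORMAT(SYSDATETIME(), 'yyyyMMdd_HHmmss') + N'.BAK';

BACKUP DATABASE FOO TO DISK = @file WITH CHECKSUM;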

Now the customer can restore to any point in time in the last 4 days (or even further back if they go to their 3rd party backup), up to 15 minutes before a failure (and in some cases, if the log is still available and they can back up the tail of the log, up to the instant before the failure).
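
For anyone who hasn't done one, a point-in-time restore looks roughly like this (file names and times invented for illustration): restore the full backup WITH NORECOVERY, then roll each log backup forward, stopping just before the failure:

RESTORE DATABASE FOO
    FROM DISK = N'M:\Backups\FOO_FULL_20160425_020000.BAK'
    WITH NORECOVERY;

-- Repeat for each log backup in the chain; on the last one, stop at the
-- desired moment and bring the database online
RESTORE LOG FOO
    FROM DISK = N'M:\Backups\FOO_LOG_20160425_091500.TRN'
    WITH STOPAT = '2016-04-25T09:07:00', RECOVERY;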

If they add additional databases, they don't have to worry about remembering to put in 3 separate sprocs for each new database or adding new jobs to the SQL Server Agent.

Now they not only have a far more robust backup plan, they have one that is far easier to maintain. Oh, and one that will actually send an email if there's a problem with the backup.

The moral is: don't make things more complex unless you absolutely need to, and if you do, make sure you actually achieve the goals you're trying to achieve.

Getting the right answer by suggesting the wrong one

I'm a participant on a CMC called Lily. It is based out of my alma mater, RPI. At some point, someone created a rule (which I've seen elsewhere, so it's hardly unique) that sometimes the fastest way to get the right answer to a question is to post the wrong answer.

There is truth to that.  I think in part it can be summed up with this XKCD cartoon.  Many of us who are involved in technology seem to have an incessant need to be “right”.  So when we see something wrong, we’re compelled to correct the mistake.

But to be wrong, it has to be clearly wrong. To go back to my cave rescue experience: if I recommend a 3:1 haul system and you recommend a 2:1, neither of us is necessarily wrong; we might be optimizing for different factors. On the other hand, if you recommend we use 11mm rope for the haul line and I whip out some clothesline I've had in my car for a few years and suggest it should be good enough (after all, it's only Bill we're rescuing), I'm clearly wrong and need to be corrected.

These thoughts about being wrong and trying to find the right answer were prompted by a coding problem that has consumed far too much of my time. I finally came up with an answer that worked, but not one that I liked.

Essentially, I'm building a ComboBox (loading it from a DataTable) in VB.NET.

It has key/value pairs; let's call them ("Test1", "A"), ("Test2", "B") and so forth.
(Note: VB.NET appears to call these a DisplayMember/ValueMember pair, and they can be loaded from a dictionary type. To my mind, what they call the ValueMember is what I'd consider the lookup key, which probably illustrates my misunderstanding of the issue.)

However, once I load the record in question, I want the selected value in the dropdown to reflect the value in the record (which of course is stored as “A” or “B” etc.)

There appears to be no way in VB.Net to easily say something like:

cbxResources.SelectedValue = Itemrecord.Value

Then I tried:

cbxResources.SelectedItem = Itemrecord.Item, just to see if it would work. It doesn't.

Googling suggests something like:

cbxResources.SelectedIndex = cbxResources.FindString(Itemrecord.Item)

That does indeed work, if I know the display string. But that's what I want to display, not what I store in Itemrecord, so I don't know it at lookup time.

Strangely, it seems I cannot set the index based on the ValueMember, just the DisplayMember. To me this is odd: coming from a DB world, the value member looks like exactly the key I'd want to look up in order to select the DisplayMember to be displayed.

I finally settled on a hack.  What if I switched the two?

cbxResources.DisplayMember = "Resource"
cbxResources.ValueMember = "Description"

cbxResources.SelectedIndex = cbxResources.FindStringExact(Itemrecord.Item)

cbxResources.DisplayMember = "Description"
cbxResources.ValueMember = "Resource"

I'm not sure I like this answer. It seems to me it should be far simpler, or that I'm fundamentally misunderstanding how the control should be set up and used. But for now it's the hack that's going into my code.

So why publish here?  Well either it’s a great work-around and I can save other folks the hours of fruitless searching I experienced, or someone can say, “It’s on the Internet and it’s wrong; I have to correct it!”

I’ll take either answer.

Moral: Sometimes being wrong is the right thing to do.

Shiny Things

It's not uncommon for fads or "shiny things" to rise to the top. In my work in cave rescue, for example, there's a lot of really cool equipment that comes out every year. And often a particular piece or type of equipment will grab people's attention and folks will start to use it all over the place. A good example of this is the various "rope grabs" such as the Rescucender. Suddenly everyone is teaching, "Oh, you should use this tool, it's so much better." In part, people fall into the trap of thinking it's better because it's newer.

Another example is ascending systems. For climbing ropes in caves there are Frogs, Ropewalkers, Texas systems and more. They all have their advantages. And quite honestly, for the most part, they are far better than what they replaced: the traditional "3-knot" system. Unfortunately, as an instructor I've come across a large number of experienced vertical cavers who have NO idea how to climb on a 3-knot system.

Rescucenders have their place, and they had their peak of prominence. For a while it seemed they were used and taught all over the place. Now a lot of teams I know are going back to focusing on the good old Prusik knot for a rope grab. This, of course, is the most common knot used in 3-knot climbing systems.

In the one case, the newer, better, shinier thing was found not necessarily to be all that better. In the other case, the newer equipment really is better, but the basics shouldn't be forgotten. You never know when you'll need to fashion Prusiks out of shoelaces.

I bring this up because of an article brought to my attention today discussing SQL vs. NoSQL. Many have told me that NoSQL will replace a traditional SQL RDBMS. Of course, I was told the same thing about OODBMSs and other database systems. So far many of the replacements have withered on the vine and are long gone. Some still thrive in niche applications. I don't think NoSQL will wither and die; nor is it simply a small niche. Yet neither will it completely replace SQL databases. Both have incredible powers and are very good at what they do. Both, I think, are here to stay for the foreseeable future.

But don't fall into the trap of thinking you have a data problem and assuming the newer solution, NoSQL, is the better solution. It might be. It might not. If I have to climb rope, I'll use the "newer" (it's decades old now) technology of my Frog climbing system. If I need to put a rope grab on a rope, I'll almost certainly go back to the "older" (it's all relative) technology of a Prusik knot.

The actual needs of the application drive the solution. Don’t allow the solution to drive the design.

A good rule of thumb is: Anytime anyone sets up a debate, especially about technology as an X vs. Y debate, they’re probably framing it wrong from the get-go. It’s rarely X xor Y, but almost always X and/or Y, or even possibly Z.

All set to think

A short post today.

This past weekend I attended another SQL Saturday event, this one in Washington DC. For anyone interested in SQL Server, I highly recommend these events. Typically the cost is just $10 to attend for the day.

I had put in a bid to present but unfortunately was declined. That's fine; there were plenty of other seminars worth my time to attend.

One of them, "Common Coding Mistakes and How to Mitigate Them" by William Wolf, was a good example of that. @SQLWareWolf did a great job of illustrating a number of fairly common mistakes. That itself was worth the price of admission. But what I really enjoyed was an observation of his, one I've made myself in the past: as a DBA, I can often spot stored procs written by a programmer with a non-SQL background. The tell-tale is that many (certainly not all) programmers who come to SQL from another background think in terms of rows, not sets.

An example of this would be a programmer opening a cursor and looping through it to set a date on a number of records. Of course, anyone with a SQL background understands that can be a single statement, which will execute far faster.
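
To make the contrast concrete, here's a sketch of both styles (the Orders table and its columns are invented for illustration):

-- Row-by-row: the tell-tale cursor loop
DECLARE @ID INT;
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT OrderID FROM Orders WHERE ShippedDate IS NULL;
OPEN cur;
FETCH NEXT FROM cur INTO @ID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Orders SET ShippedDate = GETDATE() WHERE OrderID = @ID;
    FETCH NEXT FROM cur INTO @ID;
END;
CLOSE cur;
DEALLOCATE cur;

-- Set-based: the same work in one statement, and far faster
UPDATE Orders SET ShippedDate = GETDATE() WHERE ShippedDate IS NULL;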

Now, there are certainly times when row-by-row is the only way to do it, but if you see that in a sproc or script, I'll bet you 5:1 it was written by a programmer who didn't know better.

Now, to be fair, as a DBA who learned to program when "object oriented" wasn't a buzz-word, I've been doing a lot of VB.NET programming lately, and I have to admit I'll often find myself down a hole of twisty paths before I realize I'm writing bad code, and that if I simply think of the data I'm trying to modify as an object, suddenly things get FAR clearer and easier. I would not be surprised if, 5 years from now, the next programmer comes along, looks at my code and thinks, "What was he thinking? He obviously understood objects, but didn't do everything he should have using objects."

But in both cases improvement comes with time: the non-SQL programmer coming to SQL will learn, and I hope my OO code is slowly getting better.

No real moral here, other than these common observations and the hope that, over time, we all improve.

Newspapers and paradigm shifts

When I was fairly young, I learned a detail about newspaper advertising: the space on the lower outside of a right-hand page was worth more than the lower inside of a left-hand page (i.e. along the fold).

If you think about how folks read and flip through newspapers, this makes sense. It's an area more likely to be seen than others.

With news, there are the terms "above the fold" and "below the fold." Obviously, you want the big news article on the front page, above the fold, where it's most likely to be seen.

When laying out a newspaper, there is over a century of experience in how to do things. You don't jump a front-page news article to a page in the middle of the sports section; for the most part, you don't run box scores on the front page (unless perhaps it's an upset at the Super Bowl or something else that will garner eyeballs); you don't scatter sections of your newspaper across different pages.

Years ago, I was proud to be part of one of the first newspaper web application service providers, PowerAdz (which later became PowerOne Media, and most of which was later bought by TownNews).

Even back then, I realized much of what was known about newspaper layout was going to have to change. There was no longer a physical fold in the newspaper. There was a bottom edge to a browser window, and that still meant you needed the important news at the top. But how long should it run down the "page"? How many pixels did the viewer have before the bottom edge of the window? What was the width of your front page?

You also weren't limited to a physical page size. Articles could run on as long as readers were willing to scroll. Or was a reasonably sized page with links to following pages better?

Much of this is still in flux, and I suspect it will continue to be for years to come. Heck, just the fact that articles can have hyperlinks to other articles or background information makes news on web pages very different from the traditional print medium.

What reminded me of this today was seeing yet another comment on a CNN fluff piece that was linked off the front page. The commentator was complaining, "This is news?"

Someone replied it was under the Entertainment section. Another rebutted, "Yeah, but it's on the front news page."

That reminded me of these thoughts. What is the front page any more? Even though you can click to different sections of CNN, it's not like a traditional newspaper where you have physically separate sections, each with its own front page. Now it's all virtual, and a front page is simply whatever you define it to be.

I think ultimately we have to let go of our definition of the front page of a news site and accept that links to news, fluff pieces and the like will all end up there. Sure, there will be sections within the page, but complaining that there are sports, entertainment, or other non-traditional news links off the front page will be like complaining that a papyrus isn't unrolled in the correct direction: a sign of an older time.

Times change, but more importantly the medium changes, even if the message doesn’t.


QuiCR

I’ve been toying with an idea for a few months.  Ok.  I’ve been working on actively making it come to fruition.  Now I can announce the idea I’ve been working on: QuiCR.

With QuiCR, companies of all sizes will be able to get instant feedback and responses from their customers. There are 84 million cell phone users out there, and via QuiCR, companies can leverage them: turn them into instant secret shoppers, get their feedback, or have them report on maintenance issues that might otherwise go unnoticed.

Check it out. We're still very early in the process, but I'm excited.


Think in Russian

In the classic Cold War thriller Firefox, Vietnam veteran Mitchell Gant has to steal the top-secret, titular Soviet plane. Among its advanced features, its systems are controlled by thought. But only if he thinks in Russian. This becomes a key plot point in the climax of the film.

In my last post, I remarked how many bad solutions I had found for sorting a data-bound GridView. Well, I'm happy to say I solved my problem last night. (As an aside, I won't be posting the solution at this time because, while the solution itself is decent, I'm not sure the code is the best.) Part of my solution was solved by "thinking in Russian."

As I had mentioned, some of the so-called solutions to this problem that I found online were pretty bad. In one case, the author apparently decided the easiest solution was to decipher the ViewState, extract the information, sort it, and stuff it back into the ViewState. Now, as I write this, I realize one advantage this has: it removes a round trip from the web server to the database server. But that's about it.

I also saw a solution that involved passing the column name back to the code-behind and having that decide which of multiple stored procs to call. I can't say I favored this approach, since it basically means a lot more maintenance if you ever, say, want to add a column to your GridView (which, it turns out, I decided afterwards I may want to do).

Even worse, I've seen people propose things like building the SELECT string on the fly, and I can't even begin to say how bad an idea that is.
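
If the requirement really is "sort by a column name the page passes in," there are safer patterns than concatenating user input into a SELECT. A hedged sketch (the Resources table, its columns, and @SortColumn are all invented for illustration): validate the input against a whitelist, then quote the identifier:

-- Imagine a sproc parameter naming the column to sort by
DECLARE @SortColumn SYSNAME = N'Description';

-- Whitelist the input; fall back to a safe default
IF @SortColumn NOT IN (N'Resource', N'Description')
    SET @SortColumn = N'Resource';

DECLARE @sql NVARCHAR(MAX) =
    N'SELECT Resource, Description FROM Resources ORDER BY '
    + QUOTENAME(@SortColumn) + N';';

EXEC sp_executesql @sql;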

That said, the more I thought about it, the more I realized what the “right” solution was.  Rather than fighting the system, I had to think like the system.

So, after a side trip into trying to use SqlDataAdapter, I went back to my first approach of using a SqlDataReader. However, based on one example I saw, I decided to move this into a function of its own. If nothing else, this resulted in cleaner code, since I was already calling the SqlDataReader in two places (see the Rule of Three). Once I did this, it was a little matter of figuring out how to load the SqlDataReader into a DataTable and return that.

Then I could bind the DataTable directly to the GridView if I wanted to (which I do on the original Page_Load and Click_Submit), OR, in the _Sorting event, I could bind it to a local DataTable, sort that, and then bind the resulting sorted DataTable to my GridView.

Worked like a charm. Well, except for one little detail. And this one I'm still not sure about: either it's a MSFT bug that lives on for backwards compatibility, or I and MANY other developers are doing something wrong, but the GridView will incorrectly always return "descending" for its sort direction.

So in this case, the common solution (I'm still not convinced it's the RIGHT or BEST solution) is to stuff a variable into the ViewState, read it back every time sorting is called, and reverse it as needed.

Once I did that, I had working code that could sort my GridView on the selected column; code that was clean, easily reproducible and made sense.

I've found with my forays into the .NET Framework and VB programming that if my initial approach appears overly complicated or just plain wrong, it probably is. So far, in pretty much all cases, I've found that if I stop and try to "think in Russian," the solution will appear to me, and it's generally fairly straightforward and looks right.

Years ago when I studied Latin, I reached the point where I could read Latin natively.  I loved it. But part of the switch is being able to think in the structure of the language.  Not all languages use “SVO” (Subject-Verb-Object) order like English.   Latin, “SOV” (Subject-Object-Verb) order uses.  It takes some getting used to. But once you accept it, things get easier.

So I can’t fly a Mach 6 stealth aircraft, nor do I speak Russian, but I’m starting to think in VB. (Or is that I’m in VB starting to think?)

A Bright Idea and State of the Art

I think everyone likes to talk about “their first program”.  I suspect though it’ll become a less common topic with future generations, just like most kids don’t recall the first book they ever read.

My first program converted Fahrenheit temperatures to Celsius.

I was probably 11 when I helped write it.  It was stored on paper tape and ran on the local high school’s minicomputer (probably a PDP-9 but I honestly have no idea).

It wasn't a long program, and it was probably in FORTRAN. Again, that was so long ago I can't recall the details. And it wasn't a very impressive program. Heck, these days you can do it in a Windows CMD script as one line (well, two for clarity: SET F=212 then SET /A (%F%-32)*5/9, multiplying before dividing since SET /A does integer math).

I wrote more complex programs in high school (by then I had moved up to Turbo Pascal) and made my first money programming in FORTRAN while in college.

Things had improved from paper tape to floppy drives to hard drives.  Writing programs and debugging programs for the most part became faster. But generally anything more complex than basic input and output through the screen and keyboard was still tough to do and time-consuming.

About two weeks ago I had an idea for a project. I was on the road at the time and didn't get a chance to sit down at my desktop until last week. In less than 24 hours I had prototyped the idea and tested it. The program involved a website, a database, doing some lookups, writing to the database and a bit more. Even just 10-15 years ago it could have easily taken me 4 or 5 times as long to do something like that.

On Thanksgiving Day, my son wrote a program in a language called Scratch (http://scratch.mit.edu/) that would take input, make it circle around, then settle on the screen. The more text you entered, the bigger the resulting "wordle" spiral would grow. He wrote it that morning before our relatives showed up. It took him maybe an hour or two, including debugging and overcoming some initial limitations.

He's been writing programs in Scratch (and other languages) for years now. I doubt he remembers his first program, since writing programs has become about as easy as using a computer. I'm sure he doesn't remember his first time using a computer the way I do. He writes programs at age 11 that in many ways are more complex than anything I wrote in my teens.

The state of the art has certainly changed and it’s made the world a better place all-around.  Languages and frameworks make developing faster and easier than ever before.

Though at times I’ll admit I miss the days of FORTRAN.

Order of the day

I’m in the middle of designing a db schema for a project I’m working on.

As I was sitting in my car, waiting for the light to change, I remembered a mantra of SQL design. And what's funny is that I often see this fundamental aspect overlooked, or in at least one case intentionally ignored: tables do not have any intrinsic order.

Almost all of my experience has been with MSFT SQL Server, so my thoughts will be based on that, but the general idea holds everywhere. Beginners to SQL will assume, "If I put the data in in order, it will come out in order." If they're a bit more than a beginner, they'll even test that. And lo and behold, they'll probably find their assumption holds when they test it. Then they'll put the code and schema into production and find that suddenly things aren't working right.

They'll wonder why it worked on their machine but not in production. Generally there are two reasons. SQL Server is very aggressive about caching, so there's a chance it'll read the data out of memory in the order they put it in. In addition, if it does have to read it off disk, it'll most likely do so in a single-threaded manner, following the primary key, and the data will come out as expected.

In production, though, between the time the data is inserted and later read back, the cache may look very different. Even more so, a production machine is liable to have multiple disks and multiple CPUs, which means multiple read threads, and SQL Server will then put the data together in the fastest way possible. Suddenly it's no longer in the order the programmer wanted or expected.

Now, if you're writing a quick ad-hoc query to get some data out quickly, that's probably OK. I'll admit I do a lot of queries without an ORDER BY when I just need to quickly see some data. But when it's time to productize the code, I'll use an ORDER BY.

As I mentioned at the top of this post, the lack of an ORDER BY seems to be a fairly common mistake made by folks new to SQL programming.

But what about the case where it's intentional? Years ago we were rolling out some new code, and in it there was a query against a table. In theory the table would only ever have one row. The programmer decided that with one row, no ORDER BY was necessary. However, as always, theory and practice don't always match, and I asked what would happen if there was more than one row. His answer was to use a "TOP (1)" in the query. So I asked him what would guarantee he'd get the top row he wanted. He said he didn't think the problem would ever arise, and he resisted using the ORDER BY for "performance reasons," he said.

Well since I wasn’t his manager, I wasn’t about to fight this particular fight.  But I did make a note of it.

Sure enough, about two years later (a year after he had left the company), the page that used this query started to return the wrong results. A quick look and a quick addition of an ORDER BY, and all was well.
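
To make that concrete, the shape of the fix was along these lines (table and column names invented for illustration; the original was a table expected to hold a single row):

-- Without an ORDER BY, TOP (1) returns whichever row the engine happens
-- to produce first; nothing guarantees it's the row you meant.
SELECT TOP (1) ConfigValue
FROM AppConfig
ORDER BY EffectiveDate DESC;  -- the ORDER BY makes TOP (1) deterministic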

It’s always the little things.  And that’s the order of the day.