Goldstone has the Bird

One of my favorite web-comics is xkcd.  I was recently reminded of  I’m in the process of putting together a .NET 4.0 Framework based website using VB.  I have a very nice table thank you very much that is bound to a SQL Datasource in the code-behind.  Nice and slick.

Except, I want to make it sortable.  Should be as simple as setting AllowSorting=”True” and marking the columns I want sortable.  However, that doesn’t work when bound to a SQL Datasource.

No problem.  I’ll just Google it.  And there I find about a dozen answers.  And a third of them are just plain wrong (wrong enough even with my limited VB experience I can tell they are wrong), another third quote the wrong answers and the final third simply don’t work (and are so poorly commented in some cases I can’t even figure out what the original author was even attempting.)

And please, before you forward me a bunch of links in an effort to be helpful, don’t.  I’ve seen them.  Probably.

But it does lead to the general question.  How do you know what you read on the Internet is accurate? Even if it quotes a source, how do you know the source is accurate?

For years many people who followed the space program believed that “Goldstone has the bird” was a quote about Explorer 1, the first US satellite being picked up by the Goldstone tracking station, thus proving it had made orbit.  For years this was believed by many.  It is quoted in several reputable places, including the autobiography of someone who should have had first hand knowledge.

However, there is at least one little problem.  Goldstone hadn’t been built at the time.  So even the “original” source appears to have gotten it wrong. for more details.

Just give that a thought the next time you look something up on Wikipedia or another source to confirm your facts.

Order of the day

I’m in the middle of designing a db schema for a project I’m working on.

As I was sitting in my car, waiting for the light to change, I remembered a mantra of SQL design.  And what is funny is I often see this fundamental aspect overlooked or even in at least one case, intentionally ignored: namely tables do not have any intrinsic order.

Almost all of my experience has been using MSFT SQL Server, so my thoughts will be based on that, but the general idea is true.  Beginners to SQL will assume “if I put the data in in order, it will come out in order.” Now if they’re a bit more than a beginner, they’ll even test that.  And lo and behold, they’ll probably find their assumption is right when they test it.  Then they’ll put the code and schema into production and find that suddenly things aren’t working right.

They’ll wonder why it worked on their machine, but not on production.  Generally there’s two reasons.  SQL Server is very aggressive about caching, so there’s a chance it’ll read the data out of memory in the order they put it in.  In addition, if it does have to read it off a disk, it’ll most likely do it in a single-threaded manner, following the primary key and the data will come out as expected.

On production though, between the time the data is inserted and later read back, the cache may look very different.  But even more so, a production machine is liable to have multiple disks and multiple CPUs which means multiple read threads will occur and SQL Server will then put the data together in the faster way possible.  Suddenly it’s no longer in the order the programmer wanted it or expected it.

Now, if you’re writing a quick ad-hoc query to get some data out quickly, that’s probably ok.  I’ll admit I do a lot of queries without an order by when I just need to quickly get some data.  But if it becomes time to productize the code, I’ll use an order by.

As I mentioned at the top of this post, the lack of an Order By seems to be a fairly common mistake made by folks new to SQL programming.

But what about the case when it’s intentional?  Years ago we were rolling out some new code and in the code there was a query that did a query against a table.  In theory the table would only ever have one row.  The programmer decided with one row no order by was necessary.  However, as always theory and practice don’t always match and I asked what would happen if there was more than one row?  His answer was to use a “TOP (1)” in the query.  So I asked him what would guarantee he’d get the top row he wanted.  He said he didn’t think the problem would ever arise and resisted using the ORDER BY “for performance” reasons he said.

Well since I wasn’t his manager, I wasn’t about to fight this particular fight.  But I did make a note of it.

Sure enough, about two years later (a year after he had left the company) the page that used this query started to return the wrong results.  A quick look and a quick addition of an ORDER BY and all was well.

It’s always the little things.  And that’s the order of the day.