Order of the day

I’m in the middle of designing a db schema for a project I’m working on.

As I was sitting in my car, waiting for the light to change, I remembered a mantra of SQL design.  And what is funny is I often see this fundamental aspect overlooked or even in at least one case, intentionally ignored: namely tables do not have any intrinsic order.

Almost all of my experience has been using MSFT SQL Server, so my thoughts will be based on that, but the general idea is true.  Beginners to SQL will assume “if I put the data in in order, it will come out in order.” Now if they’re a bit more than a beginner, they’ll even test that.  And lo and behold, they’ll probably find their assumption is right when they test it.  Then they’ll put the code and schema into production and find that suddenly things aren’t working right.

They’ll wonder why it worked on their machine, but not on production.  Generally there’s two reasons.  SQL Server is very aggressive about caching, so there’s a chance it’ll read the data out of memory in the order they put it in.  In addition, if it does have to read it off a disk, it’ll most likely do it in a single-threaded manner, following the primary key and the data will come out as expected.

On production though, between the time the data is inserted and later read back, the cache may look very different.  But even more so, a production machine is liable to have multiple disks and multiple CPUs which means multiple read threads will occur and SQL Server will then put the data together in the faster way possible.  Suddenly it’s no longer in the order the programmer wanted it or expected it.

Now, if you’re writing a quick ad-hoc query to get some data out quickly, that’s probably ok.  I’ll admit I do a lot of queries without an order by when I just need to quickly get some data.  But if it becomes time to productize the code, I’ll use an order by.

As I mentioned at the top of this post, the lack of an Order By seems to be a fairly common mistake made by folks new to SQL programming.

But what about the case when it’s intentional?  Years ago we were rolling out some new code and in the code there was a query that did a query against a table.  In theory the table would only ever have one row.  The programmer decided with one row no order by was necessary.  However, as always theory and practice don’t always match and I asked what would happen if there was more than one row?  His answer was to use a “TOP (1)” in the query.  So I asked him what would guarantee he’d get the top row he wanted.  He said he didn’t think the problem would ever arise and resisted using the ORDER BY “for performance” reasons he said.

Well since I wasn’t his manager, I wasn’t about to fight this particular fight.  But I did make a note of it.

Sure enough, about two years later (a year after he had left the company) the page that used this query started to return the wrong results.  A quick look and a quick addition of an ORDER BY and all was well.

It’s always the little things.  And that’s the order of the day.

Simplicity

Over the years I’ve been involved in a number of web-based companies.  All had great ideas for their business model.   One had one of them had a great idea for classified ads.  It had the latest in taxonomic matching and advanced search capabilities.  If you were looking for a Mustang, it could tell direct you to ads for cars or horses depending on context and other factors.  Its search capabilities were ahead of the time.  It had pretty much every bell and whistle the newspapers asked for and that the design folks could think of.

Then Craigslist came alone.  Craigslist was free (at least compared to newspaper classified ad sites where the newspapers typically charged.)  It had no taxonomic matching.  Its search capabilities were and still are bare-bones.  In fact, it very much relies on the user to narrow down and define searches.

But it succeeded where the other product failed for what I believe one very simple reason.  It was simply blazingly fast.  It didn’t matter if it returned bad results the first time.  It was so fast the user didn’t mind typing in new search parameters and narrowing down their search.  It was faster than any of the “advanced” newspaper classified engines I saw.  Sure, they might try to do a better job of returning results, but the honest truth was, in most cases people would end up doing multiple searches anyway trying to narrow down their search.  And in the time it took to do 2-3 searches with a typical website, Craigslist allowed the user to do 10-15 searches.  Time was money and people wanted to do things quickly.

Over the years with numerous sites I’ve seen the design get in the way of the end-user.  The truth is, 80% of the time, people will use 20% of the features, but they want those 20% to be as fast as possible.

So, keep it simple and keep it very fast.

One of these days though I’ll relate the story of the 3,000 mile Steinway search.