Thanks to Deb Melkin for hosting this month’s T-SQL Tuesday and developing the topic. Instead of calling this a rant, perhaps I should call it an “I told you so.” There’s a common refrain among DBAs of “It depends” when asked a question. And that’s generally accurate. But this is the case of me saying “it doesn’t depend, do as I say” and being ignored.
Ironically, when I took my Database class in college, it was long enough ago that the idea of a “Sequel Database” (that alone should tell you how long ago this was) was basically described as an up-and-coming technology that perhaps had a future. Talk about a bold and accurate prediction! That said, one of the things that fascinated me then, and still does, is that SQL (in general, not the product SQL Server) is based on work done by Edgar F. Codd and has a fairly strict mathematical basis. (Which is another reason I rail against those who claim that RDBMS and SQL will eventually be replaced. That’s like saying Algebra will be replaced. There may be other branches of mathematics developed that are far better for their specific domains, but the validity and usability of Algebra will never go away.)
In any event, one of the key concepts that Codd developed was that of “a table”. A table has several particular parts to its overall definition. The one critical for this blog is that a table itself has no implicit order. Now, many folks will do a query multiple times and always get the same results every time. But that’s more a factor of how SQL Server happens to handle reads. At my last full-time job, I was easily able to prove to the programmers that a query on the UAT box would result in a different order than on Prod because of the number of CPUs and disks. But that’s not what I’m here to talk about.
My “I told you so” moment goes back further, to a table that was about as simple as you can get. It had a single row. Now, I think we can all agree that a single row will always return the same order, right? I can’t recall exactly why the developer felt that this table was the solution to his problems, but I pushed back. I asked that, at the very least, he put in a where clause. He felt that would impact performance too much and besides, with one row, it would always return his results. I of course asked, “What happens if eventually the table has two rows?” “Oh, well my row will return first anyway.” “No it won’t.” Well, he wouldn’t budge and I had bigger fish to fry. At the time there really was no reason to expect this table to grow. But I tucked it away in the back of my mind.
Sure enough, about a year later, which was 3 months after the developer left, we started to get really weird results on the webpage that was relying on that table. It seems that another developer realized this table was a perfect place for him to store the data that he needed (I’m assuming it was some sort of config data, but it was honestly so long ago I can’t recall), so he added a row. Now HE was smart enough to add a where clause to his query. But the original “Don’t worry about it” query still had no where clause. And sure enough, sometimes it was returning the new row instead of the original. Fortunately this was a 5-minute fix. But I can only imagine how long it would have taken to find the problem if I hadn’t remembered it in the first place.
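For anyone who wants to see the shape of the problem, here is a minimal sketch of the scenario. It uses Python’s built-in sqlite3 module as a stand-in for SQL Server, and the table and column names (app_config, name, value) are invented for illustration, not the originals. It is not meant to reproduce the exact misbehavior, since which row comes back first from an unfiltered query is up to the engine; it only shows why the filtered query stays correct once a second row shows up.

```python
import sqlite3

# Hypothetical stand-in for the single-row table from the story;
# the table and column names are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_config (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO app_config VALUES ('original_setting', 'A')")

# With one row, the unfiltered query cannot go wrong yet.
unfiltered_before = conn.execute("SELECT value FROM app_config").fetchall()

# Later, a second developer adds a row of his own.
conn.execute("INSERT INTO app_config VALUES ('new_setting', 'B')")

# The unfiltered query now returns two rows, and nothing in the query
# says which one comes first, so "take the first row" is a gamble.
unfiltered_after = conn.execute("SELECT value FROM app_config").fetchall()

# The fix: ask for the row you actually mean.
filtered = conn.execute(
    "SELECT value FROM app_config WHERE name = ?", ("original_setting",)
).fetchall()

# If several rows are wanted in a known order, say so explicitly.
ordered = conn.execute("SELECT name FROM app_config ORDER BY name").fetchall()
```

The where clause pins down which row is wanted, and the order by pins down the sequence when more than one row is expected. Either way the query states the requirement instead of hoping the engine happens to agree.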
So, while as a DBA I will often say “it depends”, I will always be adamant in saying that tables are unordered by definition and you absolutely need a where clause or an order by if you want to guarantee specific results. Though, I suppose it depends: if you don’t care about the order and don’t need a specific set of data, you can ignore my rant. There are cases where that’s valid too.
Thus ends my TED talk.