I realized after writing my earlier post that today was T-SQL Tuesday. I wasn’t going to contribute, but after seeing some posts, I thought I’d give a very quick shot at it. This month, Tom Zika (t | b) asks us to talk about what makes code “production grade”. You can find his full invitation here.
There’s some great columns there, but I’ve noticed something that many developers assume (and honestly, it’s a good thing) and that’s that they work in a company with good source control and a decent release procedure. Sadly, with my clients, it’s rarely the case. Often I’m inheriting code that’s only available on the production server itself, or there’s 20 different contributors (ok I’m exaggerating, but not by much) and each has their own stash of code.
Ultimately this means the production server really the only single source of truth. So that leads me to my first item.
Select * and other shortcuts
It should be obvious, but while I may often use Select * while developing code, I’d never put it into production. Even if it works, it’s messy. But I’d go a step further. I prefer to fully qualify all my columns. For example
select Emp_Num, First_Name, Last_Name, City from Employee_Table
vs
select E.Emp_Num, E.First_Name, E.Last_Name, E.City from Employee_Table E
Now the above is an extremely artificial example. But now imagine I want to join it to say a table of phone numbers (because the original developer was smart enough to realize an employee could have multiple phone numbers and didn’t simply add columns to the Employee_Table.)
So now someone comes along and rewrites the first as:
select Emp_Num, First_Name, Last_Name, City, Phone_Num from Employee_Table E
inner join Employee_Phones EP on EP.Emp_Num = E.Emp_Num
Now, they’re of course deploying to production as they go and suddenly the above code breaks. Fortunately, they’re fairly smart and realize the fix and go in and edit it to
select E.Emp_Num, E.First_Name, E.Last_Name, E.City, EP.Phone_Num
from Employee_Table E
inner join Employee_Phones EP on EP.Emp_Num = E.Emp_Num
So it’s a simple thing, but by making a habit of fully qualifying your column names, you can avoid future errors.
Failing Gracefully
When I’m writing quick and dirty code, while I try to avoid errors of course, I’m not overly worried about leaving the system in unstable state. By this I mean, if I’m debugging code with a cursor in it and it breaks and I have to manually drop the cursor that’s fine. Same thing with transactions. Yeah, I might block someone else’s work, but I’ll pretty quickly realize what I’ve done and be able to commit or rollback my transaction.
In production code, without going into details on TRY/FAIL blocks and all that, I would argue that any code that contains a cursor, a transaction or anything else that could potentially block processing absolutely needs to have robust error handling. I’ll ignore the debate about what the best way to handle it is, in part because sometimes rolling back is the right answer, trying again might be the right answer, or even finishing the transaction and then cleaning up data later. The point is, you can’t afford to fail in an ungraceful way and leave your system in an unknown state.
Alerting
I didn’t have this on my mind when I started out with this post, but the last bit reminded me of it. It’s not code per se, but more jobs and the like. Generally, I’m a huge fan of alerts. If something has failed, I want an alert! But, I realized a long time ago, that alerts have to be actionable. This means the person receiving it has to both be able to act on it and that it actually needs to be acted upon. If something fails and it needs no action (and the action can be as simple as simply noting it for future reference) then don’t bother alerting. Log it or at the very least, retry before you send an alert. Years ago at one client they had a job that would fail once about every 100 days. It ran once in the morning. It had an alert that met the above criteria, I or another DBA could react to it and in this case the reaction was simply “retry the job”. I finally analyzed it and realized that given the failure mode, simply waiting a minute and retrying was a far better solution than alerting us. I did some math on the failure mode and realized that this new setup should cause failure on the second attempt (and then send us an alert) once every 10,000 days. So the initial alert was sort of pointless when there was a better way of handling it.
Conclusion
So, to sum things up: avoid errors, if you do have errors, handle them gracefully, and if you have to alert, ma