I wrote last year about how my job title seems to be DBA and other Duties as assigned. A little incident yesterday got me thinking about that again. This time though, it got me thinking more about the DBA side of that title.
I’ve mentioned before I’ve written a number of ETLs for various clients. Some are fairly complex and some are fairly simply PowerShell scripts. In most cases, at the end of the script I send an email with a success or failure status and some additional information if appropriate.
Over the last two weeks I had noticed that the email for one particular ETL was coming in later and later. I finally found somet time to investigate. I looked at the destination table and noticed that some of the tables had extremely large numbers of rows. Now, my first inclination was that the source data had increased (which would be a good thing, it meant the client was basically selling more widgets). But the increase seemed too dramatic and large. My next thought was perhaps the export itself was simply giving me more data.
So I decided to look more closely at the data and I noticed something interesting. Picking on of the tables at random, I simply did a
select * from RandomTable order by GoodIndex
SSMS returned approximately 24,000 rows. But something stood out. There appeared to be duplicates rows. Lots of them. This raised a lot of suspicions.
I then ran the query on my UAT box which in this case actually loads a copy of the production data. I got back about 1,600 rows. VERY curious.
Back on production now I ran
select distinct * from RandomTable order by GoodIndex
This time I got back the same number as UAT. Extremely curious.
Then it dawned on me, about 2 weeks ago, we had made a change to where the script looked for the source data, a number of CSV files. UAT had not changed, but Prod had. The reason for this change was to be able to get the ETL PowerShell script to run on the proper production server (the original location had security issues reading from the original CSV file location.)
So my first thought was that the team that had updated their export had somehow left out the command to delete the old file and was simply appending. I was about to write a pointed email when I stopped myself.
I looked at my code and realized that when I had setup the script on the new production box, I had properly handled all the permissions except one: the step that truncated the destination table. Sure enough, my code was no longer truncating the table before I inserted new information. Fortunately I realized my mistake before sending that pointed email.
So was this a DBA issue or a “other duties as assigned” issue? I don’t know, but I’d say as a DBA I should have not created the problem in the first place, but thankfully, due to proper logging and emails I was able to catch it and solve it fairly quickly.
Moral of the story: Pay attention to details. Notice when things start to drift. They can be the sign of a larger issue.