Brake (sic) it to Fix it!

Last week I wrote about getting my brakes fixed. Turns out I made the right choice: besides shoes and pads, I needed new calipers. Replacing them was a bigger job than I would have wanted to deal with. Of course it cost a bit more than I preferred, but I figure being able to make my car stop is a good thing. I had actually suspected I’d need new calipers because one of the signs that my brakes needed replacing was the terrible grinding and shrieking sound my left front brake was making. That’s never a good sign.

As a result I started to try to brake less if I could help it. And I cringed a bit every time I did brake. It became very much a Pavlovian response.

About two weeks ago, a colleague of mine said, “Hey, did you notice the number of errors for that process went way up?” I had to admit, nope, I had not. I had stopped looking at the emails in detail. They were for an ETL process that I had written well over a year ago. About 6 months ago, due to new, and bad, data being put into the source system, the ETL started to have about a half dozen rows it couldn’t process. As designed, it sent out an email to the critical parties and I received copies. We talked about the errors and decided that they weren’t worth tracking down at the time. I objected because I figure if you have an error, you really should fix it. But I got outvoted and figured it wasn’t my concern at that point. As a result, we simply accepted that we’d get an email every morning with a list of rows in error.

But, as my colleague pointed out, about 3 months ago the number of errors had gone up. This time it wasn’t about a half dozen; it was close to 300. And no one had noticed. We had become so used to the error emails that our Pavlovian response was to ignore them.

But this number was too large to ignore. I ended up doing two things. The first, and the one I could deploy without jumping through hoops, was to update the error email. Instead of simply listing the rows in error, it now includes the result of a query that places a summary table at the top, showing how many errors there are and in which tables. This is much more effective because a single glance now shows whether the number of errors has gone up or down. (If we get no email, that means we’ve eliminated all the errors, the ultimate goal in my mind.)
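For anyone who wants to try the same idea, here is a minimal sketch in Python of an email body with a per-table count at the top. It is not the code from my ETL process; the function name `build_error_email` and the shape of the error rows (table name, row id, message) are assumptions made up for illustration.

```python
from collections import Counter


def build_error_email(error_rows):
    """Build a plain-text email body with a per-table summary on top.

    error_rows is a list of (table_name, row_id, message) tuples.
    That shape is hypothetical, chosen only for this sketch.
    """
    # Count how many failed rows belong to each table.
    counts = Counter(table for table, _row_id, _msg in error_rows)

    lines = [f"Total errors: {len(error_rows)}", ""]
    lines.append(f"{'Table':<20}{'Errors':>8}")

    # Worst offenders first; ties are broken alphabetically.
    for table, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{table:<20}{n:>8}")

    # The original row-by-row detail still follows the summary.
    lines.append("")
    lines.append("Details:")
    for table, row_id, msg in error_rows:
        lines.append(f"{table} row {row_id}: {msg}")

    return "\n".join(lines)
```

The point of putting the counts first is that a jump from a half dozen to 300 shows up in the first few lines, without anyone having to scroll through the detail rows.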

I was able to track down the bulk of the 300 new errors to a data dictionary disagreement that popped up when a large amount of new data was added to the source system. (Raise your hand if a customer has ever told you one thing about their data, only for you to discover that the details are really different.)

I’ve since deployed that change to the DEV environment and, now that we’re out of the end-of-month code freeze for this particular product, I will be deploying it to production this week.

Hopefully, though, the parties that really care about the data will then start paying attention to the new email and squawking when they see a change in the number of bad rows.

In the meantime, it’s going to take me a while to stop cringing every time I press my brakes. They no longer make any bad sounds and I like that, but I’m not used to the absence of grinding noises again. Yet. In both cases, my client and I had accepted the normalization of deviance and internalized it.

I wrote most of this post in my head last week while remembering some other past events that are in part related to the same concept. As a result, this post is dedicated to the 17 American astronauts who have perished directly in the service of the space program (not to diminish the loss of the others who died in other ways).

Apollo 1 – January 27th, 1967
Challenger – STS-51L – January 28th, 1986
Columbia – STS-107 – Launch January 16th, 2003, break-up, February 1st, 2003.

greenmountainsoftware

Thoughts on SQL, Disasters and thinking
