Brake (sic) it to Fix it!

Last week I wrote about getting my brakes fixed. Turns out I made the right choice: besides shoes and pads, I needed new calipers. Replacing them was a bigger job than I would have wanted to deal with. Of course it cost a bit more than I preferred, but I figure being able to make my car stop is a good thing. I had actually suspected I’d need new calipers, because one of the signs that my brakes needed replacing was the terrible grinding and shrieking sound my left front brake was making. That’s never a good sign.

As a result I started to try to brake less if I could help it. And I cringed a bit every time I did brake. It became very much a Pavlovian response.

About two weeks ago, a colleague of mine said, “Hey, did you notice the number of errors for that process went way up?” I had to admit, nope, I had not. I had stopped looking at the emails in detail. They were for an ETL process that I had written well over a year ago. About 6 months ago, due to new, and bad, data being put into the source system, the ETL started to have about a half dozen rows it couldn’t process.  As designed, it sent out an email to the critical parties and I received copies.  We talked about the errors and decided that they weren’t worth tracking down at the time.  I objected because I figure if you have an error, you really should fix it. But I got outvoted and figured it wasn’t my concern at that point. As a result, we simply accepted that we’d get an email every morning with a list of rows in error.

But, as my colleague pointed out, about 3 months ago, the number of errors had gone up. This time it wasn’t about a half dozen, it was close to 300. And no one had noticed.  We had become so used to the error emails, our Pavlovian response was to ignore them.

But this number was too large to ignore. I ended up doing two things. The first, and the one I could deploy without jumping through hoops, was to update the error email. Instead of simply listing the rows in error, it now includes a summary table at the top, built by a query, that shows how many errors there are and in which tables. This is much more effective because a single glance now shows whether the number of errors has gone up or down (if we get no email, that means we’ve eliminated all the errors, the ultimate goal in my mind).
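For the curious, here is a minimal sketch of that kind of summary, not the actual production query. It assumes a hypothetical `etl_errors` table with `source_table`, `row_key`, and `error_message` columns, and uses an in-memory SQLite database with made-up rows purely so the example runs on its own.

```python
import sqlite3


def summarize_errors(conn):
    """Return (source_table, error_count) pairs, largest count first."""
    return conn.execute(
        "SELECT source_table, COUNT(*) AS error_count "
        "FROM etl_errors "
        "GROUP BY source_table "
        "ORDER BY error_count DESC, source_table"
    ).fetchall()


def format_summary(rows):
    """Build the plain-text table that goes at the top of the email."""
    lines = [f"{'Table':<20}{'Errors':>8}"]
    lines += [f"{table:<20}{count:>8}" for table, count in rows]
    lines.append(f"{'TOTAL':<20}{sum(count for _, count in rows):>8}")
    return "\n".join(lines)


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etl_errors (source_table TEXT, row_key TEXT, error_message TEXT)"
)
conn.executemany(
    "INSERT INTO etl_errors VALUES (?, ?, ?)",
    [
        ("orders", "1001", "missing customer id"),
        ("orders", "1002", "bad date"),
        ("orders", "1003", "bad date"),
        ("customers", "77", "unknown region code"),
    ],
)

summary_rows = summarize_errors(conn)
summary_text = format_summary(summary_rows)
print(summary_text)
```

The point of putting the counts first is that a reader only has to compare today's total with yesterday's; the row-level detail can still follow underneath for whoever needs it.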

The second was to chase the errors themselves. I was able to trace the bulk of the 300 new errors to a data dictionary disagreement (everyone raise their hands who has had a customer tell you one thing about data, only to discover that the details are really different) that popped up when a large amount of new data was added to the source system.

I’ve since deployed that change to the DEV environment and, now that we’re out of the end-of-month code freeze for this particular product, I’ll be deploying it to production this week.

Hopefully, though, the parties that really care about the data will then start paying attention to the new email and squawking when they see a change in the number of bad rows.

In the meantime, it’s going to take me a while to stop cringing every time I press my brakes. They no longer make any bad sounds and I like that, but I’m not used to the absence of grinding noises again. Yet. In both cases, my client and I had accepted the normalization of deviance and internalized it.

I wrote most of this post in my head last week while remembering some other past events that are in part related to the same concept.  As a result this post is dedicated to the 17 American Astronauts who have perished directly in the service of the space program (not to diminish the loss of the others who died in other ways).

  • Apollo 1 – January 27th, 1967
  • Challenger – STS-51L – January 28th, 1986
  • Columbia – STS-107 – Launch January 16th, 2003, break-up, February 1st, 2003.

 

 

The Soyuz Abort

Many of you are probably aware of the Soyuz abort last week. It reminded me of discussions I’ve had in the past with other space fans like myself and prompted some thoughts.

Let’s start with the question of whether Soyuz is safe. Yes but…

When Columbia was lost on re-entry, a lot of folks came out of the woodwork to proclaim that Soyuz was obviously so much safer, since no crew had died since the ill-fated Soyuz 11 flight in 1971. The problem with this line of thought was that at the time of Columbia, Soyuz had only flown 77 times successfully since then, vs 89 successful shuttle flights since the Challenger disaster. So which one was safer? If you’re going strictly on the number of successful flights, the Space Shuttle. Of course the question isn’t as simple as that. Note I haven’t even mentioned Soyuz 1, which happened before Soyuz 11 and was also a fatal flight.

Some people tried to argue that the space shuttle was far less safe because it had killed 14 people during its program life vs 4 for Soyuz. I always thought this was a weird metric since it all came down to the number of people on board. Had Columbia and Challenger only flown with 2 on each mission, would the same folks argue they were just as safe as Soyuz?

But we can’t stop there. If we want to be fair, we have to include Soyuz-18a. This flight was aborted at a high altitude (so technically the crew passed the Kármán line and are credited with attaining space). Then in 1983, Soyuz T-10a also suffered an abort, this time on the pad.

So at this point I’m going to draw a somewhat arbitrary line as to what I consider a successful mission: the crew attains an orbit sufficient to carry out a majority of their planned mission and returns safely. All the incidents above, Soyuz and Space Shuttle alike, are failed missions. For example, while Soyuz-11 and Columbia attained orbit and carried out their primary missions, they failed on the key requirement to return their crew safely.

Using that definition, the shuttle was far more successful. There was one shuttle flight that did undershoot the runway at Edwards, but given the size of the lakebed, landed successfully.  We’ll come back to that in a few.

Now let me add a few more issues with the Soyuz.

  • Soyuz-5 – failure of service module to separate, capsule entered upside-down, and the hatch nearly burned through. The parachute lines also tangled resulting in a very hard landing.
  • TMA-1 – technical difficulties resulted in the capsule going into a ballistic re-entry mode.
  • TMA-10 – Failure of the Service Module to separate caused the capsule to re-enter in an improper orientation (which could have led to a loss of the crew and vehicle), which in turn caused the capsule to fall back to a ballistic re-entry mode as well. The Russians initially did not tell the US.
  • TMA-11 – Similar issue as TMA-10, with damage to the hatch and antenna that was abnormal.

And there have been others of varying degree. I’m also ignoring the slew of Progress failures, including the 3 more recent ones that were launched on a rocket very similar to the current Soyuz-FG.

So, what’s safer, the Soyuz or the Space Shuttle?  Honestly, I think it’s a bit of a trick question. As one of my old comrades on the Usenet Sci.space.* hierarchy once said, “any time a single failure can make a significant change in the statistics, means you really don’t have enough data.” (I’m paraphrasing).
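To put rough numbers on that point, here is a back-of-the-envelope sketch using the two streaks mentioned earlier (77 successful Soyuz flights, 89 successful shuttle flights). It assumes every flight is an independent trial with the same failure probability, which real programs certainly are not, and asks how high the true failure rate could be while still producing a clean streak of that length at least 5% of the time. The function name is my own, just for illustration.

```python
def failure_rate_upper_bound(successes_in_a_row: int, confidence: float = 0.95) -> float:
    """Upper bound on per-flight failure probability after an unbroken success streak.

    Solves (1 - p) ** n = 1 - confidence for p, assuming independent,
    identically risky flights (a simplifying assumption).
    """
    if successes_in_a_row <= 0:
        raise ValueError("need at least one flight")
    return 1.0 - (1.0 - confidence) ** (1.0 / successes_in_a_row)


# Streak lengths taken from the discussion above.
streaks = {"Soyuz": 77, "Space Shuttle": 89}
bounds = {name: failure_rate_upper_bound(n) for name, n in streaks.items()}

for name, bound in bounds.items():
    print(f"{name}: {streaks[name]} clean flights, failure rate could still be up to {bound:.1%}")
```

Both bounds come out in the range of a few percent per flight and sit close to each other, so streaks of this size cannot really separate the two vehicles.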

My personal bias is, both programs had programmatic issues (and I think the Russians are getting a bit sloppier when it comes to safety) and design issues (even a perfectly run shuttle program had risks that could not have been solved, even if they might have prevented both Challenger and Columbia).  However, I think the Russian Soyuz is ultimately more robust. It appears a bit more prone to failures, but it has survived most of them. But, that still doesn’t make it 100% safe. Nor does it need to be 100% safe.  To open the new frontier we need to take some risks.  It’s a matter of degree.

“A ship in harbor is safe, but that is not what ships are built for.” – John A. Shedd.

A spacecraft is safe on the ground, but that’s not what it’s built for.

In the meantime, there’s a lot of, in my opinion naive, talk about decrewing ISS. I suspect the Russians will fly the Soyuz MS-11 flight as scheduled. There’s a slight chance it might fly uncrewed and simply serve to replace the current Soyuz MS-09 capsule, but it will fly.