T-SQL Tuesday #152 – My Rant

Thanks to Deb Melkin for hosting this month’s T-SQL Tuesday and developing the topic. Instead of calling this a rant, perhaps I should call it an “I told you so.” There’s a common refrain among DBAs of “It depends” when asked a question. And that’s generally accurate. But this is a case of me saying “it doesn’t depend, do as I say” and being ignored.

Ironically, when I took my database class in college, it was long enough ago that the idea of a “Sequel Database” (that alone should tell you how long ago this was) was basically described as an up-and-coming technology that perhaps had a future. Talk about a bold and accurate prediction! That said, one of the things that fascinated me then, and still does, is that SQL (in general, not the product SQL Server) is based on work done by Edgar F. Codd and has a fairly strict mathematical basis. (Which is another reason I rail against those who claim that RDBMS and SQL will eventually be replaced. That’s like saying algebra will be replaced. There may be other branches of mathematics developed that are far better for their specific domains, but the validity and usability of algebra will never go away.)

In any event, one of the key concepts that Codd developed was that of “a table.” A table has several particular parts to its overall definition. The one critical for this blog is that a table itself has no implicit order. Now, many folks will run a query multiple times and get the same results every time. But that’s more a factor of how SQL Server happens to handle reads than any guarantee. At my last full-time job, I was easily able to prove to the programmers that a query on the UAT box would return a different order than on Prod because of the difference in the number of CPUs and disks. But that’s not what I’m here to talk about.

My “I told you so” moment goes back further, to a table that was about as simple as you can get. It had a single row. Now, I think we can all agree that a single row will always return the same order, right? I can’t recall exactly why the developer felt that this table was the solution to his problems, but I pushed back. I asked that, at the very least, he put in a WHERE clause. He felt that would impact performance too much, and besides, with one row, it would always return his results. I of course asked, “What happens if eventually the table has two rows?” “Oh, well my row will return first anyway.” “No it won’t.” Well, he wouldn’t budge and I had bigger fish to fry. At the time there really was no reason to expect this table to grow. But I tucked it away in the back of my mind.

Sure enough, about a year later, which was three months after the developer left, we started to get really weird results on the webpage that relied on that table. It seems another developer had realized this table was a perfect place to store the data he needed (I’m assuming it was some sort of config data, but it was honestly so long ago I can’t recall), so he added a row. Now HE was smart enough to add a WHERE clause to his query. But the original “don’t worry about it” query still had no WHERE clause. And sure enough, it was sometimes returning the new row instead of the original. Fortunately this was a five-minute fix. But I can only imagine how long it would have taken to find the problem if I hadn’t remembered it in the first place.

So, while as a DBA I will often say “it depends,” I will always be adamant in saying that tables are unordered by definition, and you absolutely need a WHERE clause or an ORDER BY if you want to guarantee specific results. Though I suppose it depends: if you don’t care about the order or don’t need a specific set of data, you can ignore my rant. There are cases where that’s valid too.
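To make that concrete, here’s a minimal sketch of the failure mode; the table and column names are hypothetical, not the actual table from that job:

-- The original "it'll always return my row" query:
SELECT ConfigValue FROM dbo.AppConfig;
-- With one row this appears reliable. With two rows, which row comes
-- back first is entirely up to the storage engine.

-- The safe versions: filter to the row you mean, or impose an order.
SELECT ConfigValue FROM dbo.AppConfig WHERE AppName = 'OriginalApp';
SELECT TOP (1) ConfigValue FROM dbo.AppConfig ORDER BY AppName;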

Thus ends my TED talk.

Woody Woodpecker

There’s an old saying in the software industry that “If houses were built like software was, the first woodpecker that came along would destroy civilization.”

I was reminded of that last week in two ways. First, I was sitting at my desk when I heard what sounded like someone banging on a tin drum of some sort. I went up to the porch outside my office window and saw a bird fly off. Seeing that only solved part of the question in my mind and created a new one: what the heck was a woodpecker doing on the corner post? It’s pressure-treated and there are no signs of any sort of infestation. But even more so, why was it so friggen’ loud and metal-sounding? About 20 minutes later it was back. I chased it off again.

This cycle has become a daily event, with the bird coming back 3-4 times a day. But I finally got a partial answer. I had forgotten that on the far side of the post was the copper downspout for the gutters. As far as I can tell, the bird is hitting it directly (though I can’t imagine why) or close enough that the entire downspout is acting like a drum. Annoys the heck out of me, but the woodpecker doesn’t seem to care. And the good news is, despite all this, my house is still standing.

The other reminder was an error message I received from an ETL process I had written for a customer. It takes data from a vendor and loads it into my client’s Salesforce instance. The error was one I hadn’t seen before (but had thankfully planned for). So I went into the package and started to investigate. It seemed all the new records had simply failed to load. I looked closely and could not see an issue, so I attempted a manual load and it instantly failed. This was unusual and troubling. My first thought was that the vendor had somehow changed the data format and that was causing a problem, but nothing really indicated that (and had that happened, it most likely would have generated an error earlier in the process).

After some digging I realized that it wasn’t the vendor’s fault. As part of the ETL process, one of the steps is a lookup into Salesforce for a Vendor ID, to make sure the data coming in is tagged with the right ID. This step is somewhat important because the ID, being internally generated by Salesforce, is different between the UAT instance and the Prod instance, so rather than hardcoding it, it’s a lookup. And while the ID itself in an instance can never change, the keyword associated with it in theory can. In practice it shouldn’t: normal users don’t have permission, and admin users should know that changing the keyword should never be done, or if it is, that it needs to go through an entire change process.

Sure enough, my lookup, which was basically doing “WHERE Vendor_ID=’Adventureworks'”, was now returning nothing. So the insert, which needs that Vendor_ID, was failing. A little digging showed that an admin had changed the keyword, and not to something close like ‘Adventure Works’ that might have been an innocent tweak, but to something like ‘Bob and Mary’. It wasn’t even close.
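A cheap guard against this class of failure is to make the lookup fail loudly the moment it comes back empty, rather than letting the insert fail downstream. Here’s a minimal sketch, with hypothetical table and column names (not the client’s actual schema):

DECLARE @VendorId varchar(18);

SELECT @VendorId = Id
FROM dbo.SalesforceAccount
WHERE Vendor_Keyword = 'Adventureworks';

-- If someone renamed the keyword, stop here with a clear message
-- instead of letting the downstream insert fail mysteriously.
IF @VendorId IS NULL
    THROW 50001, 'Vendor ID lookup returned no rows; was the keyword changed?', 1;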

Needless to say, better controls have been put into place, but apparently a woodpecker came along and broke my ETL.

Fortunately, that one isn’t coming back.

Code strong, code robustly, and have good error checking.

Projects You’re Proud Of?

When I present, I start with a brief bio of myself, and one item I generally include is a comment that I like to solve problems. This may sound obvious, but it’s true and I think it describes my goal well. Yesterday, while on a three-hour Zoom call with a client, we got talking about various projects, and it made me think about some of the problems I’ve solved over the years.

There are several I could talk about, but one came up yesterday. Several years ago at a previous client, the head of their call center came to me with an issue. They had a process where they’d export their call center logs and import them into SQL Server. Except to call it a process was a bit of an overstatement. It was a series of about 4-5 steps, all but the initial export done manually. This meant that every morning one of their IT people would take a file from a Linux server, copy it locally, and import it into Access, where several macros were run to transform it. Then the person would import the result into SQL Server, where they could finally run reports. There were several possible places for mistakes to happen, and while mistakes weren’t routine, they tended to happen about once or twice a month. On a good day, the manual import would take about half an hour; on a bad day, over an hour. So in a month, one of their IT people could easily spend 15 or more hours on it, or over 180 hours a year.

In addition, adding new meta-data to the process was error-prone, so he couldn’t do it as often as he liked. He asked if I could take a look at it and automate it. While SSIS is not an area of expertise for me, I was familiar enough with it to know it was a good fit, and I said I’d work on a solution. It took some effort, but eventually I had one in place. The entire process now runs automatically in about 5 minutes, and he can add or remove the meta-data he needs by updating a single SQL table. He was quite pleased.
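I can’t share the client’s actual schema, but the idea of driving the process from a single table looks roughly like this; the names and columns are purely illustrative:

-- Hypothetical metadata table: each row tells the package how to
-- handle one incoming field, so changes never require a package edit.
CREATE TABLE dbo.ImportMetadata (
    SourceField  sysname NOT NULL,            -- column name in the exported file
    TargetColumn sysname NOT NULL,            -- destination column in SQL Server
    IsActive     bit     NOT NULL DEFAULT 1   -- flip to 0 to retire a field
);

-- Adding a new piece of meta-data becomes a single INSERT:
INSERT INTO dbo.ImportMetadata (SourceField, TargetColumn)
VALUES ('agent_ext', 'AgentExtension');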

I’m also proud to say the only real time there’s been an issue with the process was when, for business reasons, they had to re-IP their entire internal network. They unfortunately scheduled this for a week when I was not only on vacation, but spending that week at some National Parks and Forests in the South Dakota area. The remoteness of those locations meant my connectivity was very limited. I let the team handling the re-IP know what changes had to be made to a config file to make things work, but in the aftermath of other issues they had to deal with, this was missed. Fortunately, once I found the right place to sit in the National Forest we were camped in and got enough of a cell signal to log into their network, I was able to make the update and fix things. Since then, things have worked without a hitch.

I like this particular project not just because it’s been so problem-free, but because I can clearly point to a problem a client had that I helped solve. Now that IT person can spend their time on more important issues.

It also is an example of a mantra I think is generally true:

Anything that can be automated should be automated.

There are other projects I may write about at other times (including a few involving PowerShell), but that’s it for today.

What projects are YOU proud of? I’d love to hear from you.

Trust but Verify

This is one of those posts where you’ll just have to trust me. Honestly.

I want to talk about indexes.

About a week ago, a friend on a chat system I use related how one of their colleagues had said, “oh, we don’t have to optimize the database, the server is fast enough,” or words to that effect. All of us in the discussion blanched a bit. Yes, when I started in the business a 10GB database was considered large, and because of the memory limit with 32-bit SQL Server, we were limited to 2GB (or 3GB if you took the right steps) of memory, so it was literally impossible to keep a large database in memory. Of course, now we routinely deal with databases hundreds of GB in size on machines that can easily have 0.5TB of memory or more. This means that, except for writes, an entire database can often be kept in memory.

But that said, optimization still matters. Last week I was debugging an ETL process that I’ve helped a client with. I’d love to show screenshots, but my NDA won’t allow me (hence my asking you to trust me). Ok, that’s partly a lie. I couldn’t provide too many details even if I wanted to, but the bigger issue is that I’ve since closed the windows that showed the scripts in question and the results of my changes.

One of the last things each step in the ETL does is write an updated Salesforce ID back to the source table. It’s actually a bit more complicated: what it really does is write to either a Success table or an Error table, and depending on a factor or two, a trigger then updates the source table. I had previously debugged and improved the performance of the trigger. But something was still bothering me about the performance. I looked a bit deeper, and one of the things that trigger does on a success is remove the row from the Error table. This was taking longer than I suspected it should, so I dug into it and noticed that the Error table had no index.

I can’t show the original queries I used, but I can show an example of the impact of adding a simple clustered index. (See, you can’t even trust me to say I won’t show any examples! You’d better read the entire post to verify what I’m really writing!)

Here’s an example query (with some changes to hide client-specific data):

select * from ErrorTable where SF__External_Id__c='005A000022IouWqIAX'

It’s a very simple query (and simpler than the actual one I was dealing with) but is enough to show the value of a proper index.

Now, in my original query, the Query Tuning Advisor actually suggested an index on SF__External_Id__c. In the example above it didn’t. There’s a canard among many DBAs that the QTA is generally useless, and often it is, though I think it’s gotten better. As a consultant, I can often come into a new client and tell when someone has gone crazy with the QTA and adopted EVERY SINGLE suggestion. In other words, they trusted it, but they never verified it. Why is this a problem? Well, in my experience the QTA can be overly aggressive at times, suggesting indices that provide little real benefit. And if you add an index in response to a SELECT query that runs, say, once a day against a table that gets thousands of updates a day, you might actually slow down your updates (since now every update also has to update the index). And as mentioned above, sometimes it fails to suggest an index at all. (I think in this case it didn’t suggest one on my example because the underlying table was far smaller than the original.)

So, I like to verify that the index I’ll add will make a difference. In cases like this, I often go old school and simply bracket my test queries:

set statistics IO ON
set statistics Time ON
select * from ErrorTable where SF__External_Id__c='005A000022IouWqIAX'
set statistics IO OFF
set statistics Time OFF

And then I enable Actual Execution Plan.

The results I received without any sort of index are below. Some key numbers are highlighted in red.

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 47 ms, elapsed time = 63 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(2 rows affected)
Table 'ErrorTable'. Scan count 1, logical reads 3570, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 15 ms.
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

You’ll notice the physical reads are 0. This is nice. This means everything is in memory.

In this case, because I’m familiar with how the ErrorTable is accessed, I decided a clustered index on SF__External_Id__c would be ideal (all my updates, inserts, deletes, and selects use that column to access this table).
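The index itself is a one-liner. I can’t show the client’s actual statement, but it was essentially this (the index name is my own invention):

-- Clustered index on the one column every access path uses.
CREATE CLUSTERED INDEX CIX_ErrorTable_SF__External_Id__c
    ON dbo.ErrorTable (SF__External_Id__c);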

I added the index and reran the query:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(2 rows affected)
Table 'ErrorTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Note the number of logical reads dropped by about a factor of 1,000. My elapsed time dropped from 15 ms to 0 ms (or rather, to less than 0.5 ms, so SQL Server rounded down).

If we look at the graphical query plan results, we see something similar:

First, without the index:

[Image: query plan showing a Table Scan to find 2 rows]

Then, with the index:

[Image: query plan showing a seek to find the same 2 rows]

That’s nice, I now know I’m doing a seek rather than a scan, but is that enough? I mean if the ErrorTable only has 2 rows, a seek is exactly the same as a scan!

So let’s dig deeper:

[Image: query plan details for the scan]

[Image: query plan details for the seek]

Here you can definitely see the dramatic improvement. Instead of reading over 100,000 rows (at a bit over 2.5 KB per row, that’s over 270MB), we only need to read 2 rows, for a total of just over 5 KB of data.

No wonder it’s faster. In fact, in the ETL process where it originally took about 1 minute to process 1,000 rows, my query with the index was now processing 3,000 rows in under 10 seconds.

The above is a bit of a contrived example, but it’s based on actual performance tuning I did last week. And this isn’t meant to be a lesson in actual performance tuning, but rather to show that if you make a change (in this case adding an index), you can’t just trust that it will work; you should VERIFY that it has made a difference, and more importantly, that it makes a difference for your workload. I’ve often seen the QTA make valid but useless index suggestions because someone ran an uncommonly used query through it and assumed the recommendation was good. Or they’ve made assumptions about the size of the table.

So never just trust an index will help, but actually VERIFY it will help.


Giving Blood and Pride Month

I gave blood yesterday. It got me thinking. First, let me show a few screenshots:

[Image: Male donor questionnaire, screen 1]

[Image: Female donor questionnaire, screen 1]

Let me interject here that I’m using the terms Male and Female based on the criteria I selected in the American Red Cross’s RapidPass screens. More on why I make that distinction further on. But first, two more screenshots.

[Image: Female donor questionnaire, with the pregnancy question highlighted]

[Image: Male donor questionnaire, with no pregnancy question]

Now, on the face of it, this second set of questions almost seems to make sense: if I answered Male early in the questionnaire, why be asked about a pregnancy? But what I’m asked at the beginning is about my gender, not my actual child-bearing capability. Let me quote from Merriam-Webster:

2-b: the behavioral, cultural, or psychological traits typically associated with one sex

Or from the World Health Organization:

Gender refers to the roles, behaviours, activities, attributes and opportunities that any society considers appropriate for girls and boys, and women and men. Gender interacts with, but is different from, the binary categories of biological sex.

Who can be pregnant?

So above, what the Red Cross is asking isn’t really about my gender, but about my ability to be pregnant. Now, this is a valid medical concern. There are risks they want to avoid in regard to pregnant women, or recently pregnant women, giving blood. So their ultimate goal isn’t the problem, but their initial assumption might be. A trans man might still be able to get pregnant, and a trans woman might be incapable of getting pregnant (just as a cis woman might be). And this is why I had the caveat above about using the terms male and female: I’m using the terms provided, which may not be the most accurate.

Assumptions on risk factors

The first set of images is problematic in another way: it makes assumptions about risk factors. Now, I think we can all agree that keeping blood-borne pathogens such as HIV out of the blood supply is a good goal. And yes, while donated blood is tested, the supply can be even safer if people who know they are HIV-positive, or at risk for it, can self-select out of the donation process.

But…

Let me show the actual question:

[Image: Question 21 for men, the three-month male-to-male contact question]

This is an improvement over the older restrictions, which were at one year and at one point “any time since 1977.” Think about that. If a man had had sex with another man in 1986, but consistently tested negative for HIV for the following 30+ years, he still could not give blood under the previous rules. By the way, I will note here that these rules are NOT set by the American Red Cross, but rather by the FDA. So don’t get too angry at the Red Cross for this.

The argument for a three-month window apparently is that HIV tests are now good enough to pick up viral particles after that window (i.e., at say two months you may be infected, but the tests may not detect it).

Based on the CDC information I found today, in 2018 male-to-male sexual contact resulted in 24,933 new infections. The second-highest category was heterosexual contact (note the CDC page doesn’t seem to specify the word sexual there). So yes, statistically it appears male-to-male sexual contact is a high-risk category.

But…

I know a number of gay and bisexual men. I don’t inquire about their sexual habits. However, a number are either married or appear to be in monogamous relationships. This means that if they want to give blood and not lie on the forms, they have to be celibate for at least three months at a time! But hey, if you’re a straight guy who has had sex with four different women in the last week, no problem, as long as you didn’t pay any of them for sex! I’ll add that more than one gay man I know wants to give blood and, based on his actual behavior, is in a low-risk category, but can’t because of the above question.

Why do I bring all this up at the end of Pride Month, and what, if anything, does it have to do with database design (something I do try to actually write about from time to time)?

As a cis-het male (assigned at birth and it still fits me), it’s easy to be oblivious to the problematic nature of the questions on such an innocuous and arguably well-intended form. The FDA has certain mandates that the Red Cross (and other blood donation agencies) must follow. And I think the mandates are often well-intended. But there are probably better ways of approaching their goals, in the examples given above, of helping to rule out higher-risk donations. I’ll be honest, I’m not always sure of the best way. To some extent, it might be as simple as rewording the question. In other cases, it might be necessary to redesign the database to better reflect the realities of gender and sex; after all, bits are cheap.

But I want to tie this into something I’ve said before: diversity in hiring is critical and I think we in the data world need to be aware of this. There are several reasons, but I want to focus on one for now.

Our Databases Model the World as We Know It.

The way we build databases is an attempt to model the world. If we are only aware of two genders, we will build our databases to reflect this. But sometimes we have to stop and ask, “do we even need to ask that question?” For one thing, we potentially create the issue of having to handle Personally Identifiable Information that we don’t really need. For another, we can make bad assumptions: “Oh, they’re male, they can’t get pregnant, so this drug won’t be an issue.”

Now, I’m fortunate enough to have a number of friends who fall into various places on the LGBTQIA+ (and constantly growing collection of letters) panoply, and the more I listen, the more complexity I see in the world and in how we record it.

This is not to say that you must go out instantly and hire 20 different DBAs, each representing a different identity. That’s obviously not practical. But I suspect that if your staff is made up entirely of cis-het men, your data models may be suffering and you may not even be aware of it!

So, listen to others when they talk about their experiences, do research, and get to know more people with experiences and genders and sexualities different from yours. You’ll learn something, and you also might build better databases. But more importantly, you’ll get to know some great people and become a better person yourself. Trust me on that.


The Customer is Always Right?

You’ve heard this adage before. Many believe it. And honestly, there’s a bit of truth to it. But the real truth is, the customer pays your bills, and if they stop paying your bills, they’re no longer your customer.

I was actually reminded of this during the testimony of Dr. Fauci before the Senate a few weeks ago. To be clear, neither the Senate nor the CDC is a customer here, nor is the President of the United States. But I think it ties into the point I want to make.

Over the past few months there’s been a lot of discourse over whether states should shut down, for how long and how and when they should open up. At the extreme ends you have folks who seem to argue for a continual shutdown to save as many lives as possible and the people who seem to argue that the economy is far more important and that any shutdown is a bad idea.

Dr. Fauci has been accused of wanting to “quarantine the entire country” and is the subject of a hashtag campaign, #faucifraud. During the Senate hearing, Senator Rand Paul took several swipes at Dr. Fauci, including a statement implying that some people might treat Dr. Fauci as the end-all. Finally, with only 32 seconds left, Dr. Fauci gave his reply:

“Well, first of all, Senator Paul, thank you for your comments. I have never made myself out to be the end all and only voice in this. I’m a scientist, a physician, and a public health official. I give advice, according to the best scientific evidence. There are a number of other people who come into that and give advice that are more related to the things that you spoke about, about the need to get the country back open again, and economically. I don’t give advice about economic things. I don’t give advice about anything other than public health. So I wanted to respond to that.”

I think this was an incredible reply, and one any consultant would do well to keep in mind. Dr. Fauci politely but firmly refutes Senator Paul’s comment about being the end-all and then points out what his qualifications are. He then suggests that there are other experts, in other fields, who should be consulted. He then reiterates the limitations of his advice.

This mirrors my own experience as a consultant. A client may ask me to recommend an HA/DR strategy. I might recommend some sort of Always On Availability Group with multiple synchronous replicas in one data center and an asynchronous replica in a second data center. I might then recommend daily backups to tape, with the tapes taken off-site. Everything would be encrypted, and we would test failovers on a regular basis. With that, the proper selection of hardware, a proper deployment setup, and a completely developed runbook for various scenarios, I could probably guarantee them nearly perfect uptime.

Then, the CFO steps in and points out that their budget is only 1/20th of what I had planned around and that trying to spend more would bankrupt the company.

Then the VP of Sales points out that the business model of the company is such that in reality they could tolerate several hours of downtime, and while it might hurt business a little, it wouldn’t bring them to a halt. In fact, they suggest that the order system just be done with a bunch of Excel spreadsheets that accounting can true up at the end of the month. After all, they just want to focus on sales, not on entering data into a system.

Finally, the CEO steps in. They decide that it’s true, the company can’t afford a 24/7 HA/DR setup that is the envy of NASA, at least not at this time. Nor do they think the VP of Sales’ plan has much merit, since it won’t allow future growth into online sales, and while it might be easier for the salespeople to just jot down stuff, it would mean hiring more people in accounting to figure out the data at the end of each month.

Instead, they direct the CTO to work with all the parties involved to develop a system that can tolerate 3 hours of downtime, costs no more than 1/10th of what I proposed, and also incorporates features that allow them to move to a more advanced HA/DR setup down the road, as well as eventually allow for online sales.

So who was right? Me, at my extreme of a huge investment in hardware, licenses, and resources, or the VP of Sales, who wanted to do the whole thing with some Excel spreadsheets?

Both and neither. Either of our ways would have worked, but neither was the best solution for the company.

I think good experts realize exactly where their expertise begins and ends. My role as a consultant is to provide the best advice I can to a company and hope they take it, at least where it’s applicable. And I should understand that every situation is different. In these cases, the customer is always right. Their final decision might not be what I recommended, but ideally they’ve taken my advice into consideration.

Finally though, I have to recognize that there are situations where I may have to withdraw from the engagement. In the above scenario, I crafted a situation where compromise is not only a viable option, it is perhaps the best option. But there are times when compromise is not an option. If a potential client came to me dealing with PII data and refused all advice in regards to encrypting data and other forms of data security, it would be in my best interest to simply say, “here, the customer is wrong” and recuse myself. So in some cases, the customer may be outright wrong, but at that point they should also stop being my customer and would no longer be paying me.

So, do we re-open the country completely, or do we shut down everything until the fall? Honestly, I’m glad I’m not the one making that decision. I don’t think there’s a single answer for every community. But I think the best leaders will take the best advice they can from a variety of experts, synthesize an answer, and adjust it as more data and experience come to light. It’s simply not practical to prevent every possible COVID-19 death. But it’s also not ethical to re-open without a plan or even a thought as to the impact.

Neither extreme is fully right.


Advanced Braining

I’m currently reading the tome The Power Broker by Robert Caro. For those not familiar with it, it’s the Pulitzer Prize-winning biography of Robert Moses. “Robert who?” you may be asking. Robert Moses, perhaps more than any single person, literally shaped New York City in the mid-20th century. Due to his power, he was responsible, in NYC alone, for getting the Triborough Bridge, Brooklyn-Battery Tunnel, West Side Highway, Cross-Bronx Expressway, and many other large-scale projects built. He outlived a number of borough presidents, mayors, governors, and even Presidents. Arguably, for decades he was the most powerful man in NYC, at least in terms of how much money was spent and what projects were completed. In many ways he was a visionary.

However, as the chapter I’m currently in discusses, he also could be extremely short-sighted. I’ll come back to that in a moment.

In the past week, several small incidents occurred in my life. Separately, they don’t necessarily mean much, but taken together, I realized there was a little theme associated with them.

Last Tuesday I posted an update on my dryer repair and an issue at one of my clients. I described the work incident as an example of the normalization of deviance. A few hours later, someone I’ve known for decades, originally online but since met in person, Derek Lyons (who has a great blog of his own on anime, a subject about which I know nothing), replied to me on Facebook and said he had read my article, liked it, but thought I was wrong. I was intrigued. You can see his comment and my reply at the bottom of last week’s post. The general point, though, is that he showed my thinking was incomplete, or at least my explanation was. In either case, it made the overall article a better one.

Then on Wednesday, my editor at Redgate, Kathi Kellenberger, emailed me with changes to my most recently submitted article. One of the changes was to the title. Now, I’ve come to value Kathi’s input, but I wasn’t keen on the title change, so I suggested something different. She wrote back and recommended we go with hers, How to Add Help to PowerShell Scripts, because “How to…” titles generate more hits on search engines; in fact, a previous article of mine, How to Use Parameters in PowerShell, was one of their most-read articles at the time (106K hits and climbing). I went with her advice.

Yesterday, a friend contacted me. He was in the middle of grading for his students, and the numbers in his Excel spreadsheet weren’t quite making sense. The errors weren’t huge, but just enough to make him go “hmmm.” So he reached out to me to take a look. After a few minutes of digging, I understood what was happening and was able to write back to him with a better solution.

All these have something in common: the final product was better because of collaboration. This is a common theme of mine. I’ve talked about the chat system I use at RPI, and I’ve talked about making mistakes. In general, I think that when trying to solve a problem, getting additional input is often valuable.

So back to Robert Moses. In the early part of his career, before his efforts focused mostly on NYC itself, he was responsible for other projects, such as the Northern State Parkway and the Southern State Parkway and Jones Beach on Long Island. He started his career in a time when cars were mostly a vehicle of the well-off and driving a parkway was expected to be a pleasant experience (hence the name). His efforts were built around more and more parkways and highways.

By the 1950s, though, it was becoming apparent to most everyone else that additional highways actually generated more traffic than they routed away from the area’s surface roads. What was originally considered a blessing in disguise, where a bridge such as the Triborough would quickly generate more traffic (and hence more tolls) than expected, was soon seen as a curse. For every bridge or tunnel built in or around NYC, traffic increased far more than expected. And this came at a price. Urban planners around the country were starting to see the effects: efforts to build more bridges or highways to ease traffic congestion were actually creating more. Even in NYC, as Moses was planning his next large projects, opposition was slowly building. However, Robert Moses was blind to the problem. By the 1950s and 60s he had so surrounded himself with “yes men” that no dissent was permitted. In addition, opposition outside his offices was silenced by almost any means Moses could use, including, apparently, the use of private detectives to dig up dirt on opponents.

In the current chapter I’m reading, Caro details exactly how much money the Triborough Bridge Authority (which was in practice, though not in theory, under Moses’ absolute control) and the Port Authority had available for upcoming projects, including the planned Verrazzano-Narrows Bridge. He goes on to explain how badly the infrastructure of the NY subway system and the LIRR had fallen into disrepair. Caro suggests how much better things could have been had just a portion of the money at the TBA’s and PA’s disposal been spent on things like the Second Avenue Subway (something that is only now coming to fruition and may take decades more to complete). Part of the issues with the subway system can be laid directly at the feet of Moses, due to his earlier efforts to get the city to fund his other projects. The issues with the LIRR, however, were more an indirect result of his highway building out into Long Island.

I suspect some of Caro’s claims are a bit idealistic and the projects would have cost more than the projections of the time (like most projects), and while I think most of the projects he touches upon probably should have been built in the 50s (the Second Avenue Subway being one of them and the LIRR East Side Access being another), they weren’t, because of a single man who brooked no disagreement and was unwilling to reconsider his plans.

Robert Moses was a man who got things done. Oftentimes that’s a good thing. And honestly, I think a number of his achievements are remarkable and worthy of praise.

But I have to wonder how much better a city New York could have been had Robert Moses listened to others, especially in the 1950s and 60s.

Today’s takeaway? Take the time to listen to input, and ask for it. You may end up with a better solution in the long run.


Bits are cheap

And, as a recent incident in our #SQLFamily community unfortunately illustrated, at times so is respect. Bear with me as I relate these two ideas and another incident.

Let me start with a statement that should make more sense by the end of this post: My name is Gregory, but I prefer that you call me Greg. My pronouns are he/him/his.

But first, a trip down memory lane. Many of us recall the Y2K issue. This was a direct result of programmers decades ago trying to save bytes in storage (and to a lesser extent memory and CPU cycles) because storage was expensive. By storing dates with just the last two digits of the year, they could cut the storage for years in half. This was important back then because it saved money. But, as many of us recall, as the year 2000 approached, this started to cause more and more problems. (As an aside, the first example I’m aware of was brought to my attention by a programmer who worked for a bank in 1970. It seems they suddenly had issues handling 30-year mortgages!)

Since then, of course, the cost of storage has dropped, and as an industry we’ve moved to storing full 4-digit years. No one in this day and age would normally question that decision.

But enough of ancient history, let me get to the point of this article: respecting others.

As many readers know, those of us on Twitter will often use the hashtag #SQLFamily. In the past week I’ve seen two incidents that have illustrated the worst and the best of this family.

In the first case, a member of the community, a woman I had never met, said she was leaving the family; she no longer felt welcome. At an event she had been misgendered not once, but multiple times. For those who aren’t sure what that means, I will, without going into background or details (because they’re not important), say she is a trans woman. Several people at the event took it upon themselves to refer to her using male pronouns.

In the most recent case, a fellow speaker, Cathrine Wilhelmsen, tweeted about how she had been addressed as Cathi and Kathi twice in the previous 24 hours. She says this wasn’t the first time, just the most recent, and recent enough for her to comment on.

In both cases, part of the problem is that strangers addressed the person in question in a manner that did not respect her: in the first case by not using the proper pronouns, and in the second by not using the name she provided.

But that’s only one part of the problem. So let’s address it: we have members of the #SQLFamily who don’t respect other members. But we have another issue, one that I think is just as important to address: those who minimize the issue. In the first case, apparently no one called out the folks misgendering the woman. In a situation like this, a show of support can be as simple as saying, “Umm, I think you mean she, not he.” You can also support the use of pronouns on nametags at events, or in the bio descriptions for events.

Remember though, today bits are cheap. So we can do more. Don’t design your database with a bit field for gender. Make it a table. These are relational databases, after all. Have a table for possible gender identifications, and allow for a method to add rows to it. Have a table for pronouns; there are more than you might think, and people are often crafting additional ones. While the singular they/them is becoming more popular, it’s NOT the only alternative to he/him and she/her.
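Here’s a minimal sketch of what that might look like; the names and columns are mine, purely illustrative:

-- Lookup tables instead of a bit flag; a new identity is just a new row.
CREATE TABLE dbo.Gender (
    GenderId int IDENTITY(1,1) PRIMARY KEY,
    Name     nvarchar(50) NOT NULL UNIQUE  -- e.g. 'Woman', 'Man', 'Non-binary'
);

CREATE TABLE dbo.Pronoun (
    PronounId  int IDENTITY(1,1) PRIMARY KEY,
    Subject    nvarchar(20) NOT NULL,  -- 'she', 'he', 'they', ...
    Object     nvarchar(20) NOT NULL,  -- 'her', 'him', 'them', ...
    Possessive nvarchar(20) NOT NULL   -- 'hers', 'his', 'theirs', ...
);

-- A person references both; adding a new option never needs a schema change.
CREATE TABLE dbo.Person (
    PersonId  int IDENTITY(1,1) PRIMARY KEY,
    FullName  nvarchar(200) NOT NULL,
    GenderId  int NULL REFERENCES dbo.Gender(GenderId),
    PronounId int NULL REFERENCES dbo.Pronoun(PronounId)
);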

We are data professionals, after all. We absolutely should not lock our data into a single view of the world when that worldview is changing. (Note: the world is not changing; there have been multiple genders throughout recorded history. We’re simply becoming more cognizant of it now.)

In the case of Cathrine being called by another name, keep it simple. Use the name provided, be it in an introduction, on a nametag, or by another method. Respect the person’s wishes. And do not, as some did on Twitter, respond with “well, they probably didn’t mean anything” or “eh, just roll with it.” It’s not YOUR name. It’s not YOUR identity. Sure, you might not care if someone calls you Richard, Rick, Ricky, or Dick. But another person might. Their name is part of their identity; respect their wishes. I will add one more note, which Cathrine shared with me and other women have too: it is almost always men who will use nicknames or cute names or similar without prompting. Yes, fellow men, I’m calling you out. We may not think about it. In fact, I would argue we often don’t think about it. It’s something that privilege allows us. But be aware that your attempt to be friendly or familiar often comes off as diminishing and condescending.

Now, despite the failures of some members of #SQLFamily, I want to celebrate the great people in the community. These two incidents have generated a lot of responses. I’ve seen at least two great posts, one from Jen McCown and another from Kellyn Gorman. I’m sure there are others. I have also written in the past about being an ally. And while I’ve seen one or two tweets that dismissed Cathrine’s tweet, I’ve seen many members rally to the defense of the women in both incidents. And, very importantly, I’ve seen several tweets from people asking, “how can I help?” or “how can I improve my behavior?” I love that last one. I’m constantly trying to unlearn some of the behaviors I was taught and to be more conscious of what being a white, straight, cis-het male brings to the table. We can always learn to do better.

Yes, our #SQLFamily has some members who could and need to do better. That saddens me. Fortunately as I’ve seen, it also has a lot of members actively striving to do better and help others do better. That gladdens me. Let’s all be the latter.

Respect and disk space don’t cost us much. Let’s learn to be respectful of people and to design databases that can also respect the world around us.

P.S. I want to note, I was purposely vague about the first incident because the specifics weren’t important and I did not want to draw more attention to a specific person without their permission. In Cathrine’s case, I made a point of respecting her and exchanged messages with her first to make sure she was ok with me bringing more attention to the incident.

This Site Makes Cookies

Apparently, under new guidelines here and in Europe, I’m ethically obligated to disclose that I’ve been known to make cookies from time to time. Oh, excuse me, something is coming in over my earpiece now. Oh, never mind, I’ve been informed those laws apply to a different type of cookie.

In any event, I first got into the habit of baking cookies on a somewhat regular basis while in college. It became a stress release for me, and also apparently made me quite popular among the sorority sisters and outing club members I lived with. Probably at least once a month my sophomore year, I would make a double batch of Toll House chocolate chip cookies. They rarely lasted more than a day or two.

Since then, I’ve expanded my repertoire, including once trying “bacon cookies” for my very first SQL Saturday. Those weren’t a huge hit, but that hasn’t stopped me from baking.

That said, I’ve learned a few things over the years about baking cookies. For example, my daughter would bring cookies to school for an event and would often be asked, “oh, did your mom make them?” She’d patiently explain that no, her dad did. Even today, the assumption is that when it comes to school events, the mom does the baking. I’m glad that my kids both realize it’s unfair to expect mothers to do all the baking and other domestic duties.

But I also learned something else that threw me for a loop: people don’t like homemade cookies from a zip-lock bag. Sometimes I’d bring cookies to events and people wouldn’t eat many of them. Now, being practical and in a hurry, I’d almost always just toss the cookies into a zip-lock bag. It was my daughter who suggested I start putting them into a plastic container with a lid instead. Suddenly I found the same cookies were much more popular. My daughter explained her theory, which I tend to believe: for whatever reason, perhaps hygiene, people don’t want to reach into plastic bags for food. It may be the touching of the same sides everyone else touched, or something else. But regardless, putting them into a plastic tub with a lid works.

Call it a UI problem, but it seems to work.

Today’s take-away: just because you’re comfortable with a solution and think it works, don’t be averse to making changes, even if they seem silly or trivial, if that’s what your users desire.

P.S.: Check out my latest writing for Red-Gate: PowerShell and Secure Strings.

Two Minds are Better Than One

I’m going to do something not often seen on social media: I’m going to talk about a mistake I made. It’s all too common on social media sites to talk about how perfect our lives are and how great things are. You rarely hear about mistakes. So, in keeping with this blog’s theme of talking about how we approach and solve (or don’t solve) problems, I decided to be up front and admit a mistake.

This all started with a leak in the downstairs shower. It had been growing over the years, and I frankly had been ignoring it. Why put off to tomorrow what you can put off to next month, or even next year? But finally, in December of last year, it became obvious that it was time to fix the leak. It had continued to grow, and now that my son was home from college for an extended break, he had set up a work area in the basement, below the bathroom. I figured he didn’t really need to suffer from water dripping onto his desk.

So, he and I went into full demolition mode and ripped out the old tile and backer board to get to the plumbing.

[Image: New plumbing in the bathroom]

You can see some of the work here. I also took the opportunity to run wiring to finally put in a bathroom fan. That’s a whole other story.

Anyway, the demolition and plumbing went well. Then we put up the backer board, sealed it, and left it like that. It didn’t look good, but it was waterproof and usable. It was “good enough.” So it sat like that for about 8 months. But with an upcoming pool party, I decided it was time to finally finish it off. One of the holdups had been deciding on tile. Fortunately, on a shopping trip about a month earlier, my son, my wife, and I found tile we liked (my daughter, who ironically still lives at home and will be using the shower more in the next few years than her brother, was in LA on vacation, so she ended up not having much say in the matter).

So, earlier this month, while the rest of my family took a weekend trip to Six Flags in New Jersey, I figured I’d surprise them by finally tiling the shower. I went to the big box store whose favorite color is orange and bought the required materials. Since the tile we had selected is approximately 6″ wide and 24″ long, I had to make sure I got the right mastic. This was a bit different from stuff I’ve worked with in the past, with more synthetic materials in it; it mixed differently and had slightly different drying characteristics. That, combined with growing darkness, led me to move quickly. The darkness was a factor since any tile cutting was done outside.

And all that led to a simple mistake. On the end wall there’s a window, and as a result part of that wall needed tiles just under 24″ long. On one hand, this was a huge plus, since it meant fewer seams and fewer places to grout. On the other, it meant in one spot having to work around the trim of the window. And that’s where I made my mistake. The trim had been put over the original tile, so there was, in theory, room behind the stool of the window to fit in a piece of tile. That had been my plan.

But when it came to sliding in the nearly 24″-long piece of tile, it wouldn’t fit. It wouldn’t bend (obviously) to let me tuck it behind the stool, and due to the stickiness of the mastic, I couldn’t slide it in from the top.

So, I cut out a notch. In the back of my mind I was somehow thinking it wouldn’t look that bad and the tile would cover it. Well, I was obviously wrong.

In hindsight, I realized I should have cut the tile into two pieces, creating an extra seam (like the row below it, which had to cover more than 24″ in any event). Then I could have slid in the smaller piece and then put in the remaining piece. It would have been perfect, looked great, and been more likely to be waterproof.

So, this gets me to the title. Had I been doing the work with someone else, I’m sure I’d have said, “Damn, this is gonna suck, any ideas?”

[Image: My mistake, the notched tile]

And I’m sure someone else would have suggested, “Hey, cut the piece and slide it in.”

I like working alone, but sometimes you need a second person to help, or more than one brain, as I’ve mentioned in the past. If nothing else, sometimes a duck can help.

[Image: A bunch of rubber ducks, including two from Idera!]

So, the moral of the story is that sometimes two heads are better than one. Oh well, I won’t be the one using that shower, so at least I won’t see my mistake all the time!

Oh, and check out my latest Red-Gate article on Secure Strings in PowerShell.