Caving and SQL

Longtime readers know that I spend a lot of my time talking about and teaching caving, more specifically cave rescue, and SQL Server, more specifically the operations side. While in some ways they are very different, there are areas where they overlap. In fact I wrote a book taking lessons from both, and airplane crashes to talk about IT Disaster Management.

Last week is a week where both had an overlap. One of the grottoes in the NSS (think like a SQL User Group) sponsored a talk on Diversity and Inclusion in the caving community. The next day, SQL Pass had a virtual panel on the exact same subject.

Welcoming

Let me start with saying that one thing I appreciate about both communities is that they will welcome pretty much anyone. You show up and ask to be involved and someone will generally point you in the right direction.  In fact several years ago, I heard an Oracle DBA mention how different the SQL community was from his Oracle experience, and how welcoming and sharing we could be.

This is true in the caving community. I recall an incident decades ago where someone from out of town called up a caving friend he found in the NSS memberhsip manual and said, “hey, I hear you go caving every Friday, can I join you?” The answer was of course yes.  I know I can go many places in this country, look up a caver and instantly be pointed to a great restaurant, some great caves and even possibly some crash space to sleep.

So let’s be clear, BOTH communities are very welcoming.

And I hear that a lot when the topic of diversity and inclusion comes along. “Oh we welcome anyone. They just have to ask.”

But…

Well, there’s two issues there and they’re similar in both communities. The less obvious one is that often anyone is welcome, but after that, there’s barriers, some obvious, some less so. Newcomers start to hear the subtle comments, the subtle behaviors. For example, in caving, modesty is often not a big deal. After crawling out of a wet muddy hole, you may think nothing of tearing off your clothes in the parking lot and changing. Perhaps you’re standing behind a car door but that’s about it. It’s second nature, it’s not big deal. But imagine now that you’re the only woman in that group. Sure, you were welcomed into the fold and had a blast caving, how comfortable are you with this sudden lack of modesty? Or you’re a man, but come from a cultural or religious background where modesty is a high premium?

In the SQL world, no one is getting naked in the datacenters (I hope). But, it can be subtle things there too. “Hey dudes, you all want to go out for drinks?” Now many folks will argue, “dudes is gender neutral”. And I think in most cases it’s INTENDED to be. But, turn around and ask them, “are you attracted to dudes?” and suddenly you see there is still a gender attached.  There’s other behaviors to. There’s the classic case of when a manager switched email signatures with one of his reports and how the attitudes of the customers changed, simply based on whose signature was on the email.

So yes, both groups definitely can WELCOME new folks and folks outside of the majority, but do the folks they welcome remain welcomed? From talking to people who aren’t in the majority, the answer I often get is “not much.”

An Interlude

“But Greg, I know….” insert BIPOC or woman or other member of a minority.  “They’re a great DBA” or “They’re a great caver! Really active in the community.”  And you’re right. But you’re also seeing the survivorship bias. In some cases, they did find themselves in a more welcoming space that continued to be welcoming. In some cases you’re seeing the ones who forged on anyway. But think about it, half our population is made up of women. Why aren’t 1/2 our DBAs?  In fact, the number of women in IT is declining! And if you consider the number of women in high school or college who express an interest in IT and compare it to those in in their 30s, you’ll find the number drops. Women are welcome, until they’re not.

In the caving community during an on-line discussion where people of color were speaking up about the barriers they faced, one person, a white male basically said, “there’s no racism in caving, we’ll welcome anyone.”  A POC pointed out that “as a black man in the South, trust me, I do NOT feel safe walking through a field to a cave.”  The white man continued to say, “sure, but there’s no racism in caving” completely dismissing the other responder’s concerns.

There’s Still More…

The final point I want to make however is that “we welcome people” is a necessary, but not sufficient step. Yes, I will say pretty much every caver I know will welcome anyone who shows an interest. But that’s not enough. For one thing, for many communities, simply enjoying the outdoors is something that’s not a large part of their cultural.  This may mean that they’re not even aware that caving is a possibility. Or that even if it is, they may not know how to reach out and find someone to take them caving.

Even if they overcome that hurdle, while caving can be done on the cheap, there is still the matter of getting some clothing, a helmet, some lights. There’s the matter of getting TO the cave.

In the SQL world, yes anyone is welcome to a SQL Saturday, but what if they don’t have a car? Is mass transit an option? What if they are hearing impaired? (I’ve tried unsuccessfully 2 years in a row to try to provide an ASL interpreter for our local SQL Saturday. I’m going to keep trying). What if they’re a single parent? During the work week they may have school and daycare options, but that may not be possible for a SQL Saturday or even an afterhours event. I even had something pointed out to me, during my talk on how to present, that someone in the audience had not realized up until I mentioned it, that I was using a laser pointer. Why? Because they were colorblind and never saw the red dot. It was something that I, a non-colorblind person had never even considered. And now I wonder, how many other colorblind folks had the same issue, but never said anything?

In Conclusion

It’s easy and honestly tempting to say, “hey, we welcome anyone” and think that’s all there is to it. The truth is, it takes a LOT more than that. If nothing else, if you’re like me, an older, cis-het white male, take the time to sit in on various diversity panels and LISTEN. If you’re invited to ask questions or participate, do so, but in a way that acknowledges your position. Try not to project your experiences on to another. Only once have I avoided a field to get to a cave, because the farmer kept his bull there. But I should not project MY lack of fear about crossing a field onto members of the community who HAVE experienced that.

Listen for barriers and work to remove them. Believe others when they mention a barrier. They may not be barriers for you, but they are for others. When you can, try to remove them BEFORE others bring them up. Don’t assume a barrier doesn’t exist because no one mentions it. Don’t say, “is it ok if I use a red laser pointer?” because you’re now putting a colorblind person on the spot and singling them out. That will discourage them. For example find a “software” pointer (on my list of things to do) that will highlight items directly on the screen. This also works great for large rooms where there may be multiple projection screens in use.

If caving, don’t just assume, “oh folks know how to find us” reach out to community groups and ask them if they’re interested and offer to help. (note I did try this this year, but never heard back and because of the impact of Covid, am waiting until next year to try again.)

Don’t take offense. Unless someone says, “hey, Greg, you know you do…” they’re not talking about you specifically, but about an entire system. And no one is expecting you to personally fix the entire system, but simply to work to improve it where you can. It’s a team effort. That said, maybe you do get called out. I had a friend call me out on a tweet I made. She did so privately. And she did so because, she knew I’d listen. I appreciated that. She recognized I was human and I make mistakes and that given the chance, I’ll listen and learn. How can one take offense at that? I saw it has a sign of caring.

Finally realize, none of us are perfect, but we can always strive to do better.

So, today give some thought about how you can not only claim your community, whatever it may be, is welcoming, but what efforts you can make to ensure it is.

 

On a separate note, check out my latest writing for Red-Gate, part II on Parameters in PowerShell.

Let me Try this… in Prod

A more light-hearted look at things today. There are certain phrases or ideas you hear that should give you pause. There’s the classic, “here, hold my beer and watch this.”

And of course what I heard yesterday while on a call with a client and their developer said, “well let me try this, what’s the worst that could happen?”

Just the other day, fellow #SQLFamily member David Klee tweeted:

A software vendor just told my client to restart their SQL Server after every backup. I am beyond speechless.

I read that tweet and literally sat there slack-jawed for half a minute. I swear I felt a disturbance in the Force as a million DBAs cried out in terror.

But this got me thinking of other bad advice I’ve seen over the years, such as “we reboot our SQL Server nightly because it has a memory leak and uses all the memory on our server. Oh and by the way, can you tell us why our server is so slow in the morning?” (Ok the 2nd sentence is partly made up, but I’ve had clients complain about performance issues which were due in part to them restarting their SQL Server.)

Or, “don’t index, keep everything as a heap.”  Yes, I saw that someplace, I’m still not quite sure the reason they had for that.

“Oh, we had a problem with this stored procedure sometimes running really slowly, so we hardcoded WITH RECOMPILE in. Now it runs consistently.” Fortunately by the time I had arrived they had stopped this particular process, instead they just had  scheduled task that recompiled it at least once a day. This one was interesting. After determining a couple of performance issues with this sproc, including parameter sniffing and using SQL Server 2005 XML parsing in it, I developed a far better solution that eliminated the parameter sniffing and in most cases eliminated the need to parse the XML at all. The client didn’t adopt it. A year later, a DBA brought in for another project took a stab at this sproc and came up with a similar solution (though he didn’t reduce the XML parsing like I did.). They didn’t adopt it. Finally, over 2 years after my initial recommendation, I was able to convince them to implement it.

“Oh, I have covering indexes for every column and include all the other columns!” Ok, I haven’t seen it quite this bad, but I’ve seen indexing that approached this level.

“We use NOLOCK, because it’s faster.” This is a common one. Now, I’ll admit years ago on a platform we built we did use NOLOCK because it was faster, BUT we also actually understood what it could do and honestly didn’t care about inconsistent results (we were serving up classified ads, so if you saw a different set on a page refresh, it was actually a useful side effect.)

In general, I find bad advice or bad ideas fascinating. I love to understand how they came into being. In some cases, like my employer’s use of NOLOCK, it was actually a conscious choice and we understood the drawbacks and accepted them. I wouldn’t necessary call that a bad idea once all the particulars were known. But on the face of it, it certainly looked like a bad idea, and that was SQL 7.0 and later SQL 2000. With more modern versions of SQL Server, I would argue there are betters solutions now.

In the case of my former client doing the RECOMPILE, that’s more subtle. Yes, their solution worked, but, it was clearly the wrong solution because they didn’t understand what the problem was, or how to fix it properly. So I’d argue this was a bad idea.

But when it comes to restarting SQL Server after a backup, I really still have no words. It’s not clear to me at all what problem the vendor thought they were solving or why this would solve it. This truly is a bad idea all around.

Fortunately, in the particular case of my client and their developer, the worst was, we’d resend 25,000 rows of data to Salesforce to be updated. That would take 2 minutes and not break anything. He knew this and was joking, but it was funny to hear.

So my question to my readers is: what’s the worst idea or advice you’ve heard and did it in retrospect have enough context to perhaps at lease explain why someone came up with it, or was it simply so bad you’re still shaking your head? This doesn’t have to be SQL related.

P.S. – my next Redgate article should be published soon. Keep your eyes open for it!

Covid Challenges

There’s no doubt that Covid-19 has had a huge impact on our lives. Professionally for example it means a change to a virtual SQL Saturday and a Virtual PASS Summit. It means some of my fellow #SQLFamily have gone radio silent for various periods of time as they’ve dealt with this crisis in their own fashion.

I know personally there are days I just want to go outside and scream. There’s so much disruption in the world and in my life. I miss being able to travel freely, to see as many friends in person as I’d like and so much more. I mourn the loss of schooling as we know it for my kids and everyone else’s kids.

20200313_183236

Early on during the pandemic – bare shelves where toilet paper and paper towels should be

But, I’ve also been very fortunate. I’ve had a few friends who have contracted Covid-19, but all have survived (though as we’re learning, surviving may include long-term impacts such as irregular heart rhythms, changes to mental status and drug uptakes and more). I know of one former co-worker who succumbed to the disease when it was in the middle of the NYC spike. Other than that, fortunately, the only deaths I’m aware of have been friends of friends. This doesn’t make the disease any less tragic, but just a bit more remote for me.

But, the above could be said of most of us and rather than focus on the negatives, I wanted to talk about some of the changes in my life this year and how I’ve tried to rise to the challenge. Before I do so, I want to be clear, that how one rises to the challenge is different for everyone and this is not meant to be a brag sheet as much as a statement of what I see as some positive things in my life. I’d love to hear some positivity from YOUR life in my comments. We all know how bad this year has been, let’s talk about some good stuff for a change.

Backpacking

One of my goals for decades has been to hike the Appalachian Trail. No, I can’t say this was the year. But, I had section hiked a portion from my Dad’s house in CT up through Dalton MA while in college and then a year or two later from Bennington VT up through Manchester VT (I’m still grateful to the poor soul who picked up my friend and I have 2 days of sweaty exertion without clean clothes!) But this meant there has been a gap between Dalton MA and Bennington VT.  Last month I was able to FINALLY put a backpack on and close part of that gap.  I’d love to post photos, but silly me left my cell phone in my car! So here’s one after I got back to my car.

Do I look tired?

After hiking 17 miles during a heat wave

I hope to get in the final 20 mile stretch in the coming weeks. This will mean that I can check Massachusetts off my list of states to hike for the AT.

Sourdough

Ok, show of hands, who here has dabbled in sourdough during this pandemic? I know I have. I kept a starter going for about 4 months before taking a break from it over the summer. I made a number of loafs of bread as well as some sourdough waffles (and I’ll admit sacrificed the first batch more than once to the Waffle-Iron gods.) I even added a bit to some homemade pizzas. Tasty stuff and to be honest, I’ll probably do another starter again come fall. I’ve always loved baking so this was just an extension of it.

20200505_120934

Roast Beef Sandwich with homemade sourdough bread!

20200725_190323

Two sourdough pizzas with home grown herbs

More Time with Family

Of course making that much pizza and waffles means I need someone to help eat it. Fortunately we have a full house this summer (and will this fall). So an upside has been more time with the family. Among other things, this meant, especially early on during the pandemic, more family walks in the area.

20200411_145101

Family walk near the house with leaden skies

Bicycling

Of course all that delicious food I’ve been making needs to be burned off. I’ve literally biked more this year than the last few years combined. As of today, that means over 850 miles for the year. This includes a 55+ mile ride this past weekend. I’ve really been enjoying it. I’ve been a bicyclist my entire life but have missed riding this much. I’m hoping next year to upgrade my road bike (open to suggestions) to replace my 30 year old Trek 520.

20200809_134643

You mean I still have to bike to the top of THAT? 20 miles in and about 7 miles to go, but that’s where all the altitude gain is!

20200809_142741

“It looks like we made it!” – after the climb

And I’m still hoping to complete my first century ride in over 3 decades (that makes me feel old!)

Speaking

On one hand I’ve done far less speaking for SQL events than most years. I believe SQL Saturday Albany was my first SQL event this year. But I’ve been asked to present virtually at 3 NSS Grotto meetings on “So this is your first rescue.” I’ve also been selected to speak at PASS Summit for the first time, so even though it won’t be in person, I’m excited! I’ve also volunteered to speak at the Denver User Group meeting on September 17th.

Webinars

I’ve taken advantage of the fact that so much is now virtual and attended some Red-Gate Live webinars, a few SQL WIT webinars and others. One piece of advice I’ll give here, if you can, attend what you can. You no longer have to be physically present for most SQL User Group meetings, I know several #SQLFamily members who have attended 2-3 User Group presentations in the same week! It’s one advantage of everything going virtual!

Virtual Get-togethers

I, and in some cases the rest of my family, have Zoomed with my mom, my aunt and others and almost weekly, with members of my #SQLFamily. It’s been a great uplift to see so many folks.

What’s Next?

I’ll admit, it’s been a different year. We had to postpone our NCRC National Weeklong Training Seminar until 2021. But, I just got approval to host a Modular Level 1 this fall. I’m still not sure we can pull it off, but if we can, it’ll be great.

I’ve really missed seeing a lot of folks in person.

Covid still looms large in my planning of travel and events. I don’t know what the next 6-9 months will bring, but I know I’ll try to make the best of it!

What about you?

What’s something positive you’ve been able to accomplish during this pandemic? I want to hear it!

And Remember

20200809_161758

We could all use a little support!

Query Store Saves the Day

It’s never a good thing when you get an impromptu meeting invite on the weekend and the subject line is “Sync Error”. I honestly didn’t even see the invite until the meeting had been going on for over an hour.

I called in and was brought up to speed. A 3rd party tool one of my client uses was having major timeout issues. Normally it’s fine, but my client was taking advantage of the weekend to do a very large import of data and the tool wasn’t keeping up.

I both love and hate being thrown into situations like this. I hate it because often I have very little information to go on, but also love it, because it can be a good challenge. So, I wanted to collect some data. Fortunately the database in question runs on SQL Server 2016. This blog post covers a bit of what we did and ends with why I am so grateful for Query Store.

Query Store and the First Graphs

I quickly enabled Query Store and grabbed a quick report. Based on help with the 3rd party support, I was able to focus on a particular query.

Query Store first graph

Initial Query Store screen grab

So, right away, I knew that at times this query could flip to a pretty bad query plan. I was curious as to why. But while poking around, I noticed something else going on. The database was at the SQL Server 2008 compatibility level, despite running on SQL Server 2016. Now I know when we upgraded the server a year ago the 3rd party vendor didn’t guarantee compatibility with 2016, so we had left it in its old compatibility level. Since then apparently the vendor had qualified it and I confirmed with their support who was on the line that I could change the compatibility level to SQL Server 2016. Of course, I wanted to see if this would make a difference, so I grabbed another one of the problematic queries and looked at the query plan both before and after.

Compatibility level 100

Query Plan at SQL Server 2008 Level

Compatibility level 130

Query Plan at SQL Server 2016 Level

As you’ll note, the 2008 plan uses 2 hash matches, the 2016 uses two merge joins. That’s interesting by itself, but after collecting a bit of data, I saw the 2016 plan was running in an average of 45ms. The 2008 plan had been averaging 1434ms. That’s quite the improvement, simply by a single change!

That said, I still wasn’t entirely comfortable with what was going on and dug a bit deeper.

Digging Deeper

The change to the compatibility level had essentially eliminated the green bar in the above graph. This was good. But the blue bar to the left of it was still an issue. It also had a similar issue with flipping between two different query plans, but this was even worse.

Query Store second graph

Better, but not that one query really stands out!

I find this particular chart to be the most useful. I set a custom time frame (in this case 3 hours) and looked at the total duration of 25 queries that had accumulated the most time running. It’s pretty clear that one query dominates and working on this is probably where I want to spend my efforts. It’s also very hard to pick out, but the query (#12) from the first graph that I had looked at, has improved so much that it’s now moved to 12th on the list from the 2nd position.  That’s quite an improvement and simply by changing the compatibility level! More on my thoughts on that below.

The more I thought about it, the more I started to focus on statistics. This was an educated guess based on the fact that my client was doing a LOT of inserts and updates into a particular table. There’s another issue I’ll also discuss. This one I couldn’t fix unfortunately, but if the 3rd party can, I think they’ll see a HUGE improvement in performance.

Slow query plan

Slow version of the query

Fast version of the query

Fast version of the query

These look VERY similar, except the position of the Key lookups and the Index Seeks are swapped. That may not seem like much but the slow version was on average taking about 93.95 ms and the fast version was on average taking about .11ms. That’s a HUGE difference, about 850x difference! It took me a bit to realize what was going on, but let’s talk about that Key lookup. Even with the faster version, it’s obvious that if I can eliminate that, I could get things to be even faster!  The problem is that the query wants to return some columns not covered in the IX_FileID index. That’s generally easy to fix and while I’m loathe to make updates to 3rd party packages, I was willing to test this one out by making it a covering index. Unfortunately, this is where I was stymied. One of the columns is an IMAGE datatype and you can’t throw those into an index. I’ve recommended to the 3rd party vendor they try to change this. It wouldn’t be easy, but it could have dramatic performance improvements here and elsewhere (I had run into this problem last year while trying to tackle another performance issue).

I should note, that even though this query is actually very fast, it is executed so much that its total time dominates in the system. This is one reason why any improvement here would have a dramatic impact.

Statistics

In any case, looking at these two query plans and doing some further testing confirmed my hypothesis and also suggests why changing the compatibility level helped so much: statistics were very quickly getting out of whack.

I was able to confirm this by grabbing some data from the query store for just the last hour and it showed only the slow version of the query was running. I then forced an update of stats on the table in question and immediately saw the query flip over to the faster plan. This continued for awhile before it flipped back to the slower version.

We developed a plan, which I’ll admit upfront didn’t work. We decided that updating the stats on that particular table every hour might give us tremendous performance gains. And in fact it did initially. BUT, what we found was that after an hour of inserts, running the update stats for that table took about 45-60 seconds and the vendors tool has a hard-coded 30 second timeout. And because of the way this particular tool works, it means after a failure you have to start from scratch on every run. Since the job can take 4-6 hours to run, we couldn’t update stats every hour, even though I would have liked to.

Query Store third graph

The graph that should our plan wasn’t working

Above shows how at the time the update stats was running (that particular column of the query story graphic is cut off) the query times jumped to 30 seconds.  So while overall updating the stats is a good thing, here it was definitely killing our process.

Above I mentioned that changing the compatibility level still had an impact here. What I didn’t show here was that I was also looking at a bunch of statistics histograms and could see how badly things had gotten in some cases. But this is an area where SQL Server 2016 makes a difference! It can do more in the background better to help keep statistics a bit more accurate (still not as good as a full update, but it can dramatically help.) This is a hug part I believe of why the first query addressed above improved AND stayed improved.

Loving Query Store

They say a picture is worth a 1000 words. Honestly, I probably could have figured out the above issues with running a bunch of queries, looking at some DMVs, statistics histograms and the like. But it would have taken longer. Note too you can query the query store. But, the ability to instantly look at a graph, see what’s taking the most time, or executing the most, or a variety of other parameters makes the graphical interface to Query store EXTREMELY valuable. I was able to instantly zero in on a couple of key queries and focus my energies there. By varying the timeframes I was looking at, I could try changes and see the impact within minutes. I could also look at the stored query plans and make judgments based on what they showed.

If you’re NOT using Query Store to debug performance issues, start doing it. To be honest, I haven’t used it much. I wouldn’t call myself an expert in it by any means. But, I was able to pull it up and almost instantly have insight into my client’s issues and was able to make actionable suggestions.

And to quote the product manager there after I fixed the first query simply by changing compatibility mode, “A good DBA is like having a good mechanic to work on your car.” That one made me smile.

Oh and I’ve been known to swap out the alternator on my old Subaru in under 10 minutes and have replaced the brakes a number of times. So if this DBA thing doesn’t work out, I guess I’ve got another career I can look into!

Final Note

Per my NDA, I obviously haven’t named my client. But also, simply out of respect, I haven’t named the third party tool. I don’t want folks thinking I’m trying to besmirch their name. Their product is a fine one and I’d recommend it if asked. But my client is one of their larger users and sometimes pushes it to the limits so we sometimes find some of the edge cases. So nothing here is meant to disparage the 3rd party tool in an way (though they should replace that image field since it really doesn’t need to be one!)