Query Store Saves the Day

It’s never a good thing when you get an impromptu meeting invite on the weekend and the subject line is “Sync Error”. I honestly didn’t even see the invite until the meeting had been going on for over an hour.

I called in and was brought up to speed. A 3rd party tool one of my clients uses was having major timeout issues. Normally it’s fine, but my client was taking advantage of the weekend to do a very large import of data, and the tool wasn’t keeping up.

I both love and hate being thrown into situations like this. I hate it because often I have very little information to go on, but I also love it because it can be a good challenge. So, I wanted to collect some data. Fortunately, the database in question runs on SQL Server 2016. This blog post covers a bit of what we did and ends with why I am so grateful for Query Store.

Query Store and the First Graphs

I quickly enabled Query Store and grabbed a quick report. With help from the 3rd party’s support, I was able to focus on a particular query.
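
For anyone following along at home, enabling Query Store is a quick ALTER DATABASE; here’s a minimal sketch, with the database name being a hypothetical stand-in for my client’s:

-- Turn on Query Store and let it collect data (database name is hypothetical)
ALTER DATABASE VendorDB SET QUERY_STORE = ON;
ALTER DATABASE VendorDB SET QUERY_STORE (OPERATION_MODE = READ_WRITE);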

Query Store first graph

Initial Query Store screen grab

So, right away, I knew that at times this query could flip to a pretty bad query plan, and I was curious as to why. But while poking around, I noticed something else going on: the database was at the SQL Server 2008 compatibility level, despite running on SQL Server 2016. When we upgraded the server a year ago, the 3rd party vendor didn’t guarantee compatibility with 2016, so we had left the database at its old compatibility level. Apparently the vendor had since qualified it, and I confirmed with their support person on the line that I could change the compatibility level to SQL Server 2016. Of course, I wanted to see if this would make a difference, so I grabbed another one of the problematic queries and looked at the query plan both before and after.
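
For reference, the check and the change are both one-liners; the database name here is again a hypothetical stand-in:

-- See where the database currently sits (100 = SQL Server 2008)
SELECT name, compatibility_level FROM sys.databases WHERE name = 'VendorDB';
-- Move it to the SQL Server 2016 level
ALTER DATABASE VendorDB SET COMPATIBILITY_LEVEL = 130;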

Compatibility level 100

Query Plan at SQL Server 2008 Level

Compatibility level 130

Query Plan at SQL Server 2016 Level

As you’ll note, the 2008 plan uses two hash matches, while the 2016 plan uses two merge joins. That’s interesting by itself, but after collecting a bit of data, I saw the 2016 plan was averaging 45ms. The 2008 plan had been averaging 1434ms. That’s quite the improvement from a single change!

That said, I still wasn’t entirely comfortable with what was going on and dug a bit deeper.

Digging Deeper

The change to the compatibility level had essentially eliminated the green bar in the above graph. This was good. But the blue bar to the left of it was still an issue. That query had a similar problem with flipping between two different query plans, but it was even worse.

Query Store second graph

Better, but note that one query really stands out!

I find this particular chart to be the most useful. I set a custom time frame (in this case 3 hours) and looked at the total duration of the 25 queries that had accumulated the most time running. It’s pretty clear that one query dominates, and working on it is probably where I want to spend my efforts. It’s also very hard to pick out, but the query (#12) from the first graph has improved so much that it has moved from 2nd on the list down to 12th. That’s quite an improvement, simply from changing the compatibility level! More on my thoughts on that below.

The more I thought about it, the more I started to focus on statistics. This was an educated guess based on the fact that my client was doing a LOT of inserts and updates into a particular table. There’s another issue I’ll also discuss. This one I couldn’t fix unfortunately, but if the 3rd party can, I think they’ll see a HUGE improvement in performance.

Slow query plan

Slow version of the query

Fast query plan

Fast version of the query

These look VERY similar, except the positions of the Key Lookups and the Index Seeks are swapped. That may not seem like much, but the slow version was taking about 93.95ms on average, while the fast version was averaging about 0.11ms. That’s a HUGE difference, roughly 850x! It took me a bit to realize what was going on, but let’s talk about that Key Lookup. Even with the faster version, it’s obvious that if I can eliminate it, I can make things faster still! The problem is that the query wants to return some columns not covered by the IX_FileID index. That’s generally easy to fix, and while I’m loath to make updates to 3rd party packages, I was willing to test this one out by making it a covering index. Unfortunately, this is where I was stymied. One of the columns is an IMAGE datatype, and you can’t put those into an index. I’ve recommended that the 3rd party vendor try to change this. It wouldn’t be easy, but it could have dramatic performance improvements here and elsewhere (I had run into this problem last year while trying to tackle another performance issue).
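
To illustrate what I was hoping to do, a covering index along these lines would have eliminated the Key Lookup. The table and column names are hypothetical stand-ins for the vendor’s schema; only IX_FileID comes from the real system:

-- A sketch of the covering index I wanted to build. It fails because
-- text, ntext, and IMAGE columns can't be index key or included columns.
CREATE NONCLUSTERED INDEX IX_FileID
    ON dbo.VendorFiles (FileID)
    INCLUDE (FileName, FileData)   -- FileData is IMAGE, so this errors out
    WITH (DROP_EXISTING = ON);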

I should note that even though this query is actually very fast, it is executed so often that its total time dominates the system. This is one reason why any improvement here would have a dramatic impact.

Statistics

In any case, looking at these two query plans and doing some further testing confirmed my hypothesis and also suggested why changing the compatibility level helped so much: statistics were very quickly getting out of whack.

I was able to confirm this by grabbing some data from the Query Store for just the last hour: it showed only the slow version of the query running. I then forced an update of stats on the table in question and immediately saw the query flip over to the faster plan. This continued for a while before it flipped back to the slower version.
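
The forced update itself was nothing fancier than the following (the table name is a hypothetical stand-in):

-- Rebuild the statistics from the full table rather than a sample
UPDATE STATISTICS dbo.ImportTable WITH FULLSCAN;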

We developed a plan, which I’ll admit upfront didn’t work. We decided that updating the stats on that particular table every hour might give us tremendous performance gains. And in fact it did, initially. BUT, what we found was that after an hour of inserts, running the update stats for that table took about 45-60 seconds, and the vendor’s tool has a hard-coded 30 second timeout. And because of the way this particular tool works, after a failure you have to start from scratch on every run. Since the job can take 4-6 hours to run, we couldn’t update stats every hour, even though I would have liked to.

Query Store third graph

The graph that showed our plan wasn’t working

Above shows how, at the time the update stats was running (that particular column of the Query Store graphic is cut off), the query times jumped to 30 seconds. So while updating the stats is in general a good thing, here it was definitely killing our process.

Above I mentioned that changing the compatibility level still had an impact here. What I didn’t show is that I was also looking at a bunch of statistics histograms and could see how badly out of date things had gotten in some cases. But this is an area where SQL Server 2016 makes a difference! It can do more in the background to help keep statistics a bit more accurate (still not as good as a full update, but it can dramatically help). This, I believe, is a huge part of why the first query addressed above improved AND stayed improved.
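
As an aside, if you want to see how stale your own statistics are getting, something along these lines works; the table name is hypothetical, but sys.dm_db_stats_properties is a real DMF:

-- When were the stats last updated, and how many rows have changed since?
SELECT s.name AS stat_name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.ImportTable');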

Loving Query Store

They say a picture is worth a thousand words. Honestly, I probably could have figured out the above issues by running a bunch of queries, looking at some DMVs, statistics histograms, and the like. But it would have taken longer. Note too that you can query the Query Store directly. But the ability to instantly look at a graph and see what’s taking the most time, or executing the most, or a variety of other parameters, makes the graphical interface to Query Store EXTREMELY valuable. I was able to instantly zero in on a couple of key queries and focus my energies there. By varying the timeframes I was looking at, I could try changes and see the impact within minutes. I could also look at the stored query plans and make judgments based on what they showed.
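
As a rough sketch of what querying it directly looks like, something along these lines approximates the “top 25 by total duration” report I leaned on above:

-- Top 25 queries by total duration from the Query Store catalog views
SELECT TOP (25)
       q.query_id,
       SUM(rs.avg_duration * rs.count_executions) AS total_duration_us,
       SUM(rs.count_executions) AS total_executions
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id
ORDER BY total_duration_us DESC;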

If you’re NOT using Query Store to debug performance issues, start doing it. To be honest, I haven’t used it much, and I wouldn’t call myself an expert in it by any means. But I was able to pull it up, almost instantly have insight into my client’s issues, and make actionable suggestions.

And to quote the product manager there after I fixed the first query simply by changing compatibility mode, “A good DBA is like having a good mechanic to work on your car.” That one made me smile.

Oh and I’ve been known to swap out the alternator on my old Subaru in under 10 minutes and have replaced the brakes a number of times. So if this DBA thing doesn’t work out, I guess I’ve got another career I can look into!

Final Note

Per my NDA, I obviously haven’t named my client. But also, simply out of respect, I haven’t named the third party tool. I don’t want folks thinking I’m trying to besmirch their name. Their product is a fine one, and I’d recommend it if asked. But my client is one of their larger users and sometimes pushes it to the limits, so we find some of the edge cases. Nothing here is meant to disparage the 3rd party tool in any way (though they should replace that IMAGE field, since it really doesn’t need to be one!)


SQL Saturday Albany 2020

So, another SQL Saturday Albany is in the books. First, I want to thank Ed Pollack and his crew for doing a great job in a changing and challenging landscape. While I handle the day-to-day and monthly operations of the Capital Area SQL Server User Group, Ed handles the planning and operations of the SQL Saturday event. While the event itself is only one day of the year, I suspect he has the harder job!

This year, of course, planning was complicated by the fact that the event had to become virtual. It’s a bit ironic that we went virtual, because in many ways the Capital District of NY is probably one of the safer places in the country to hold an in-person event. That said, virtual was still by far the right decision.

Lessons Learned

Since more and more SQL Saturdays will be virtual for the foreseeable future, I wanted to take the opportunity to pass on some lessons I learned and some thoughts I have about making them even more successful. Just like the #SQLFamily in general passing on knowledge about SQL Server, I wanted to pass on knowledge learned here.

For Presenters

The topic I presented on was So you want to Present: Tips and Tricks of the Trade. I think it’s important to nurture the next generation of speakers. Over the years I was given a great deal of encouragement and advice from the speakers who came before me, and I feel it’s important to pass that on. Normally I give this presentation in person. One of the pieces of advice I really stress in it is to practice beforehand, and I take that to heart. I knew going into this SQL Saturday that presenting remotely would create new challenges. For example, on one slide I talk about moving around on the stage. That doesn’t really apply to virtual presentations. On the other hand, when presenting in person, I generally don’t have to worry about a “green-screen”. (Turns out for this one I didn’t either; more on that in a moment.)

So I decided to make sure I did a remote run-through of this presentation with a friend of mine. I can’t tell you how valuable that was. I found that slides I thought were fine when I practiced by myself didn’t work well when presented remotely. I found that the lack of feedback inhibited me at points (I actually do mention this in the original slide deck). With her feedback, I altered about a half dozen slides and ended up adding 3-4 more. I think this made for a much better and more cohesive presentation.

Tip #1: Practice your virtual presentation at least once with a remote audience

They don’t have to know the topic or, honestly, even have an interest in it. In fact, I’d argue it might help if they don’t: that way they can focus more on the delivery and any technical issues than on the content itself. Even if you’ve given the talk 100 times in front of a live audience, doing it remotely is different enough that you need feedback.

Tip #2: Know your presentation tool

This one actually came back to bite me, and I’m going to have another tip on this later. I did my practice run via Zoom, because that’s what I normally use. I’m used to the built-in chroma key (aka green-screen) feature and know how to turn it on and off and play with it. It turns out that GotoWebinar handles this differently, and I didn’t even think about it until I got to that part of my presentation and realized I had never turned it on, and had no idea how to! This meant that this part of my talk didn’t go as well as planned.

Tip #3: Have a friend watch the actual presentation

I actually lucked out here: both my kids got up early (well, for them, considering it was a weekend) and watched me present. I’m actually glad I didn’t realize this until the very end, or else I might have been more self-conscious. That said, even though I had followed Tip #1 above, they were able to give me more feedback. For example (and this relates to Tip #2), the demo I did using Prezi was choppy and not great. In addition, my Magnify Screen example that apparently worked in Zoom did not work in GotoWebinar! This feedback was useful. But even more so, if someone you know and trust is watching in real-time, they can give real-time feedback on issues with bandwidth, volume levels, etc.

Tip #4: Revise your presentation

Unless your presentation was developed exclusively to be given remotely, I can almost guarantee it needs some changes to work better remotely. For example, since most folks will be watching from their computer or phone, you may NOT need to magnify the screen the way you would in a live presentation with folks sitting in the back of the room. During another speaker’s presentation, I realized they could have dialed back the magnification they had enabled in SSMS and it would still have been very readable, while presenting more information.

You also can’t effectively use a laser pointer to highlight items on the slide-deck.

You might need to add a few slides to better explain a point, or even remove some that are no longer relevant. But in general, you can’t just lift and shift a live presentation into a remote one and have it be as good.

Tip #5: Know your physical setup

This is actually a problem I see at times with in-person presentations, but it’s even more true with virtual ones, and it ties to Tip #2 above. If you have multiple screens, understand which one will be shown by the presentation tool. Most, if not all, let you select which screen or even which window is being shared. This can be very important. If you choose to share a particular program window (say PowerPoint) and then try to switch to another window (say SSMS), your audience may not see the new window. Or, and this is very common, if you run PowerPoint in presenter mode, with the presented slides on one screen and your thumbnails and notes on another, make sure you know which screen is being shared. I did get this right with GotoWebinar (in part because I knew to look for it), but it wasn’t obvious at first how to do it.

In addition, decide where to put your webcam! If you’re sharing your face (and I’m a fan of it; I think it makes it easier for others to connect to you as a presenter), understand which screen you’ll be looking at the most; otherwise your audience may get an awkward view of you always looking off to another screen. And, if you can, try to make “eye contact” through the camera from time to time. Also be aware (and this is an issue I’m still trying to address) that you may have glare coming off of your glasses. For example, I need to wear reading glasses at my computer, and even after adjusting the lighting in the room, it became apparent that the brightness of my screens alone was causing a glare problem. I’ll be working on this!

Also be aware of what may be in the background of your camera. You don’t want to have any embarrassing items showing up on your webcam!

For Organizers

Tip #6: Provide access to the presentation tool a week beforehand

Now, this is partly on me. I didn’t think to ask Ed if I could log into one of the GotoWebinar channels beforehand; I should have. But I’ll go a step further: a lesson I think we learned is that, as an organizer, you should make sure presenters can log in before the big day and practice with the tool. This allows them to learn all the controls before they go live. For example, I didn’t realize until there were only 10 minutes left in my presentation how to see who the attendees were. At first I could only see folks who had been designated as a panelist or moderator, so I was annoyed I couldn’t see who was simply attending. Finally I realized what I thought was simply a label was in fact a tab I could click on. Had I played with the actual tool earlier in the week, I’d have known this far sooner. So organizers, if you can, arrange time for presenters to log in days before the event.

Tip #7: Have plenty of “Operators”

Every tool may call them by a different name, but ensure that you have enough folks in each “room” or “channel” who can do things like mute and unmute people, make sure the presenter can be heard, etc. When I started my presentation, there was some hitch, and initially there was no one around to unmute me. While I considered doing my presentation via interpretive dance or mime, I decided not to. Ed was able to jump in and solve the problem, but I ended up losing about 10 minutes to this glitch.

Tip #8: Train your “Operators”

This goes back to the two previous tips: make sure your operators have training before the big day. Set up an hour a week before and have them all log in and practice how to mute and unmute presenters, how to pass control to the next operator, etc. Also, you may want to give them a script to read at the start and end of each session. “Good morning. Thank you for signing in. The presenter for this session will be John Doe and he will be talking about parameter sniffing in SQL Server. If you have a question, please enter it in the Q&A window and I will make sure the presenter is aware of it. This session is/isn’t being recorded.” At the end, a closing item like, “Thank you for attending. Please remember to join us in Room #1 at 4:45 for the raffle. Also, when this session ends, there will be a quick feedback survey. Please take the time to fill it out.”

Tip #9: If you can, have a feedback mechanism

While people often don’t fill out the written feedback forms at a SQL Saturday, when they do, they can often be valuable. Try to recreate this for virtual ones.

Tip #10: Have a speaker’s channel

I hadn’t given this much thought until I was talking to a fellow speaker, Rie Irish, later on and remarked how I missed the interaction with my fellow speakers. She was the one who suggested that a speaker’s “channel” or “room” would be a good idea, and I have to agree. A private room where speakers can log in, chat with each other, and reach out to operators or organizers strikes me as a great idea. I’d highly suggest it.

Tip #11: Have a general open channel

Call this the “hallway” channel if you want, but try to recreate the hallway experience where folks can simply chat with each other. SQL Saturday is very much a social event, so try to leverage that! Let everyone chat together just like they would at an in-person SQL Saturday event.

For Attendees

Tip #12: Use social media

As a speaker or organizer, I love to see folks talking about my talk or event on Twitter and Facebook. Please, share the enthusiasm. Let others know what you’re doing and share your thoughts! This is actually a tip for everyone, but there are far more attendees than organizers and speakers, so you can do the most!

Tip #13: Ask questions, provide feedback

Every platform used for remote presentations offers some sort of Q&A or feedback. Please use it. As a virtual speaker, it’s impossible to know if my points are coming across. I want and welcome questions and feedback, both during and after. As great as my talks are, or at least as I think they are, it’s impossible to tell without feedback whether they’re making an impact. That said, let me apologize right now: if you tried to ask a question or give feedback during my talk, I may have missed it, due to my lack of familiarity with the tool and not having the planned operator in the room.

Tip #14: Attend!

Yes, this sounds obvious, but hey, without you, we’re just talking into a microphone! Just because we can’t be together in person doesn’t mean we should stop learning! Take advantage of this time to attend as many virtual events as you can! With so many being virtual, you can pick ones out of your timezone for example to better fit your schedule, or in different parts of the world! Being physically close is no longer a requirement!

In Closing

Again, I want to reiterate that Ed and his team did a bang-up job with our SQL Saturday. I had a blast, and everyone I spoke to had a great time. But of course, doing events virtually is still a new thing and we’re learning. So this is an opportunity to take the lessons from a great event and make yours even better!

I had a really positive experience presenting virtually, look forward to my PASS Summit presentation, and am encouraged to put in for more virtual SQL Saturdays after this.

In addition, I’d love to hear what tips you might add.

A Summit To Remember

There’s been a lot of talk about the 2020 PASS Summit and the impact of making it virtual this year. I’ve even previously written about it. I’ll be clear: I would prefer an in-person Summit. But that said, I think having it virtual does provide some fascinating and interesting possibilities, and I look forward to seeing how they’re handled. It will certainly be different being able to watch a session at a later time as a default option. And my understanding is that session schedules will no longer be constrained by the timezone the Summit is being held in.

That said, I also have to admit a certain bias here. I’ve wanted to speak at Summit for a couple of years now and have been turned down twice in the past two years. This year I was hoping again to speak, but alas, I procrastinated a bit too long and literally missed the original window to submit by a few hours.

Note I said original window. Because the Summit was moved to a virtual one, the decision was made to re-open the call for speakers. This time I took advantage of that 2nd chance and submitted a bid.

And I’m so glad I did. Because if you didn’t have a reason to attend Summit before, you do now! You get to hear me talk about PowerShell! So I’ll admit to getting an unexpected benefit out of the move to a virtual Summit.

I still recall one of my first attempts to use PowerShell at a client site, about 8 years ago. It did not go well. The security policy wouldn’t let me do what I wanted and the available knowledge on the Internet was sparse. Basically I wanted to loop through a list of servers and see if they had SQL Server installed. I eventually gave up on that project.

Since then, though, I’ve been drawn to PowerShell and have come to love it. Now, when you hear a DBA talk about PowerShell, they will almost always mention dbatools. I want to go on record right now: I think it’s a GREAT addition, but I rarely use it. Not because there’s anything wrong with it, but mostly because my current usage is a bit different from what it provides. I do talk about it a bit here though.

For the talk I’ll be presenting, my plan is to start with a really simple PowerShell script and slowly build on it until it’s a useful script for deploying SQL scripts to multiple servers. For anyone who has read my articles at Red-Gate, much of this will be familiar territory, but I hope to cover in 75 minutes what I cover in 3-4 articles.

Checking this morning, I noticed that when it comes to speaking about PowerShell, I’m among good company, and it’s humbling to see.

So, I hope you “come” and see me present on PowerShell at SQL Summit 2020. I’ll be in New York, where will you be?

“We want information…information… information!!!”

For anyone who has ever watched the classic British mini-series “The Prisoner” this is a very recognizable line. But it applies to many parts of our lives.

This is a tale of hiking, a non-cave rescue, and yes, eventually Extended Events.

“I went to the woods…”

This past weekend I spent some time in the woods, hiking and getting away from it all. This is the first time in literally decades I had done an overnight hike on the Appalachian Trail. My goal was to get in an overnight and work on closing a gap I had not yet hiked.

The last time I hiked the trail, cell phones were a very rare item, carried only by business people and often weighing several pounds. They certainly weren’t something the average hiker could afford, and even if they could, the phones would be too heavy to carry.

I mention this because I had fully intended to carry mine with me so that I could take pictures, and perhaps even, I’ll admit it, if I had connectivity when I camped that night, catch up on some Wikipedia reading or send a picture or two to friends and family. But alas, about 2 miles into the hike, at a gorgeous viewpoint (see older photo above), I stopped, tried to pull out my phone, and realized that unsettled feeling I had at my car before locking it wasn’t “Am I sure I have my keys?” but really should have been “Am I sure I have my phone?”

It turns out, other than my inability to document my trip with some photos, and not being able to call my wife to let her know I’d be at the pick-up point much earlier than we had planned, not having access to information from the outside world was a refreshing change of pace. I’m almost glad I didn’t have my phone.

A Missed Call

As some of my readers know, besides being a DBA, I also teach and at times perform cave rescues. As I tell folks once they get past the “That’s so cool” phase, it’s not really all that glamorous. If I get called out to one actual rescue a year here in the Northeast, it’s a busy year. But, on warm weekends in the summer, the odds are higher than say the middle of the week in the winter (though that has happened too).

So a concern I had in regards to not having my phone was that I would miss a call for a potential rescue.

It turns out I was partially correct in my concern. On the way home, I saw my phone buzz. I didn’t answer it, but a few minutes later glanced down to see “Missed Call”. It was from my team co-captain. (To be transparent here, the terms team and co-captain are used loosely; it’s not a very formal setup.) She rarely calls, especially on a weekend, except in an emergency. I waited until I got home to call her back. And it wasn’t an actual call-out, yet. It was at this point a “potential missing caver.” What that meant in this case was that a vehicle had been spotted outside a popular cave, and it had been there for at least 18 hours. That is unusual for this cave; most trips are 2-3 hours in length. So this was concerning. But we didn’t have enough information. Was someone in the cave? If so, where? Were they in need of assistance? We needed information, and by hook or by crook we were going to get it. Or at least some of it.

In general, one of the biggest issues we have when starting a cave rescue is the lack of information. In this case it was even, “Are they in the cave?” Had we determined they most likely were, the next question would have been, “Where?” That shapes our search. “How long?” That might shape what equipment we bring on our initial search. “What injuries?” That would also shape our response. In any cave rescue we eventually get the information, but it can be frustrating to have to wait. Caves don’t have cell service inside. (We often do literally put our own phone system into caves during a rescue, however!) When we train folks, they often find it hard to believe at first that a patient could be 300 feet into a cave, and it would take a skilled, fresh caver 45 minutes to simply get to them, and another 45 minutes to get back. So as simple a request as “Can you get me information about the patient?” could easily take 90 minutes or more. And yes, that’s a real life incident.

In this case, eventually the authorities ran the plates, and it appears the plates had expired before 1990, the VIN on the insurance card sitting on the dashboard was made up (or belonged to a vehicle decades older), and the address on the card was fake. We stood down. There wasn’t going to be a search that day. It was entirely a police matter.

#TeamExEvents

I said I’d get to Extended Events, and here we are. I’ve written about them before and I’m a huge fan of them. Simply put, if you’re not using them, you’re probably missing information you could find very useful. I started in the days of SQL Server 4.21a, but really started to cut my teeth on SQL Server when 6.5 came out. Back then our problem sets were probably easier and smaller, but we still dealt with similar issues, the biggest often being performance related. In the early days there were some decent tricks and ways of diagnosing where your performance bottlenecks were, but to be honest, sometimes it was hit or miss. Over the years, Microsoft has added a lot of functionality to SQL Server, including DMVs and Extended Events. I now routinely use Extended Events to track down performance issues or other problems. Last night at our local user group meeting, Grant Fritchey did a lightning round where he highlighted one of the features of Extended Events that, honestly, I know about but don’t use enough: Causality Tracking.

Causality Tracking Checked

Causality Tracking extends the power of Extended Events to a new level!

Let’s just say this is a feature that makes picking out the specific events you want to follow much easier. The example Grant gave showed a ton of detail, more than you’d normally need, but extremely useful if you did in fact need it. In other words, a simple checkbox can now give us a great deal of useful information.
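
If you prefer a script to a checkbox, here’s a minimal sketch of turning it on in T-SQL; the session name and the events chosen are just examples, not what Grant demoed:

-- A bare-bones session with causality tracking enabled
CREATE EVENT SESSION TrackMyQueries ON SERVER
    ADD EVENT sqlserver.rpc_completed,
    ADD EVENT sqlserver.sql_batch_completed
    ADD TARGET package0.event_file (SET filename = N'TrackMyQueries.xel')
    WITH (TRACK_CAUSALITY = ON);

ALTER EVENT SESSION TrackMyQueries ON SERVER STATE = START;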

With the right information, you can often identify bottlenecks and make huge performance gains.

At times I feel like I’m Number Six, trying to get information about a database problem or a potential cave rescue:

Number Six: Where am I?

Number Two: In the village.

Six: What do you want?

Two: Information.

Six: Whose side are you on?

Two: That would be telling. We want information…information… information!!!

Six: You won’t get it!

Two: By hook or by crook, we will.

In conclusion: there are times when disconnecting from the information around us can make a weekend in the woods more enjoyable; a dearth of it is standard at the start of a cave rescue; and ready access to it can make solving a problem far easier.

Where do you stand on the information spectrum today? Do you have a lack of it, the right amount, or too much?


Trust but Verify

This is one of those posts where you’ll just have to trust me. Honestly.

I want to talk about indexes.

About a week ago, a friend on a chat system I use mentioned how one of their colleagues had said, “Oh, we don’t have to optimize the database, the server is fast enough,” or words to that effect. All of us in the discussion blanched a bit. Yes, when I started in the business a 10GB database was considered large, and because of the memory limit with 32-bit SQL Server we were limited to 2GB (or 3GB if you took the right steps) of memory, so it was literally impossible to keep a large database in memory. Of course, now we routinely deal with databases hundreds of GB in size on machines that can easily have 0.5TB of memory or more. This means that, except for writes, an entire database can easily be kept in memory.

But that said, optimization still matters. Last week I was debugging an ETL process that I’ve helped a client with. I’d love to show screen shots, but my NDA won’t allow me (hence my asking you to trust me). OK, that’s partly a lie. I couldn’t provide too many details if I wanted to, but the bigger issue is that I’ve since closed the windows that showed the scripts in question and the results of my changes.

One of the last things each step in the ETL does is write back to the source table an updated Salesforce ID. It’s actually a bit more complicated: what it really does is write to either a Success table or an Error table, and depending on a factor or two, a trigger will then update the source table. I had previously debugged and improved the performance of the trigger. But something was still bothering me about the performance. I looked a bit deeper, and one of the things that trigger does on a success is remove the row from the Error table. This was taking longer than I suspected it should, so I dug into it and noticed that the Error table had no index.

I can’t show the original queries I used, but I can show an example of the impact of adding a simple clustered index. (See, you can’t even trust me when I say I won’t show any examples! You’d better read the entire post to verify what I’m really writing!)

Here’s an example query (with some changes to hide client-specific data):

select * from ErrorTable where SF__External_Id__c='005A000022IouWqIAX'

It’s a very simple query (and simpler than the actual one I was dealing with) but is enough to show the value of a proper index.

Now, in my original query, the Query Tuning Advisor actually suggested an index on SF__External_Id__c. In the example above it didn’t. There’s a canard among many DBAs that the QTA is generally useless, and often it is, though I think it’s gotten better. As a consultant, I can often come into a new client and tell when someone has gone crazy with the QTA and adopted EVERY SINGLE suggestion. In other words, they trusted it, but they never verified it. Why is this a problem? Well, at times the QTA can be overly aggressive in my experience, suggesting indices that provide little real benefit. And if you add an index in response to a select query that runs, say, once a day against a table that gets 1000s of updates a day, you might actually slow down your updates (since now each update also has to update the index). And as mentioned above, sometimes it might fail to suggest an index. (I think in this case it didn’t suggest one for my example because the underlying table was far smaller than before.)
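
As an aside, another way to cross-check the QTA is to ask SQL Server itself what indexes it thinks are missing. These DMVs are real, but treat their suggestions with the same skepticism:

-- Missing-index suggestions, roughly ordered by estimated impact
SELECT d.statement AS table_name,
       d.equality_columns, d.inequality_columns, d.included_columns,
       s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;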

So, I like to verify that the index I’ll add will make a difference. In cases like this, I often go old school and simply bracket my test queries:

set statistics IO ON
set statistics Time ON
select * from ErrorTable where SF__External_Id__c='005A000022IouWqIAX'
set statistics IO OFF
set statistics Time OFF

And then I enable Actual Execution Plan.

The results I received without any sort of index are below. Some key numbers are highlighted in red.

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 47 ms, elapsed time = 63 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(2 rows affected)
Table 'ErrorTable'. Scan count 1, logical reads 3570, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 15 ms.
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

You’ll notice the physical reads are 0. This is nice. This means everything is in memory.

In this case, because I’m familiar with how the ErrorTable is accessed, I decided a clustered index on SF__External_Id__c would be ideal. (All my updates, inserts, deletes, and selects use that column to access this table.)
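
The index itself was nothing exotic; something along these lines (the index name is my own convention, not the client’s):

-- Cluster the Error table on the column every access path uses
CREATE CLUSTERED INDEX CIX_ErrorTable_SF__External_Id__c
    ON dbo.ErrorTable (SF__External_Id__c);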

I added the index and reran the query:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(2 rows affected)
Table 'ErrorTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Note the number of logical reads dropped by about a factor of 1000, and my elapsed time dropped from 15ms to 0ms (or rather, to less than 0.05ms, so SQL Server rounded down).

If we look at the graphical query plan results, we see something similar:

First, without the index:

Trust_but_Verify_Query Table Scan

Table scan to find 2 rows

And then with the index in place:

Trust_but_Verify_Query Table Seek

Table Seek to find 2 rows

That’s nice; I now know I’m doing a seek rather than a scan. But is that enough? I mean, if the ErrorTable only has 2 rows, a seek is exactly the same as a scan!

So let’s dig deeper:

Trust_but_Verify_Query Table Scan Details

Query plan showing details for a scan

Trust_but_Verify_Query Table Seek Details

Query plan showing details for a seek

Here you can definitely see the dramatic improvement. Instead of reading in over 100,000 rows (at a bit over 2.5KB per row, or over 270MB), we only need to read in 2 rows, for a total of just over 5KB of data.

No wonder it’s faster. In fact, in the ETL process, where it was originally taking about 1 minute to process 1000 rows, my query with the index was now handling 3000 rows in under 10 seconds.

The above is a bit of a contrived example, but it’s based on actual performance tuning I did last week. And this isn’t meant to be a lesson in actual performance tuning, but more to show that if you make a change (in this case adding an index), you can’t just trust it will work; you should VERIFY that it has made a difference, and more importantly, that it makes a difference for your workload. I’ve often seen the QTA make valid but useless index suggestions because someone ran an uncommonly used query against it and assumed the recommendation was good. Or they’ve made assumptions about the size of the table.

So never just trust an index will help, but actually VERIFY it will help.


Giving Blood and Pride Month

I gave blood yesterday. It got me thinking. First, let me show a few screenshots:

male blood donor shot 1

Male Donor #1 screen shot

female blood donor shot 1

Female Donor #1 screen shot

Let me interject here that I’m using the terms Male and Female based on the criteria I selected on the American Red Cross’s Fast Pass screen. More on why I make that distinction further on. But first, two more screen shots.

female blood donor shot 2

Pregnancy question highlighted for female

male blood donor shot 2

No pregnancy question for males

Now, on the face of it, this second set of questions almost seems to make sense: if I answered Male early on in the questionnaire, why be asked about a pregnancy? But what I’m asked at the beginning is about my gender, not my actual child-bearing capability. Let me quote from Merriam-Webster:

2-b: the behavioral, cultural, or psychological traits typically associated with one sex

Or from the World Health Organization:

Gender refers to the roles, behaviours, activities, attributes and opportunities that any society considers appropriate for girls and boys, and women and men. Gender interacts with, but is different from, the binary categories of biological sex.

Who can be pregnant?

So above, what the Red Cross is really asking about isn’t my gender, but my ability to be pregnant. Now, this is a valid medical concern. There are risks they want to avoid in regards to pregnant women, or recently pregnant women, giving blood. So their ultimate goal isn’t the problem, but their initial assumption might be. A trans man might still be able to get pregnant, and a trans woman might be incapable of getting pregnant (just as a cis woman might be). And this is why I had the caveat above about using the terms male and female. I’m using the terms provided, which may not be the most accurate.

Assumptions on risk factors

The first set of images is problematic in another way: it makes assumptions about risk factors. Now, I think we can all agree that keeping blood-borne pathogens such as HIV out of the blood supply is a good goal. And yes, while donated blood is tested, it can be even safer if people who know they are HIV-positive, or at risk for it, can self-select out of the donation process.

But…

Let me show the actual question:

Male Male 3 month contact question

Question 21, for Men

This is an improvement over the older restrictions, which were at one year and, at one point, “any time since 1977”. Think about that. If a man had had sex with another man in 1986, but consistently tested negative for HIV/AIDS for the following 30+ years, he could not give blood under the previous rules. By the way, I will make a note here that these rules are NOT set by the American Red Cross, but rather by the FDA. So don’t get too angry at the Red Cross for this.

The argument for a 3 month window apparently is based on the fact that HIV tests are now good enough to pick up viral particles after that window (i.e., at say 2 months you may be infected, but the tests may not detect it).

Based on the CDC information I found today, in 2018 male-to-male sexual contact resulted in 24,933 new infections. The 2nd highest category was heterosexual contact (note the CDC page doesn’t seem to use the word sexual there). So yes, statistically, male-to-male sexual contact appears to be a high-risk category.

But…

I know a number of gay and bisexual men. I don’t inquire about their sexual habits. However, a number are either married or appear to be in monogamous relationships. This means that if they want to give blood and not lie on the forms, they have to be celibate for at least 3 months at a time! But hey, if you’re a straight guy who had sex with 4 different women in the last week, no problem, as long as you didn’t pay any of them for sex! I’ll add that more than one gay man I know wants to give blood and, based on their actual behavior, is in a low-risk category, but can’t because of the above question.

Why do I bring all this up at the end of Pride Month, and what, if anything, does it have to do with database design (something I do try to actually write about from time to time)?

As a cis-het male (assigned at birth, and it still fits me), it’s easy to be oblivious to the problematic nature of the questions on such an innocuous and arguably well-intended form. The FDA has certain mandates that the Red Cross (and other blood donation agencies) must follow, and I think the mandates are often well-intended. But there are probably better ways of approaching the goals, in the examples given above, of helping to rule out higher-risk donations. I’ll be honest, I’m not always sure of the best way. To some extent, it might be as simple as rewording the question. In other cases, it might be necessary to redesign the database to better reflect the realities of gender and sex; after all, bits are cheap.

But I want to tie this into something I’ve said before: diversity in hiring is critical and I think we in the data world need to be aware of this. There are several reasons, but I want to focus on one for now.

Our Databases Model the World as We Know It.

The way we build databases is an attempt to model the world. If we are only aware of two genders, we will build our databases to reflect this. But sometimes we have to stop and ask, “Do we even need to ask that question?” For one thing, we potentially add the issue of having to deal with Personally Identifiable Information that we don’t really need. For another, we can make assumptions: “Oh, they’re male, they can’t get pregnant, so this drug won’t be an issue.”
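
To make that concrete, here’s a purely hypothetical sketch of the difference between deriving capability from a binary gender column and recording the medically relevant facts directly; every name here is mine, not the Red Cross’s:

-- Hypothetical: ask the medical questions; don't infer them from gender
CREATE TABLE dbo.Donor (
    DonorID                    INT IDENTITY(1,1) PRIMARY KEY,
    GenderIdentity             NVARCHAR(50) NULL,  -- self-described, not a binary code
    IsPregnant                 BIT NULL,           -- the actual screening fact
    PregnantWithinLastSixWeeks BIT NULL            -- recorded directly, per donor
);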

Now, I’m fortunate enough to have a number of friends who fall into various places in the LGBTQIA+ (and constantly growing collection of letters) panoply, and the more I listen, the more complexity I see in the world and in how we record it.

This is not to say that you must go out instantly and hire 20 different DBAs, each representing a different identity. That’s obviously not practical. But, I suspect if your staff is made up of cis-het men, your data models may be suffering and you may not even be aware of it!

So, listen to others when they talk about their experiences, do research, and get to know more people with experiences and genders and sexualities different from yours. You’ll learn something, and you also might build better databases. But more importantly, you’ll get to know some great people and become a better person yourself. Trust me on that.


Yesterday was NOT a “Monday”

Last week I wrote about how the previous day was A Monday, and by that I didn’t just mean its position on the calendar.

Well, I’m here to say that while yesterday was a Monday, it was distinctly NOT A Monday. Part of my morning habit when I get up is to check the emails from my largest client. There are of course a bunch of processes that run overnight that I need to check on, and fix and rerun if there’s a problem.

I wrote last week about how one ETL job had failed and I had to fix it. I fixed it, but Thursday, Friday, and Saturday it failed again due to a related but different issue. Saturday I implemented a change that I was confident would solve the problem. Here I am 3 days later, and I can say it’s been running well since.

Another process that had broken last Monday ran fine. It was off to a good start.

A bit later in the day I received a copy of the weekly audit report. This shows security items that need to be reviewed and mitigated. There was a huge drop in issues from the previous week and one whole column had disappeared. This was a great day.

Later in the day I was able to complete some VB.NET code for an internal web app I maintain for the customer and add new functionality, and to boot, I found a far simpler way to implement what I wanted than what I had started with.

About the only real work-related issue yesterday was that the promised production data for a new project didn’t show up until dinner time. But hey, that’s on the customer.

So, not every Monday is necessarily A Monday. Some are distinctly better.

Yesterday was “A Monday”

Yesterday was a Monday. I don’t just mean it was Monday, but it was, in the Garfield comic sense of things, A Monday.

As a consultant, I’ve come to expect certain patterns in my workload. For one client, I know that approximately every 2 months, over 2 weekends, I’m going to have to patch their SQL Servers. I know certain passwords will need to be updated quarterly or annually. And I know sometimes I’ll have A Monday.

Yesterday was one of those. I woke up, checked my email, and noticed two jobs had not run. So I logged in, and it appeared that the PowerShell script on each server had hung. I killed it and tried to rerun it, but got an error. This wasn’t entirely surprising. This script, in its first part, downloads a file from a 3rd party vendor, and last week, for example, their SFTP server had been down. At first I expected this to be the problem again. But further testing showed I was getting inconsistent errors. Finally the script ran. But what normally took about 20 minutes to download took about 2 hours. We learned later the vendor had done an upgrade to their product over the weekend. This shouldn’t have impacted their SFTP server performance, but here we were. Today (Tuesday) the process took 20 minutes again and is back to normal. Chalk yesterday’s issue up to it being A Monday.

Then I took a look at another job that had failed. This one is purely internal: basically, SFTP a file from a Linux server to a NAS for a backup. A quick check showed that the NAS share was inaccessible. Reporting this triggered an avalanche of emails back and forth. The most interesting line basically came down to “Yes, the internal IT team did a migration of the NAS, but the migration was supposed to be completely transparent to the users.” Famous last words in my book. Actually, what I found more disturbing was that the failure was on the new NAS device, apparently due to a typo. To me, this means that, most likely, all the old shares were recreated on the new device by hand, rather than with a script that read out the old shares and recreated them. In any event, the problem was solved, the job was rerun, and the backup was created on the new NAS. Chalk that one up to being A Monday.

Then one of the developers for one of the platforms at this client emailed me and said, “Hey, database FOO is in recovery mode. What happened?” Fortunately, I knew exactly what the problem was. Unfortunately, I knew it was my fault. We had decided to reconfigure that database to be a log-shipped copy of the main database, and I had set it up over the weekend. I had simply forgotten to configure it to place itself in Stand-by/Read-only mode after applying the most recent logs. I’ll chalk that one up to it being A Monday.
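
In restore terms, the difference is a single option. A minimal sketch, with the backup path being hypothetical:

-- What my setup was doing: leaves FOO in "Restoring..." between restores
RESTORE LOG FOO FROM DISK = N'\\NAS\LogShip\FOO.trn' WITH NORECOVERY;

-- What it should have been doing: readable between restores
RESTORE LOG FOO FROM DISK = N'\\NAS\LogShip\FOO.trn'
    WITH STANDBY = N'\\NAS\LogShip\FOO_undo.tuf';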

All of the above was taken care of before 10:00 AM. The rest of the day was filled with a variety of other issues and items, including looking at a Hyper-V host machine with 16 physical CPUs (with hyperthreading turned on) hosting 4 VMs: 1 with 4 vCPUs allocated, and the other 3 with 8 each. They’re having performance issues, and I’m still tackling that one. That look happened on Monday, but it’s not A Monday issue; it’s been ongoing for months.

So what was it about this particular Monday, or Mondays in general?

Well, in this case, all 3 of my early AM issues had one thing in common: upgrades or changes made over the weekend. I’m not going to debate the value or wisdom of the timing here, but just note that on this particular Monday it wasn’t just one issue, but three. It was definitely A Monday. But I survived, as did my customer.

Now back to my regularly scheduled workload.


“Houston, We Have an Opportunity”

This is not quite the famous quote from the movie Apollo 13, one of my favorite movies. And, well, we have a problem. But also an opportunity. I’ll get to the opportunity in a bit, but first, the problem.

My Event and Disappointment

The problem of course is COVID-19. At the end of the week in which I’m writing this, I was scheduled to help host the national weeklong cave rescue training seminar for the NCRC. For years, I and others had been working on the planning of this event. It’s our seminal event and can attract 100 or more people from across the country, and occasionally from other countries. Every year we hold it in a different state, both to allow our students to train in different cave environments across the country (New York caves are different from Georgia caves, which are different from Oregon caves, and more) and to make it easier for attendees to attend a more local event.

Traditionally, the events held in NY (this would have been the fourth national seminar here) have attracted fewer people than our events in Alabama, which is famous for its caves. But this year was different: with lots of marketing and a great base camp, we were on track not only for the largest seminar yet in New York, but for one on par with our largest seminars anywhere.

Then, in February, I started to get nervous. I had been following the news and seeing that, unlike previous outbreaks of various flus and other diseases, COVID-19 looked like it would be something different. This wasn’t going to pass quite as quickly. This was going to have an impact. As a result, I started to make contingency plans with the rest of my planning staff. I consulted with our Medical Coordinator. I talked to the camp. By March I started to regain some optimism, but I still wasn’t 100% confident we could pull this off. And then the questions from attendees started to come in: “Are we still having it?” “What are the plans?” etc. Another week or two later: “I need to cancel. My agency/school/etc. won’t cover the cost this year.” Finally, by the start of April, after talking to several of my planning staff and my fellow regional coordinators, it became obvious we could not, in good conscience, host a seminar in June. Yes, here in upstate New York the incidence of COVID-19 is dropping quickly, and we’re starting the re-opening process. Honestly, if folks came here, I would NOT be worried about them getting infected from a local source.

However, we would have had nearly 100 people coming from across the country, including states where the infection rates are climbing. Many would be crammed into planes for hours, or making transfers at airports with other travelers. So while locally we might be safe, if we held this event, with folks in classrooms for 3-4 hours a day, then in cars to and from cave and cliff sites, and then often in caves for hours, it’s likely we would have become ground zero for a spike in infections. That would not have been a wise or ethical thing to do. So we postponed until next year.

I mention all that because of what happened to me about three weeks ago, and then the news from last week: the 2020 PASS Summit is going virtual. About three weeks ago I was asked to participate in a meeting with some of the folks who help run PASS, as well as other user group leaders. The goal was to discuss how to make a virtual Summit a great experience if it went virtual. This was one of several such meetings, and we brought up and discussed a lot of ideas. NONE of us were looking forward to PASS 2020 being virtual, but we all agreed it was better than nothing. And of course, as you’re well aware, last week PASS made the decision, I suspect due in large part to the very same reasons we postponed our cave rescue training event.

Sadness and Disappointment

Mostly I’ve heard, if not happy feedback, at least resigned feedback. People have accepted the reality that PASS will be virtual. I wrote above about my experience with having to postpone our cave rescue training (because it’s so hands-on, it’s impossible to host virtually). It was not an easy decision. I’ll admit I was frustrated, hurt, disappointed, and more. I and others had put in a LOT of hard work, only to have it all delayed. I know the organizers of Summit must be feeling the same way. And I know many of us attendees must feel the same. Sure, Houston is not Seattle, and I’ve come to have a particular fondness for Seattle, in part because of the opportunity to see friends there, but I was looking forward to going to Houston this year (as was my wife) and checking out a new city.

One thing that has helped buoy my emotions in regards to our weeklong cave rescue class is that over half the attendees said, “Roll my registration over to next year. I’m still planning on coming!” That was refreshing and unexpected. Honestly, I was hoping for maybe a quarter of them at best to say so. This gave me hope and the warm fuzzies.

Opportunity

Let me start by stating the obvious: a virtual event will NOT be the same as an in-person event. There will most definitely be things missing. Even with the attempts PASS will be making to recreate the so-called “hallway track” of impromptu discussions, and to host other virtual events to mimic the real thing, it won’t be the same. You won’t get to check out the Redgate booth in person, hang out on the sofa at Minionware, or get your free massage courtesy of VMWare.

20191106_123733

After a great massage courtesy VMWare.

And we’ll miss out on:

20191105_143514

Achievement unlocked: PASS Summit 2019 Selfie with Angela Tidwell!

But, we’ll still have a LOT of great training and vendors will have virtual rooms and more.

So what’s the opportunity? Accessibility!

Here’s the thing: I LOVE PASS Summit. I think it’s a great training and learning opportunity. But let’s face it, it can get expensive, especially when you figure in travel, hotel, and food costs. This year, most of those costs disappear. This means that when you go to your boss, they have even less of an excuse to say, “Sorry, it’s not in the budget.” And honestly, if they DO say that, I would seriously suggest you consider paying for it out of your own budget. Yes, I realize money might be tight, but after all the wonderful training you can then update your resume and start submitting it to companies that actually invest in training their employees.

I would also add that, from my understanding, while convention centers are by law ADA accessible, this doesn’t necessarily mean everyone with a disability can attend. There can be non-physical barriers that interfere. Hosting virtually gives more people the ability to “attend” in a way that works for them. It might be in a quiet, darkened room if they’re sensitive to noise and lights. It might be replaying sessions over and over again if they need to hear things in that fashion. It very well could be taking advantage of recorded sessions and the like in ways that I, an able-bodied person, am not even aware of. So that’s a second way in which it’s accessible.

Now, I know folks are questioning, “Well, if it’s virtual, why should we pay anything, especially if vendors are still paying a sponsorship fee?” There are several answers to that, and none of them by themselves are complete, but I’ll list some. For one, I haven’t confirmed it, but I’m fairly confident vendors are paying a lot less for sponsorship, because they won’t get the same face-to-face contact. For another, PASS takes money to run. While we often think of it as a single big weeklong event, there’s planning and effort that goes on throughout the year. This is done by an outside organization that specializes in running organizations like PASS. (Note the PASS Board is still responsible for the decision making and the direction of PASS as a whole, but day-to-day operations are generally outsourced. This is far from uncommon.) Those costs don’t disappear. There are other costs that don’t automatically disappear because the event is no longer physical. And of course there are costs that a virtual event has that a physical event doesn’t. Now EVERY single session will be available as a live-stream (as well as recorded for later download), and that requires enough bandwidth and tools to manage, and people to help coordinate. Making an event virtual doesn’t automatically make it free to run.

The Future

I know I’m on track to host the NCRC weeklong cave rescue training event next year at the location we had planned on for this year. Our hope, of course, is that by then COVID-19 will be a manageable problem. In the meantime, I’ll keep practicing my skills, sharing my knowledge, and, when and where I can, caving safely. And as always, I’m willing to take new folks caving. If you’re interested, just ask!

I don’t know what the PASS 2021 Summit will bring or even where it will be. But I know that this year we can make the most of the current situation and turn it into an opportunity to make PASS something new and more affordable. Yes, it will be different. But we can deal with that. So, register today and let’s have a great PASS 2020 Summit in the meantime! I look forward to seeing you there. Virtually, of course!

“It’s a Jump to the left…

… and then a double-hop to the right.” Or something like that.

I’ve commented before on the fact that I’m a consultant. I enjoy it. People ask me what I do, and it varies. One client refers to their VB app as “the database,” and because they found an ad of mine on Google where I talked about database administration, they hired me. About 80% of the work I do for them is actually on the VB app or related; very little is what I’d consider traditional database work. But that’s OK, they’re a pleasure to work with and I enjoy the work. Another client I recently worked with asked me to help conduct an audit of their web-based product and take some steps to make it more secure. I was more than happy to help.

And then there’s my largest client, by far. There I actually get to do a fair amount of work that most of my #sqlfamily would recognize as “database work.” But there, perhaps more than anywhere else, I describe my duties as “DBA and other duties as assigned.” So between the work at this client and all my other clients, I’m often jumping or stepping around stuff.

The Double-Hop

This time, though, I was asked to solve a double-hop problem. What is that exactly? It’s an issue with how Windows passes security credentials from one server to another. This article, while old, describes it well. This was essentially the situation my client was trying to solve: users needed to use their Active Directory (AD) credentials to log into the reporting server (RS_Server) and run a report that in turn accessed data on a separate database server (DB_Server); hence the double-hop. Now, from my point of view, this isn’t really database work, but since the reporting server talks to the database server, it was dropped in my lap under “other duties as assigned.”
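To see the problem in action, here’s a minimal sketch of the classic double-hop test, assuming PowerShell remoting is enabled and the SqlServer module is installed on RS_Server (the server names are the placeholders from this post):

    # First hop: connect to the report server with your own AD credentials
    Invoke-Command -ComputerName RS_Server -ScriptBlock {
        # Second hop: from RS_Server, try to reach the database server as yourself
        Invoke-Sqlcmd -ServerInstance 'DB_Server' -Query 'SELECT SUSER_SNAME() AS WhoAmI;'
    }

Without some form of delegation, that second hop typically fails, because RS_Server has no credentials to present to DB_Server on your behalf, and SQL Server sees an anonymous logon instead of you.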

Now, honestly, this SHOULD be simple to solve. It wasn’t. One reason is that, like many companies, this client has a separate team that handles much of their infrastructure needs, such as AD requests, and those requests have to go through tickets. To be clear, I support this concept, in theory. In practice, it can often take 2-3 weeks for even simple requests to go through. This meant my first attempt at solving the double-hop failed: their IT department did exactly what I requested, but unfortunately there was a typo in my ticket. So it failed. On to round two. Round two didn’t work. Nor did round three. At that point, though, it wasn’t due to typos or mistakes on my end.

I started reading every article I could. My great editor at Redgate, Kathi Kellenberger, has one, and trust me, I wasn’t too shy to ask at that point! But nothing was working. I even asked another DBA at the client (she heads up a different group and is their lead DBA). She pointed me to one of her people, saying, “Talk to him, he solved it.” I did, and he hadn’t. His solution was the one we were trying to avoid (basically using a fixed user in the datasource).

Frustration Sets In

I was getting frustrated. Fortunately, at this point I was able to exploit a loophole in the ticketing process. Since the problem wasn’t being fixed, I could keep the ticket open, and I ended up getting assigned someone from their IT group who was as interested in fixing this as I was. This meant that rather than “open a ticket, wait 1-2 weeks, have the ticket closed as complete, test, find out it failed, rinse, repeat,” we could actually schedule Zoom sessions and make changes in real time. AND… nothing we tried worked.

At this point you’re probably saying, “Yeah, yeah, get to the point. Did you solve it?”  The answer is yes, but I wanted you to feel a bit of my pain first, and I needed to make this post long enough to make it worth posting.

A Solution!

Now, let me say, I wish I could write out an exact recipe card solution for you. For various reasons, I can’t. But bear with me.

Finally, we found an additional resource in the IT group who had solved this before. His first recommendations were, yet again, the solutions I mentioned above. Once he saw they didn’t work, he agreed we were trying the right thing. So he said, “Well, let’s try a solution known as ‘Resource-based Kerberos Constrained Delegation.’” This didn’t work at first either.

But then he suggested we turn on 128-bit and 256-bit Kerberos (AES) encryption on the DB_Server SQL service account. Bingo, that worked. Mostly. More on that in a second.
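We made the change through the account’s properties in AD, but if you’d rather script it, something like this should be equivalent. This is a hedged sketch using the ActiveDirectory PowerShell module; the account name is the placeholder used throughout this post:

    # Mark the SQL Server service account as supporting AES 128- and 256-bit Kerberos encryption
    Set-ADUser -Identity DB_Server_Service -KerberosEncryptionType AES128,AES256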

So here’s the setup we ended up with.

  1. RS_Server – running Reporting Services under the account domain\RS_Server_Service
  2. DB_Server – running SQL Server under the account domain\DB_Server_Service
  3. Set up some SPNs (a hedged setspn sketch follows this list)
    1. MSSQLsvc/DB_Server domain\DB_Server_Service
    2. MSSQLsvc/DB_Server.domain.com domain\DB_Server_Service
    3. Note in this case you do not appear to need one on the RS_Server side.
  4. Run a PowerShell script (note: Get-ADUser’s -Identity parameter takes the SAM account name, a DN, GUID, or SID – not domain\name)
    1. $FEIdentity1 = Get-ADUser -Identity RS_Server_Service
    2. $BEIdentity = Get-ADUser -Identity DB_Server_Service
    3. Set-ADUser $BEIdentity -PrincipalsAllowedToDelegateToAccount $FEIdentity1
  5. At this point things should have worked, but they didn’t until we also enabled the encryption options:

    encryption options for Kerberos

    Kerberos Encryption

  6. Then on our test box, things magically worked! Ok, not quite magically, but things worked. We had a solution.
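As promised above, here’s a hedged sketch of what the SPN registrations in step 3 might look like with the setspn utility (the names are the placeholders from the list; add :1433 variants if you connect on an explicit port):

    # Register SPNs for the SQL Server service account (-S checks for duplicates before adding)
    setspn -S MSSQLSvc/DB_Server domain\DB_Server_Service
    setspn -S MSSQLSvc/DB_Server.domain.com domain\DB_Server_Service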

And I was even more ecstatic when, later that day, I tested this on a second report server box we had, and it too was suddenly working without any changes. This was a box where we had NOT even set up an SPN for the original double-hop solution, so I was pretty confident the Resource-based Kerberos Constrained Delegation was working. In addition, in the rsreportserver.config file, the only authentication type enabled was NTLM.
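If you want to confirm Kerberos is actually in play on a connection, one quick check (a hedged sketch; Invoke-Sqlcmd comes with the SqlServer module) is to ask SQL Server how the current session authenticated:

    # 'KERBEROS' means delegation can work; 'NTLM' means the connection fell back
    Invoke-Sqlcmd -ServerInstance 'DB_Server.domain.com' -Query 'SELECT auth_scheme FROM sys.dm_exec_connections WHERE session_id = @@SPID;'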

The next step was to try this on a production server. In that case, I did have to reconfigure the service it was running under to use the domain account domain\RS_Server_Service.

And… my test failed.

I was at my wits’ end. I couldn’t quite figure out what was different. I checked my service names, my SPNs, the rsreportserver.config file, and more. Nothing was working. I took a break, came back, and had an idea. In the datasource I changed it from:

Data source=DB_Server;Initial Catalog=TestDB

to

Data source=DB_Server.domain.com;Initial Catalog=TestDB

Bingo, it worked! A little digging confirmed my suspicion: this client actually has multiple DNS domains, and the DNS suffix search order under the TCP/IP settings was different on this box than on the other two. That made the difference.

Sure enough, when I tried deploying to a fourth box, I had the same issue, but changing the datasource to the Fully Qualified Domain Name (FQDN) solved it.
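If you suspect the same issue, a quick way to compare boxes is to look at the DNS suffix search list on each one (a hedged sketch; Get-DnsClientGlobalSetting ships with Windows Server 2012 and later):

    # Run on each report server and compare the ordering of the suffix search lists
    (Get-DnsClientGlobalSetting).SuffixSearchList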

So, my take-aways for this week:

  • Resource-based Kerberos Constrained Delegation may be a better way of solving the double-hop problem than the solution generally proposed.
  • Once you’ve set up the “target” SQL Server service account and the source Reporting Services service accounts, additional report servers can be added to the mix without needing assistance from a domain admin.
  • It appears you still need an SPN (well, multiple) for the SQL Server itself.
  • You need to run a PowerShell script to set up the accounts. Note that if you run it again, it overwrites the old settings, so you need to add ALL of the source accounts in a single step (see the sketch after this list).
  • Depending on your domain’s security setup, you may need to enable 128/256-bit Kerberos encryption on the service account.
  • DNS resolution may determine whether you can use just the NetBIOS name or need the FQDN in your data sources.
  • This solution will NOT work if you need to cross domains or have more complex setups, but in general it can be simpler to set up and maintain, especially if you have limited access to making changes to AD in the first place.
  • My reading indicates this only works on Windows Server 2012 and beyond. But you shouldn’t be running older versions of Windows in any case!
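On that overwrite point, here’s a minimal sketch reusing the placeholder account names from above ($FEIdentity2 and RS_Server2_Service are hypothetical, standing in for a second report server’s service account):

    # -PrincipalsAllowedToDelegateToAccount REPLACES the existing list,
    # so pass every front-end account in a single call
    $FEIdentity1 = Get-ADUser -Identity RS_Server_Service
    $FEIdentity2 = Get-ADUser -Identity RS_Server2_Service
    $BEIdentity  = Get-ADUser -Identity DB_Server_Service
    Set-ADUser $BEIdentity -PrincipalsAllowedToDelegateToAccount @($FEIdentity1, $FEIdentity2)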

If I get the time and energy, I may set up a test environment in my home lab to further experiment with this and write up better demos, but for now, use this as you can. Hopefully it’ll save you some of the stress I experienced.

And that’s it from here, back to other duties as assigned.