Inbound and Outbound to NYC

I still recall the first computer program I wrote. Or rather, co-wrote. It was a rather simple program, in Fortran I believe, though that’s really an educated guess. I don’t have a copy of it. It was in either 7th or 8th grade when several of us were given an opportunity to go down to the local high school and learn a bit about the computer they had there. I honestly have NO idea what kind of computer it was, perhaps a PDP-9 or PDP-11. We were asked for ideas on what to program, and the instructor quickly ruled out our suggestion of printing all the numbers from 1 to 1 million. He made us estimate how much paper that would take.

So instead we wrote a program to convert temperatures from Fahrenheit to Celsius. The program was, as I recall, a few feet long. “A few feet long? What are you talking about, Greg?” No, this was not the printout. This wasn’t how much it scrolled on the screen. It was the length of the yellow (as I recall) paper tape that contained it. The paper tape had holes punched into it that could be read by a tape reader. You’d write your program on one machine, then take it over to the computer, feed it into the reader, and it would run. I honestly don’t recall how we entered the values to be converted, whether they were already on the tape or entered through some other interface. In any case, I loved it and fell in love with computers then. Unfortunately, somewhere over the years, that paper tape has disappeared. That saddens me. It’s a memento I wish I still had.

In four or five short years, the world was changing, and quickly. The IBM PC had been released while I was in high school, and I went from playing a text adventure game called CIA on a TRS-80 Model II to programming in UCSD Pascal on an original IBM PC. (I should note that this was my first encounter with the concepts of a virtual machine and a p-code machine.) This was great, but I still wanted more. Somewhere along the line I encountered a copy of Microsoft’s Flight Simulator. I loved it. In January of 1985 my dad took me on a vacation to St. Croix, USVI. Our first stop on that trip was a night in NYC before we caught our flight the next morning. To kill some time I stepped into 47th Street Photo and bought myself a copy of Flight Simulator. It was the first software I ever bought with my own money. (My best friend Peter Goodrich and I had previously acquired a legal copy of DOS 2.0, but “shared” it. Ok, not entirely legal, but hey, we were young.)

I still have the receipt.

For a high school student in the ’80s, this wasn’t cheap. But it was worth it!

I was reminded of this the other day while talking with some old buddies I met back when the Usenet group sci.space.policy was still the place to go for the latest and greatest discussions of space programs. We were discussing our early introductions to computers and the like.

I haven’t played this version in years and, honestly, am not entirely sure I still have hardware that could run it. For one thing, this version, as I recall, was designed around the 4.77 MHz speed of the original IBM PC. Some of my readers may remember that when the PC AT clones came out with 80286 chips running at up to 8 MHz (and faster for some clones), there was often a switch to run the CPU at a slower speed, because many games otherwise simply ran twice as fast and users couldn’t react quickly enough. So even if I could find a 5 1/4″ floppy drive and get my current machine to read the disks into a VM, I’m not sure I could clock a VM down slowly enough to play this. But I may have to try one of these days. Just for the fun of it.

I still have the original disks and documentation that came with it.

Flying outbound from NYC

A part of me does wonder if this is worth anything more than the memories. But for now, it remains in my collection, along with an original copy of MapInfo that was gifted to me by one of the founders. But that’s a stroll down memory lane for another day.

And then, only six or so short years later, I encountered SQL Server. That has ultimately been a big part of where I am today.

SQL Server Scheduled Email Queries

Every once in a while I come across a problem that surprises me both in that it exists and in that the solution is often trivial. This is one of those. Basically, a developer at a client wanted to set up a scheduled task that would execute a query and email the results. Everything appeared to work, but the emails never went out. There were no obvious errors; the emails simply never arrived. Some digging via Profiler showed that the SQL Agent user was having permission issues. But giving it every permission possible didn’t solve the problem.

So let me walk you through it. First, let’s create a really simple stored procedure. This assumes you have AdventureWorks2014 installed (yeah, it’s old, but it’s what I had handy).

Use adventureworks2014
GO
Create or Alter Procedure Send_Test_Email
as
exec msdb.dbo.sp_send_dbmail @recipients='test@example.com', @body='This is the body of the email', @subject='Test Email w query embedded',
@query='select * from adventureworks2014.sales.SalesOrderHeader where subtotal > 150000';

Now, here’s an important detail I discovered during testing. If my query didn’t actually query a table, but instead was, say, @query='select getdate()', things worked fine.

That said, if you simply execute the stored procedure above in SSMS, it should work just fine. (I’d recommend you put your own email address in it for testing. This way you’ll know if the email is actually being sent.)

Before you do that, also create the following stored procedure:

Use adventureworks2014
GO
Create or Alter Procedure Send_Test_Email_Query_attached
as
exec msdb.dbo.sp_send_dbmail @recipients='test@example.com', @body='This is the body of the email', @subject='Test Email w query embedded',
@query='select * from adventureworks2014.sales.SalesOrderHeader where subtotal > 150000',
@attach_query_result_as_file = 1,
@query_result_no_padding = 1,
@query_attachment_filename = 'Sales Report.csv',
@query_result_separator = ';',
@exclude_query_output = 1,
@append_query_error = 0,
@query_result_header = 0

This will execute the same query as the original stored procedure, but place the contents in a CSV file as an attachment.

Again, if you execute the above directly from SSMS, you should receive the email without an issue. This is basically what the client was attempting to do.

Now create the following job:

USE [msdb]
GO

/****** Object:  Job [Send AdventureWorks Email]    Script Date: 7/27/2021 9:12:49 AM ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]    Script Date: 7/27/2021 9:12:49 AM ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Send AdventureWorks Email',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'No description available.',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [Send email]    Script Date: 7/27/2021 9:12:50 AM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Send email',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'exec Send_test_email_Query_attached',
@database_name=N'adventureworks2014',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

If you execute this job as is, you should see the following:

Success! Or is it?

But you will never get the email!

Modify the above job and replace the line:

@command=N'exec Send_test_email_Query_attached',

with

@command=N'exec Send_test_email',

Now if you run the job, it will fail!

This time it clearly failed!

But as is usual with sp_send_dbmail, the error message isn’t overly helpful:

Executed as user: NT AUTHORITY\SYSTEM. Failed to initialize sqlcmd library with error number -2147467259. [SQLSTATE 42000] (Error 22050). The step failed.

But it does give a clue. The problem seems to be some sort of permissions issue. I’ll admit, at this point I tried all sorts of solutions, including setting up a proxy user, giving unfettered rights to my SQL Agent user, and other things (thinking once I got it working I could then lock things back down). Instead, I found a much easier solution buried in a thread on the Microsoft site.

You’ll note that when I wrote the original stored procedure I fully qualified the table name. This is generally useful, but here I did it because I was cheating and knew I’d need it for this demo.

The solution is actually VERY simple and I’ll show it both graphically and via a script.

First: change the database to MSDB and then fully qualify your call to the stored procedure as below:

Execute it from the MSDB database

However, there’s one more critical step: under the ADVANCED tab in the step change the Run As User to dbo:

Seems simple, but it’s critical

Now if you script out the scheduled task it should look like:

USE [msdb]
GO

/****** Object:  Job [Send AdventureWorks Email]    Script Date: 7/27/2021 9:26:37 AM ******/
EXEC msdb.dbo.sp_delete_job @job_id=N'08a55e18-eecf-4d4c-8197-8135a0d7520b', @delete_unused_schedule=1
GO

/****** Object:  Job [Send AdventureWorks Email]    Script Date: 7/27/2021 9:26:37 AM ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]    Script Date: 7/27/2021 9:26:37 AM ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Send AdventureWorks Email',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'No description available.',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [Send email]    Script Date: 7/27/2021 9:26:37 AM ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Send email',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'exec adventureworks2014.dbo.Send_Test_Email_Query_attached',
@database_name=N'msdb',
@database_user_name=N'dbo',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

If you then execute THIS scheduled task you’ll get both a success message:

And you should actually receive the email with the query results attached.

So, an issue that surprised me by even existing ended up having a fairly simple solution.

… Other Duties as Assigned

I’ve mentioned once before that at one of my clients I describe my job as “DBA and other duties as assigned.”

This phrase has really been on my mind this week, especially during a phone call with another client yesterday. This second client is a local consulting company that has hired me a few times to back them up with my skills in SQL Server and MS Access. This time around, the work they’re looking for is definitely SQL Server related. That was refreshing.

But it reminded me of my last two weeks with two of my other clients. One is having an issue with their app (which they always call “the database”) that is most likely a design issue I need to dig into. This is a perfect example of what I call “software archeology,” where I at times have to sift through “pot shards” to determine what the original developer was thinking. At times it can be fun and interesting; at other times, frustrating. I’ll be sifting through more pot shards in the near future to get to the bottom of this problem.

For my largest client, I spent most of my hours with them last week trying to true up a file with some financial data in it. In this case it’s part of an ETL process where I receive data, compile it and send it to a vendor. The process uses a combination of PowerShell and Pentaho. So while they interact with the database, the work I was doing wasn’t in T-SQL or directly on the database server.

The numbers weren’t adding up. There was an undercurrent of “Greg, your numbers are wrong” or “You’re filtering on the wrong criteria.” I kept pointing out that “I simply add up the numbers you give me.” Eventually the problem was narrowed down to the fact that in the source system, which is the system of record, they had deleted rows. Arguably, one should never delete rows in such a system, but rather issue a second row (a credit to reverse a debit, or a debit to reverse a credit), and this was typically what was done. But in this case the maintainers of the system of record decided to wholesale delete these rows. I explained that from day one, since deletions are never supposed to happen (and, given the way the system works, are extremely hard to detect), all I do is either insert new rows or update existing ones. In any event, with one minor schema change, some updates to the rows in question, and an updated PowerShell script, I was able to make the numbers match theirs. So, is that really DBA work? Not in the traditional sense. But it’s definitely other duties as assigned.

Now that’s not to say I didn’t do what some might consider actual DBA work. On Saturday morning I patched one of their servers. And at one point during the week, I deployed a script to production. So, out of 18 hours of work for the customer last week, I think maybe 1-2 hours total were “DBA work,” or about 5%.

Now, I want to be clear. This is not a rant or a complaint. I’ll admit I tend to prefer to work directly with SQL Server, but I was reminded of a quick discussion I had with a fellow DBA over the weekend about how they probably needed to start to learn PowerShell for their job.

I’ve been arguing for years that the role of a DBA has changed, and it will continue to change dramatically over the next few years. Where once we might spend days heads-down slinging T-SQL code, setting up backups and restores, tuning indices, etc., now much of that is automated or at least far easier to do. Which is a good thing. In years past, a DBA might be responsible for a dozen machines at most. If it was more than that, we’d feel sorry for them. That’s no longer uniformly true. I know a DBA who is responsible for over 100 machines. They’re the sole DBA. But through PowerShell and other modern tools, it’s generally not an overwhelming job.

However, as last night’s online presentation on SQL Database Edge from the Atlanta Azure Data User Group reminded me, there is a growing list of things DBAs need to learn. Steve Jones recently posted about whether DBAs need to learn Linux. The short takeaway: not necessarily, but it’s probably a good idea, and we definitely need to learn about containers.

I have heard for years, “Microsoft will automate everything and the DBA’s job will go away.” Not only is that not true in my experience, the exact opposite is. I think being a successful DBA is in some ways harder than it was a decade ago. There’s so much more to be aware of and to learn.

Off the top of my head, and in no particular order, I came up with the list below of technologies a modern DBA might find useful to know. This is not to say I know them all, or that one has to be an expert in all of them. And I will note, this is far from an exhaustive list. I also left out third-party tools, which are commonplace. But I think it illustrates just how broad the required skillset of a good DBA is these days.

  • T-SQL
  • PowerShell
  • Query Store
  • Linux – at least at the most basic level
  • Containers
  • SSIS
  • SSAS
  • SSRS
  • Storage – (at least how different types can impact performance and the advantages and disadvantages of each)
  • Azure
  • SQL Database Edge
  • git or some form of version control

In conclusion, I won’t make any predictions about where the Microsoft data platform will be a decade from now, but I can tell you that DBAs will still be needed, and their skillset will be as different from today’s as today’s is from a decade ago.

And post conclusion, I’ll add I’ll continue to rely on #sqlfamily and all my fellow DBAs to help me out. And continue to help them.

Your Boss Doesn’t Care About Backups!

It’s true. Even if they don’t realize it. Or even if they claim they do. They really don’t.

I’ve made this point before. Of course this is hyperbole. But a recent post by Taryn Pratt reminded me of this. I would highly recommend you go read Taryn’s post. Seriously. Do it. It’s great. It’s better than my post. It actually has code and examples and the like. That makes it good.

That said, why this title? Because again, I want to emphasize that what your boss really cares about is business continuity. At the end of the day they want to know, “if our server crashes, can we recover?” And the answer had better be “Yes.” This means you need to be able to restore those backups, or have another form of recovery.

Log-Shipping

It seems to me that over the years log-shipping has sort of fallen out of favor. “Oh we have SAN snapshots.” “We have Availability Groups!” “We have X.” “No one uses log-shipping any more, it’s old school.”

In fact this recently came up in a DR discussion I had with a client and their IT group. They use SAN replication software to replicate data from one data center to another. “Oh, you don’t need to worry about shipping logs or anything, this is better.”

So I asked questions: was it block-level, file-level, byte-level, or what? How much latency was there? How could we be sure the data was hardened on the receiving side? I never got really clear answers to any of that, other than, “It’s never failed in testing.”

So I asked the follow-up question, “How was it tested?” I’m sure their answer was supposed to reassure me. “Well, during a test, we’d stop writing to the primary, shut it down, and then redirect the clients to the secondary.” And yes, that’s a good test, but it’s far from a complete test. Here’s the thing: many disasters don’t allow the luxury of cleanly stopping writes to the primary. They can occur for many reasons, but in many cases the failure is essentially instantaneous. This means data was in flight. Where in flight? Was it hardened to the log? Was that data in flight to the secondary? Inquiring minds want to know.

Now, this is not to say these methods of disk-based replication (as opposed to SQL-based replication, which is a different beast) aren’t effective or don’t have their place. It’s simply to say they’re not perfect, and one has to understand their limitations.

So back to log-shipping. I LOVE log-shipping. Let me start with a huge caveat: in an unplanned outage, your secondary will only be as up to date as the most recent log backup you’ve restored. This could be an issue. But the upside is, you should have a very good idea of what’s in the database, and your chances of a corrupted block of data or the like are very low.

But there are two things I love about it.

  1. Every time I restore a log file, I’ve tested the backup of that log file. This may seem obvious, but it gives me a constant check on my backups. If my backups fail for any reason (lack of space, a bad block written and not noticed, etc.), I’ll know as soon as my next restore fails. Granted, my FULL backups aren’t being restored all the time, but I’ve got at least some more evidence that my backup scheme in general is working. (And honestly, if I really needed to, I could back up my copy and use that in a DR situation.)
  2. It can make me look like a miracle worker. In the past, in a shop where developers had direct access to prod and had been known to mess up data, I used log-shipping to save the day. Either on my DR box, or on a separate box I kept around that was too slow CPU-wise for DR but had plenty of disk space, I’d set it to delay applying logs for 3-4 hours. In the event of an actual disaster, it was fairly simple to catch up on log-shipping and bring the DR box online. But more often than not, I used it (or my CPU-weak but disk-heavy box) in a different way. I’d get a report from a developer: “Greg, umm, well, not sure how to say this, but I just updated the automobile table so that everyone has a White Ford Taurus.” I’d simply reply, “Give me an hour or so, I’ll see what I can do.” The reality is, it never took me an hour. I’d simply look at the log-shipped copy I had, apply any logs I needed to catch up to just before their error, then script out the data and fix it in production. They always assumed I was restoring the entire backup or something like that. I wasn’t, in part because doing so would have taken far more than an hour and would have caused a complete production outage.

There was another advantage to my second use of log backups: I got practice manually applying logs WITH NORECOVERY and the like. I’m a firm believer in “train as you fight.”
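If you’ve never walked through that by hand, here’s a minimal sketch of the catch-up-and-recover dance; the database name, file paths, and times are all hypothetical:

-- Apply the remaining shipped logs, leaving the database restoring
RESTORE LOG WidgetDB
FROM DISK = N'D:\LogShip\WidgetDB_20210727_1400.trn'
WITH NORECOVERY;

-- Apply the final log, but stop just before the developer's 2:15 PM mistake
RESTORE LOG WidgetDB
FROM DISK = N'D:\LogShip\WidgetDB_20210727_1415.trn'
WITH STOPAT = '2021-07-27 14:14:00', RECOVERY;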

Yes, in an ideal world a developer will never have such unrestricted access to production (and honestly, it’s gotten better; I rarely see that these days), and you should never need to deal with an actual DR. But we don’t live in an ideal world.

So, at the end of the day, I don’t care if you do log-shipping, Taryn Pratt’s automated restores, or something else, but do restores, both automated and manual. Automated because it’ll test your backups. Manual because it’ll hone your skills for when your primary is down and your CEO is breathing down your neck as you huddle over the keyboard trying to bring things back.

Reminder

As a consultant, I’m always looking for new clients. My primary focus is helping to outsource your on-prem DBA needs. If you need help, let me know!

“Houston, we’re venting something into Space…”

This post is the result of several different thoughts running through my head, combined with a couple of items I’ve seen on social media in the past few days. The first was a question posted to #SQLHelp on Twitter asking what one would do on encountering a SQL Server in an unknown configuration. The second was a comment a friend made about how “it can’t get any worse,” and several of us cheekily corrected him: it can always get worse. And of course, I’m still dealing with my server that died last week.

To the question of what to do with an unknown SQL Server, there were some good answers, but I chimed in saying my absolute first step would be to make backups. Several folks had made good suggestions about looking at system settings and possibly changing them, possibly re-indexing, etc. My point, though, was that all of that could wait. If the server had been running up until now, fixing those things might be very helpful, but not fixing them would not make things worse. On the other hand, if there were no up-to-date backups and the server failed, the owner would be in a world of hurt. Now, for full disclosure, I was “one-upped” when someone pointed out that, assuming they did have backups, what one really wanted to do was a restore. I had to agree. The truth is, no one needs backups; what they really need are restores. But the ultimate point is the same: without a tested backup, your server can only get much worse if something goes wrong.

I’ve had to apply this thinking to my own dead server. Right now it’s running in a Frankenbeast mode on an old desktop with 2GB of RAM. Suffice to say, this is far from ideal. New hardware is on order, but in the meantime, most things work well enough.

I actually have a newer desktop in the house I could in theory move my server to. It would be a vast improvement over the current Frankenbeast: 8GB of RAM and a far faster CPU. But I can’t. It doesn’t see the hard drive. Or more accurately, it won’t see an OS on it. After researching, I believe the reason comes down to a technical detail of how the hard drive is set up (namely, the boot disk uses what’s known as an MBR partition scheme, and it needs to be GPT). I’ll come back to this in a minute.

In the meantime, let’s take a little detour to mid-April 1970. NASA had flown two successful lunar landings, and the third mission, Apollo 13, was on its way to the Moon. The crew had survived a launch anomaly that came within a hair’s breadth of aborting the mission before they even made orbit. Hopes were high. Granted, Ken Mattingly was back in Houston, a bit disappointed he had been bumped from the flight due to his exposure to rubella. (The vaccine had just been released in 1969, so he had never been vaccinated, and he had not had the disease as a child. Vaccines work, folks. Get vaccinated, lest you lose your chance to fly to the Moon!)

Stack of Swiss cheese slices showing holes lined up.

A routine mission operation was to stir the oxygen tanks during the flight. Unfortunately, due to a Swiss-cheese alignment of issues, this nearly proved disastrous when it caused a spark, which caused an “explosion” that blew out the tank, ruptured a panel on the Service Module, and did further damage. Very quickly the crew found themselves in a craft rapidly losing oxygen and, more importantly, losing electrical power. Contrary to what some might think, the loss of oxygen wasn’t an immediate concern in terms of breathing or astronaut health. But without oxygen to run through the fuel cells, there was no electricity. Without electricity, they would soon lose their radio communication with Earth, the onboard computer used for navigation and control of the spacecraft, and their ability to fire the engines. Things were quickly getting worse.

I won’t continue to go into details, but through a lot of quick thinking, as well as a lot of prior planning, the astronauts made it home safely. The movie Apollo 13, while a somewhat fictionalized account of the mission (for example, James Lovell said the argument among the crew never happened, and Ken Mattingly wasn’t at KSC for the launch), is actually fairly accurate.

As you may be aware, part of the solution was to use the engine on the Lunar Module to change the trajectory of the combined spacecraft. This was a huge key in saving the mission.

But this leads to two questions I’ve seen multiple times. The first is why they didn’t try to use the Service Module (SM) engine, since it was far more powerful, had far more fuel, and in theory could have turned them around without having to loop around the Moon. This would have shaved days off the mission and gotten the astronauts home sooner.

NASA quickly rejected this idea for a variety of reasons. One was fairly direct: there didn’t appear to be enough electrical power left in the CSM (Command/Service Module) stack to do so. The other was somewhat indirect: they had no knowledge of the state of the SM engine. There was a fear that any attempt to use it would result in an explosion, destroying the SM and very likely the CM, or at the very least damaging the heatshield on the CM; a bad heatshield would mean a dead crew. So NASA decided to loop around the Moon using the LM descent engine, a longer but far less risky maneuver.

Another question that has come up is why they didn’t jettison the now-dead, deadweight SM. This would have meant less mass, and arguably been easier for the LM to handle. Again, the answer is the heatshield. NASA had no data on how the heatshield on the CM would hold up after being exposed to the cold of space for days, and feared it could develop cracks. It had been designed to be protected by the SM on the flight to and from the Moon. So it stayed.

The overriding argument here was “don’t risk making things worse.” Personally, my guess is given the way things were, firing the main engine on the SM probably would have worked. And exposing the heatshield to space probably would have been fine (since it was so overspecced to begin with). BUT, why take the risk when they had known safer options? Convenience is generally a poor argument against potentially catastrophic outcomes.

So, in theory, these days it’s trivial to convert an MBR disk to a GPT one. But if something goes wrong, or that’s not really the root cause of my issues, I go from a crippled but working server to a dead server I have to rebuild from scratch. Fortunately, I have options (including now a new disk, so I can essentially mirror the existing one, have an exact copy, and try the MBR-to-GPT conversion on that), but they may take another day or two to implement.

And in the same vein, whether it’s a known SQL Server or an unknown one you’re working on, PLEASE make backups before you make changes, especially anything dramatic that risks data loss. (And I’ll add a side note: if you can, avoid restarting SQL Server when diagnosing issues; you lose a LOT of valuable information in the DMVs.)
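If nothing else, a quick copy-only backup is cheap insurance before you touch anything. A minimal sketch, with a hypothetical database name and path (COPY_ONLY keeps it from disturbing the existing backup chain):

-- An out-of-band safety-net backup before making changes
BACKUP DATABASE WidgetDB
TO DISK = N'D:\Backups\WidgetDB_PreChange.bak'
WITH COPY_ONLY, CHECKSUM, INIT;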

So things CAN get worse. But that doesn’t mean there’s any need to take steps that will. Be cautious. Have a backout plan.

A Speaker’s Timeline

This post will be short, for reasons that are hopefully obvious by the end.

Sometime in February

Hmm, I should put together some ideas to submit for presenting at SQL Summit in Houston (not Dallas, as Mistress SQL pointed out to me) this year.

March 16th

An update, the call for speakers has been postponed. Darn.

March 23rd

Call for speakers is finally open!

March 30th

Submit 3 possible topics.

April 1st

Approach a fellow speaker about a possible joint session, but after discussion, decide not to go ahead with the idea.

June 3rd

Get an update, Summit will be virtual this year. Thankfully I didn’t book any tickets or hotel rooms in Dallas.

July 20th 6:49 PM EDT

Woohoo! I got the email! One of my submissions got selected to present!

July 20th 6:50 PM EDT

Crap, now I actually have to write the entire thing!

July 20th 6:51 PM EDT

Wait, and it’s going to be virtual too. That’s going to make it a bit more of a challenge to present. But I’m up to it!

Sometime in August

I really should get started. Hmm, here’s one of the scripts I want to present.

But honestly, I’m preparing to teach a bunch of cavers and medical students cave rescue, I need to concentrate on that first.

September 5th

I just biked over 100 miles. I’m certainly not working on my presentation THIS weekend.

Later in September

Ok, now I’m going to sit down and really work through this. Here’s a basic outline.

October 1st

Oh wait, it’s going to be virtual AND I have to prerecord it? How is that supposed to work? I had better read up at the speaker portal!

October 2nd

Huh, ok, that sorta makes sense, upload the slides, do a recording, but I still don’t get how it’ll work with a presentation like mine with lots of demos. Well I’ll figure it out.

October 6th around 11 PM EDT

Well the PowerPoint template deck they provided looks pretty slick. I should start prepping my slides.

October 6th, approximately 5 minutes later

There, got the first slide done. Of course it’s only my name and pronouns, etc. But it’s a start.

Oh and the 2nd slide is done, but that’s simply the default PASS slide talking about chapters, SQL Saturday etc, so technically I didn’t do anything there.

I’ll start working on the closing slides.

October 7th, sometime after midnight

Ok, about 5 slides done. I’ll lie to myself and say I’ve made great progress!

October 9th, approximately 10:00 PM EDT

Ok, I’ll at least start writing out the scripts I need.

October 9th, 20 minutes later

What the bloody hell? Why is this script failing? I’ve got to present this. If I can’t get this script working, how is anyone going to believe that I know PowerShell, let alone actually use it?

October 9th, 5 minutes later

Well, damn, that was an embarrassing mistake, just had a comma in the wrong place.

October 10th around 9:00 PM EDT

Hmm, to properly demo this, I really need to run against 3-4 SQL Servers, and I really don’t want to spin up a bunch of VMs, and I can’t use my development one, too much proprietary data there.

I know, NOW is a perfect time to start to learn to use Docker! Why not? And besides, Cathrine Wilhelmsen has a great post on it. I’ll simply follow that.

2 hours and 1 reboot later

Hey, would you look at that? I’ve actually got a docker container running SQL. This is awesome!

Another minute later

But why can’t I actually connect? What network is it on? Why did I decide docker was easier? Why did I even submit this proposal? What the heck am I doing here? What is the meaning of life?

5 more minutes

That’s it, I’m going to bed.

October 11th, late night

Oh, I get it now. I didn’t set up a fully separate network; it’s bridged, and that’s why it’s showing 0.0.0.0. I just need to change the port and I’m good to go!

A minute later

This is pretty awesome. Not what I’d do for a production setup, but it definitely works for my demos. Now if I were really smart, I’d also set up persistent storage and the like, but this is good enough. And honestly, now I can set up a loop, increment a variable, and bam, I’ve got 4 instances of SQL running in Docker, 2 running 2017 and 2 running 2019. This is really incredible. I’m proud of myself.

Oh and even better, I’m doing all this in a PowerShell script, so I can actually make it PART of my presentation!

October 12th 2:26 PM EDT

Send off an email to the Program folks at PASS asking about how the recording stuff works with demos. Eagerly awaiting a reply.

October 15th, another late night

Yes, there’s a theme here, much of my work is being done late at night. It seems to work for me. But dang that deadline is getting closer!

October 16th, late night, again

Watched some Schitt’s Creek with the family. “Why didn’t we start watching this sooner? It’s hilarious! But I need to work on my presentation some more.”

Got all the PowerShell scripts basically done. I’m happy with them; I still need to work on my speaking script some.

October 19th 3:00 PM EDT

Get off the phone with a fellow Cave Rescue expert. Just before I get off, I mention my upcoming virtual, prerecorded session I have to finish. He says, “Oh, you know I just did 2-3 of those for a rescue conference, exact same format. It worked out really well. I can send you some details and feedback.”

I find that reassuring.

Also recheck email, still no answer from the folks at PASS on my questions about demos, etc.

October 19th, guess what time

I’ve finished everything, even updated the slides and scripts a bit more. I’m a bit worried I’m going to run too long, but decide to do the first of several practice run-throughs.

Do my first full run-through. Stop and correct a few mistakes or rough edges here and there. I’m not too worried if I run over now, since I know I’ve artificially added some time.

October 19th, 42 minutes later

I get done, look at the PowerPoint timer: 42 minutes. “CRAP! I need this to be 60 minutes!” I’m not too worried, I can add more, but I’m not sure where and I don’t want to simply add fluff for the sake of fluff. I need to give this some thought.

Later on October 19th

Talking to a friend of mine who among other things has a background in adult education. She doesn’t know SQL or PowerShell, but she’s a good sounding board and she’s going to sit through my next run-through, not so much for the technical details but to give feedback on the flow and perhaps suggestions on where I may be making too many assumptions on what my listeners will know.

October 20th Early Morning

It’s a Tuesday, time to blog. As always I face that question, what should I blog about?

“I know, I’ll blog about how I’m getting my presentation together and the deadline is fast approaching. I can’t be the only speaker that often finds themselves up against the deadline and panicking.”

Next 36 hours

Add a bit more content and run through it 2-3 more times and then… RECORD! (Technically it looks like I have until the 26th to upload my recording, but I want to get done early.)

Conclusion

The above may or may not be a wholly accurate timeline or description of the process I’ve gone through trying to get my presentation ready for PASS Virtual Summit. I may have elided a few details and over-hyped a few others, but in general it’s close to true and accurate. Despite my best intentions, I often find myself working right up against the deadline for submissions. Since Summit wants NEW presentations, I can’t simply dust off one of my previous presentations and use that, so there’s definitely more work involved here.

And honestly up until I learned it was going to be prerecorded, I thought I’d have most of October to work on it. The deadline to get the slides and recordings submitted sort of threw my original timeline for working on it in the dumpster so I’m actually a bit further behind than I expected to be.

On the other hand, I really did learn to use Docker, I think that’s valuable, and I am making it part of my presentation. And when all is said and done, I think I’ll be happy with it. Though, like any good speaker, I’ll look back and think, “Well, next time I’ll have to improve this or that.” There’s always room for improvement. I’m not keen on giving it prerecorded; I value the instantaneous feedback I get from the audience, so that will be different. But at least I can field questions during the presentation, and there’s a live Q&A afterwards. Still, I’ll be nervous.

I’m in awe of speakers who get their presentations all prepped and prepared months in advance, but I suspect there are a number out there like me who don’t operate that way. And I suspect there are a few who are even more nervous than I am, thinking, “OMG, am I the only one in this spot?” Nope, you’re not. Or rather, “Please let me know I’m not the only one!”

See you all at Summit, at least virtually!

And in the meantime there’s another possible deadline coming up I need to think about…

#SQLFamily

I’ve mentioned this in the past and thought I’d write something quick about it today. The “quick” is because I’m lacking time, not because the topic isn’t important or worthy of exposition.

Anyone who has spent much time at PASS events such as SQL Saturday or PASS Summit has an inkling of what #SQLFamily is.

At its base, it’s a group of professionals who all have SQL Server in common. That might be a start, but it’s hardly a good definition. It’s also:

  • Professional contacts
  • LOTS of people willing to give SQL help when you need to solve a problem
  • Folks that will fact check your blog or post
  • Folks willing to step up and give a user group talk

And that might be enough, but that’s not all. It’s also:

  • Someone who loves bicycling as much as or more than I do
  • At least one amateur radio operator (and quite the ham in other ways at times)
  • Several with 3D printers making mask band holders and the like
  • Several that sing karaoke
  • Someone who makes more homemade pizza than I do
  • At least one with cute puppies she’s been known to have on her webinars

But, honestly, it doesn’t end there. In this time of Covid it’s been more.

  • It’s been the folks I get together with on Fridays for a long-distance social hour
  • It’s the ones I’ve been able to talk with about fears of COVID and schooling and kids
  • It’s been the ones I’ve reached out to, to make sure they’re ok
  • It’s been the ones that have checked in on me
  • It’s the folks that write blog posts, sometimes daily, about how to support others

In short, it really is a family. We’re not related by blood, but we still share our thoughts and feelings and support each other. And you know what? Right now I’m extremely grateful for that family.

So to my #SQLFamily, if I haven’t said it enough, thank you for who you are and for being there, especially during this time of Covid. I know I’ve needed it. And I really appreciate it.

And I can’t wait to see you all in person again at some point.

P.S. – if you’re shy or don’t think you’re welcome in the family, don’t worry, you are welcome. Pop in and say hi, or even just reach out to one person and ask for an introduction.

The Value of Testing

This is one of those posts where I wish I could show actual code snippets, but since it involves a 3rd party vendor for one of my clients and I don’t have permission, I can’t.

So, I’m forced unfortunately to talk about the issue in a roundabout way.

My client uses a 3rd party tool to track documents. I’ve mentioned this before. They’ve been growing fairly fast and running into performance issues. I suppose growing fast is a good thing, but having performance issues is not.

In any case, using Query Store, I was able to send the vendor a list of queries and stats about them for them to review and to ideally improve the queries that needed work.

Yesterday they got back to me. The email was essentially, “We took this first query (let’s call it Doubly-Joined) and rewrote it as this second query (let’s call it Singly-Joined).” I looked at the two queries, which join 4 tables. They’re very similar to each other, but the first one joined the main table in a second time (hence my calling it Doubly-Joined). It’s not clear why this was done. The second query basically removed the second join and, in the select clause, changed the aliases from the second join to the first. This does give a slightly different query plan, but ultimately both return the same number of rows.
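Since I can’t show the vendor’s actual queries, here’s a purely hypothetical sketch of the shape of the change, with made-up table and column names (and fewer tables than the real thing):

-- Doubly-Joined: the main table is joined to itself a second time
SELECT m2.DocumentName, s.StatusName
FROM MainTable m1
JOIN MainTable m2 ON m2.DocumentId = m1.DocumentId -- redundant self-join
JOIN StatusTable s ON s.StatusId = m1.StatusId;

-- Singly-Joined: drop the second join and re-point the aliases
SELECT m1.DocumentName, s.StatusName
FROM MainTable m1
JOIN StatusTable s ON s.StatusId = m1.StatusId;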

The first query plan
The second query plan

As you can see, the 2nd query plan is definitely a bit simpler (ignore the warning; it’s not something that appears to be fixable here).

So, a naive take would be “we removed an unnecessary join, so of course it should be faster!” But is it?

Sometimes intuition can be correct, sometimes not so much. In this case though, it’s easy to confirm by seeing exactly how many rows are being read in each query.

I wrapped each query in a

Set Statistics IO ON
Set Statistics Time ON
-- the query being tested goes here
Set Statistics IO OFF
Set Statistics Time OFF

block and ran it. Here are the results

The Doubly-Joined

Table 'Table1'. Scan count 0, logical reads 337264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table2'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table3'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table4'. Scan count 1, logical reads 396, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The Singly-joined

Table 'Table1'. Scan count 0, logical reads 337260, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table2'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table3'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Table4'. Scan count 1, logical reads 396, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

I’ve highlighted the relevant change. The singly-joined query consistently performed 4 fewer logical reads. Now, if the original number had been 8 and had dropped in half to 4, I’d be happy. But the change from 337264 to 337260 leaves me a bit underwhelmed. Furthermore, over multiple runs, the second query did not consistently use less CPU time; sometimes it actually took longer to run. Further testing consistently showed no apparent improvement.

Needless to say, I don’t think this query improvement will help much. I’ve reached out to the vendor to see if they can provide more details, but honestly, I’m not hoping for much.

Query Store Saves the Day

It’s never a good thing when you get an impromptu meeting invite on the weekend and the subject line is “Sync Error”. I honestly didn’t even see the invite until the meeting had been going on for over an hour.

I called in and was brought up to speed. A 3rd party tool one of my clients uses was having major timeout issues. Normally it’s fine, but my client was taking advantage of the weekend to do a very large import of data, and the tool wasn’t keeping up.

I both love and hate being thrown into situations like this. I hate it because often I have very little information to go on, but also love it, because it can be a good challenge. So, I wanted to collect some data. Fortunately the database in question runs on SQL Server 2016. This blog post covers a bit of what we did and ends with why I am so grateful for Query Store.

Query Store and the First Graphs

I quickly enabled Query Store and grabbed a quick report. With help from the 3rd party’s support, I was able to focus on a particular query.

Query Store first graph

Initial Query Store screen grab

So, right away, I knew that at times this query could flip to a pretty bad query plan. I was curious as to why. But while poking around, I noticed something else going on: the database was at the SQL Server 2008 compatibility level, despite running on SQL Server 2016. When we upgraded the server a year ago, the 3rd party vendor didn’t guarantee compatibility with 2016, so we had left the database at its old compatibility level. Apparently the vendor had since qualified it, and I confirmed with their support, who was on the line, that I could change the compatibility level to SQL Server 2016. Of course, I wanted to see if this would make a difference, so I grabbed another of the problematic queries and looked at the query plan both before and after.

Compatibility level 100

Query Plan at SQL Server 2008 Level

Compatibility level 130

Query Plan at SQL Server 2016 Level

As you’ll note, the 2008 plan uses two hash matches, while the 2016 plan uses two merge joins. That’s interesting by itself, but after collecting a bit of data, I saw the 2016 plan was running in an average of 45ms. The 2008 plan had been averaging 1434ms. That’s quite the improvement, from a single change!
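For reference, the change itself is a one-liner (the database name here is hypothetical):

ALTER DATABASE VendorDB
SET COMPATIBILITY_LEVEL = 130; -- 130 = SQL Server 2016; it had been 100 (2008)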

That said, I still wasn’t entirely comfortable with what was going on and dug a bit deeper.

Digging Deeper

The change to the compatibility level had essentially eliminated the green bar in the above graph. This was good. But the blue bar to the left of it was still an issue. It also had a similar issue with flipping between two different query plans, but this was even worse.

Query Store second graph

Better, but note that one query really stands out!

I find this particular chart to be the most useful. I set a custom time frame (in this case 3 hours) and looked at the total duration of the 25 queries that had accumulated the most time running. It’s pretty clear that one query dominates, and working on it is probably where I want to spend my efforts. It’s also very hard to pick out, but the query (#12) from the first graph has improved so much that it’s now moved from 2nd position to 12th on the list. That’s quite an improvement, simply by changing the compatibility level! More on my thoughts on that below.

The more I thought about it, the more I started to focus on statistics. This was an educated guess, based on the fact that my client was doing a LOT of inserts and updates into a particular table. There’s another issue I’ll also discuss. That one I couldn’t fix, unfortunately, but if the 3rd party can, I think they’ll see a HUGE improvement in performance.

Slow query plan

Slow version of the query

Fast query plan

Fast version of the query

These look VERY similar, except that the positions of the Key Lookup and the Index Seek are swapped. That may not seem like much, but the slow version was taking about 93.95 ms on average, and the fast version about .11 ms. That’s a HUGE difference, roughly 850x! It took me a bit to realize what was going on, but let’s talk about that Key Lookup. Even with the faster version, it’s obvious that if I can eliminate it, I can get things to be even faster! The problem is that the query wants to return some columns not covered by the IX_FileID index. That’s generally easy to fix, and while I’m loath to make updates to 3rd party packages, I was willing to test this one out by making it a covering index. Unfortunately, this is where I was stymied. One of the columns is an IMAGE datatype, and you can’t throw those into an index key. I’ve recommended the 3rd party vendor try to change this. It wouldn’t be easy, but it could bring dramatic performance improvements here and elsewhere (I had run into this problem last year while trying to tackle another performance issue).

I should note that even though this query is actually very fast, it is executed so much that its total time dominates the system. This is one reason any improvement here would have a dramatic impact.

Statistics

In any case, looking at these two query plans and doing some further testing confirmed my hypothesis and also suggested why changing the compatibility level helped so much: statistics were very quickly getting out of whack.

I was able to confirm this by grabbing some data from Query Store for just the last hour; it showed only the slow version of the query running. I then forced an update of stats on the table in question and immediately saw the query flip over to the faster plan. That held for a while before it flipped back to the slower version.

We developed a plan, which I’ll admit upfront didn’t work. We decided that updating the stats on that particular table every hour might give us tremendous performance gains. And in fact it did, initially. BUT, what we found was that after an hour of inserts, running the update stats for that table took about 45-60 seconds, and the vendor’s tool has a hard-coded 30-second timeout. And because of the way this particular tool works, after a failure you have to start from scratch on every run. Since the job can take 4-6 hours to run, we couldn’t update stats every hour, even though I would have liked to.
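For what it’s worth, the hourly job step itself was nothing exotic; it was roughly something like this (the table name is made up):

-- Refresh stats on the hot table; after an hour of inserts this was
-- taking 45-60 seconds, which tripped the tool's 30-second timeout
UPDATE STATISTICS dbo.DocumentDetail WITH FULLSCAN;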

Query Store third graph

The graph that showed our plan wasn’t working

The above shows how, at the time update stats was running (that particular column of the Query Store graphic is cut off), query times jumped to 30 seconds. So while updating stats is overall a good thing, here it was definitely killing our process.

Above I mentioned that changing the compatibility level still had an impact here. What I didn’t show is that I was also looking at a bunch of statistics histograms and could see how badly things had gotten in some cases. But this is an area where SQL Server 2016 makes a difference! It does a better job in the background of keeping statistics a bit more accurate (still not as good as a full update, but it can help dramatically). I believe this is a huge part of why the first query addressed above improved AND stayed improved.

Loving Query Store

They say a picture is worth a thousand words. Honestly, I probably could have figured out the above issues by running a bunch of queries and looking at some DMVs, statistics histograms, and the like. But it would have taken longer. Note too that you can query Query Store directly. But the ability to instantly look at a graph and see what’s taking the most time, or executing the most, or a variety of other parameters, makes the graphical interface to Query Store EXTREMELY valuable. I was able to instantly zero in on a couple of key queries and focus my energies there. By varying the timeframes I was looking at, I could try changes and see the impact within minutes. I could also look at the stored query plans and make judgments based on what they showed.
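If you prefer T-SQL to the GUI, a rough starting point for the same kind of question (the top queries by total duration) might look something like this sketch against the Query Store catalog views:

-- Top 25 queries by total duration captured in Query Store
-- (avg_duration is in microseconds, hence the division)
SELECT TOP (25)
    q.query_id,
    SUM(rs.count_executions) AS executions,
    SUM(rs.avg_duration * rs.count_executions) / 1000.0 AS total_duration_ms
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id
ORDER BY total_duration_ms DESC;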

If you’re NOT using Query Store to debug performance issues, start doing it. To be honest, I haven’t used it much. I wouldn’t call myself an expert in it by any means. But, I was able to pull it up and almost instantly have insight into my client’s issues and was able to make actionable suggestions.

And to quote the product manager there after I fixed the first query simply by changing compatibility mode, “A good DBA is like having a good mechanic to work on your car.” That one made me smile.

Oh and I’ve been known to swap out the alternator on my old Subaru in under 10 minutes and have replaced the brakes a number of times. So if this DBA thing doesn’t work out, I guess I’ve got another career I can look into!

Final Note

Per my NDA, I obviously haven’t named my client. But also, simply out of respect, I haven’t named the 3rd party tool. I don’t want folks thinking I’m trying to besmirch their name. Their product is a fine one, and I’d recommend it if asked. But my client is one of their larger users and sometimes pushes it to the limits, so we sometimes find the edge cases. So nothing here is meant to disparage the 3rd party tool in any way (though they should replace that image field, since it really doesn’t need to be one!).


Trust but Verify

This is one of those posts where you’ll just have to trust me. Honestly.

I want to talk about indexes.

About a week ago, a friend on a chat system I use mentioned that one of their colleagues had said, “oh, we don’t have to optimize the database, the server is fast enough,” or words to that effect. All of us in the discussion blanched a bit. Yes, when I started in the business, a 10GB database was considered large, and because of the memory limits of 32-bit SQL Server, we had only 2GB (or 3GB if you took the right steps) of memory to work with, so it was literally impossible to keep a large database in memory. Of course, now we routinely deal with databases hundreds of GB in size on machines that can easily have 0.5TB of memory or more. This means that, except for writes, an entire database can easily be kept in memory.

But that said, optimization still matters. Last week I was debugging an ETL process that I’ve helped a client with. I’d love to show screenshots, but my NDA won’t allow me (hence my asking you to trust me). Ok, that’s partly a lie. I couldn’t provide too many details if I wanted to, but the bigger issue is that I’ve since closed the windows that showed the scripts in question and the results of my changes.

One of the last things each step in the ETL does is write an updated Salesforce ID back to the source table. It’s actually a bit more complicated: what it really does is write to either a Success table or an Error table, and depending on a factor or two, a trigger then updates the source table. I had previously debugged and improved the performance of the trigger. But something was still bothering me about the performance. I looked a bit deeper, and one of the things that trigger does on a success is remove the row from the Error table. This was taking longer than I suspected it should, so I dug into it and noticed that the Error table had no index.

I can’t show the original queries I used, but I can show an example of the impact of adding a simple clustered index. (See, you can’t even trust me to say I won’t show any examples! You’d better read the entire post to verify what I’m really writing!)

Here’s an example query (with some changes to hide client specific data)

select * from ErrorTable where SF__External_Id__c='005A000022IouWqIAX'

It’s a very simple query (and simpler than the actual one I was dealing with) but is enough to show the value of a proper index.

Now, in my original query, the Query Tuning Advisor actually suggested an index on SF__External_ID__c. In the example above it didn’t. There’s a canard among many DBAs that the QTA is generally useless, and often it is, though I think it’s gotten better. As a consultant, I can often come into a new client and tell when someone has gone crazy with the QTA and adopted EVERY SINGLE suggestion. In other words, they trusted it, but they never verified it. Why is this a problem? Well, in my experience the QTA can be overly aggressive at times, suggesting indices that provide little real benefit. And if you add an index in response to a select query that runs, say, once a day against a table that sees 1000s of updates a day, you might actually slow down your updates (since now each update also has to update the index). And as mentioned above, sometimes it might fail to suggest an index at all. (I think in this case it didn’t suggest one for my example because the underlying table was far smaller than before.)

So, I like to verify that the index I’ll add will make a difference. In cases like this, I often go old school and simply bracket my test queries

set statistics IO ON
set statistics Time ON
select * from ErrorTable where SF__External_Id__c='005A000022IouWqIAX'
set statistics IO OFF
set statistics Time OFF

And then I enable Actual Execution Plan.

The results I received without any sort of index are below. Some key numbers are highlighted in red.

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 47 ms, elapsed time = 63 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(2 rows affected)
Table 'ErrorTable'. Scan count 1, logical reads 3570, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 15 ms.
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

You’ll notice the physical reads are 0. This is nice. This means everything is in memory.

In this case, because I’m familiar with how the ErrorTable is accessed I decided a clustered index on SF__External_Id__c would be ideal. (all my updates, inserts, deletes, and selects use that to access this table).
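The index itself was about as simple as they come; something along these lines, with the table name disguised as above:

-- Clustered index on the column every access path uses
CREATE CLUSTERED INDEX CIX_ErrorTable_SF__External_Id__c
ON dbo.ErrorTable (SF__External_Id__c);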

With the index in place, I reran the query:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(2 rows affected)
Table 'ErrorTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Note the number of logical reads dropped by about a factor of 1,000. My elapsed time dropped from 15 ms to 0 ms (or rather, less than 0.5 ms, so SQL Server rounded down).

If we look at the graphical query plan results, we see something similar:

First, without the index:

Trust_but_Verify_Query Table Scan

Table scan to find 2 rows

Trust_but_Verify_Query Table Seek

Table Seek to find 2 rows

That’s nice, I now know I’m doing a seek rather than a scan, but is that enough? I mean if the ErrorTable only has 2 rows, a seek is exactly the same as a scan!

So let’s dig deeper:

Trust_but_Verify_Query Table Scan Details

Query plan showing details for a scan

Trust_but_Verify_Query Table Seek Details

Query plan showing details for a seek

Here you can definitely see the dramatic improvement. Instead of reading in over 100,000 rows (at a bit over 2.5 KB per row, that’s over 270MB), we only need to read in 2 rows, for a total of just over 5 KB of data.

No wonder it’s faster. In fact, in the ETL process that had originally been taking about 1 minute to process 1,000 rows, with the index it was now processing 3,000 rows in under 10 seconds.

The above is a bit of a contrived example, but it’s based on actual performance tuning I did last week. And this isn’t meant to be a lesson in performance tuning per se, but rather to show that if you make a change (in this case, adding an index) you can’t just trust that it will work; you should VERIFY that it has made a difference, and more importantly, that it makes a difference for your workload. I’ve often seen the QTA make valid but useless index suggestions because someone ran an uncommonly used query against it and assumed the recommendation was good. Or they made assumptions about the size of the table.
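One more hedged way to verify after the fact is to watch how an index is actually used by your workload over time. A sketch using the sample names from above:

-- How often is the index sought, scanned, and paid for in update overhead?
SELECT i.name AS index_name,
    s.user_seeks, s.user_scans, s.user_lookups,
    s.user_updates -- the write overhead the index costs you
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
    AND s.object_id = OBJECT_ID('dbo.ErrorTable');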

So never just trust an index will help, but actually VERIFY it will help.