An Ounce of Prevention?

There’s an old saying in medicine: “When you hear hoofbeats, think horses, not zebras.” Contrary to what one might think after watching House, the truth is that when you’re presented with a set of symptoms, you start with the most likely cause, well, because it is the most likely! But as House illustrates, sometimes it really is the unlikely one.

In First Aid, especially wilderness or backcountry medicine, there’s an acronym that is often used called SAMPLE. This is a mnemonic to help rescuers remember what data to gather:

  • S – Signs/Symptoms – What do you see, observe? (i.e. what’s going on).
  • A – Allergies – Perhaps the problem is an allergic reaction, or they might be allergic to whatever drug you want to give them.
  • M – Medicines – What medications are they on? Perhaps they’re diabetic and haven’t taken their insulin, or they’re on an anti-seizure medicine and need some.
  • P – Past, pertinent medical history. You don’t care that they broke their ankle when they were 5, but perhaps they just underwent surgery a few weeks ago, or they have a history of dislocating their shoulder.
  • L – Last oral intake. Have they eaten or drunk anything recently? This will drive your decision tree in a number of ways.
  • E – Events leading up to the injury.  Were they climbing to the top of the cliff and fell, or did they simply collapse at the bottom? The former may suggest you look for a possible spinal injury, the latter probably indicates something else.

I’m reminded of this because of how I spent my Presidents’ Day. I woke up, checked a few emails, and noticed that two I was expecting from a client’s servers had never arrived. I logged in to see what was going on. It turns out there was nothing going on. No, not as in “nothing wrong going on,” but more as in “nothing at all was happening; the databases weren’t operating right.” The alert system could connect to the database server, so no alerts had been sent, but actually accessing several of the databases, including master, resulted in errors.

Master.mdf corruption

Not an error you want to wake up to!

And what was worse, the DR server was exhibiting similar symptoms!

So modifying SAMPLE a bit:

  • S – Signs/Symptoms – Databases are throwing corruption errors on two servers. The corruption extended to the ERRORLOG files on both servers.
  • A – Allergies – Well, servers don’t have allergies, but how about known bugs? That’s close enough.  Nope, nothing that seems to apply here.
  • M – Medicines – I’ll call this antivirus software and <redacted>. (For client privacy reasons I can’t name the other piece of software, but I’ll come back to it.)
  • P – Past, pertinent medical history. Nothing; these servers had been running great up until now. One has been in production for over 2 years, the other up for about 2 months, having been brought up as a DR box for the first.
  • L – Last oral intake. Let’s make this last data intake. Doing some forensics, we determined the corruption on both servers occurred around 3:00 AM EST (a sketch of one such forensics query follows this list). Checking our logs, jobs, and other processes, there’s nothing special about the data the primary server took in at this time. If anything, disk I/O was lower than average. And, fortunately, we can easily recreate any data that was sent to the server after the failure.
  • E – Events leading up to the injury. This is where things get interesting.
    • There were some zero-day patches applied to both servers over the weekend.
    • On Saturday, I had finally set up log-shipping between the two servers.
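
As an aside, one of the first places to look when trying to pin down when corruption was first recorded is msdb.dbo.suspect_pages. It only tracks damaged data pages, so it won’t explain the ERRORLOG damage, but a sketch of that kind of check looks like this:

-- Damaged pages SQL Server has recorded, most recent first.
-- event_type values 1-3 indicate corruption (823/824 errors, bad checksum, torn page).
SELECT  DB_NAME(database_id) AS database_name,
        file_id,
        page_id,
        event_type,
        error_count,
        last_update_date
FROM    msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;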

So, we’ve got three possibilities, well four really.

  1. The zero day patches caused an issue about 48 hours later.  Possible, but unlikely, given the client has about 1600 servers that were also patched and have not had issues.
  2. Log-shipping somehow caused problems.  But again, the new log-shipping setup had run for about 36 hours without issue. And, best we can tell, the corruption occurred on the secondary BEFORE the primary. And log-shipping doesn’t apply to the Master database or the ERRORLOG file.
  3. Some unknown interaction.  This I think is the most likely and is where the <redacted> from the M above comes into play.
  4. Pure Random, and it’ll never happen again. I hate this option because it just leaves me awake at night.  This I added only for completeness.

Without going into detail, our current theory is that some weird interaction between <redacted> and log-shipping is our cause. Of course the vendor of <redacted> is going to deny this (and has) but it’s the only combination of factors that seems to explain everything. (I’ve left out a number of additional details that also helped us get to this conclusion).

So for now, we’ve disabled log-shipping and are going to make some changes to the environment before we try log-shipping again.

Normally I think horses, but we might have a herd of zebras on this one. And ironically, setting up for DR may have actually caused a Sev 1 outage. So the ounce of prevention here may not have been worth it!

And who said my Wilderness First Aid wouldn’t come in handy?

 

 

Them’s the brakes…

I need brake work done on my car.  I scheduled an appointment with my mechanic for Thursday.

Now normally brake work might not be worth writing about, but this time it got me thinking. About 9 years ago I taught myself to change the brakes on my Subaru. It wasn’t that hard, but it definitely took longer than I would have liked. I’ve learned how to do it much faster since then.

Back then I did it because I was between jobs and had more time than money. It made sense at the time. Since then I’ve generally continued to do them myself. It doesn’t take long and as I’ve said in the past, sometimes I like working with my hands. It reminds me not all of life is simply databases and servers.

So why this time? The simple answer is, my garage floor is cold!  Ayup, it’s a funny thing, but in the winter, things get cold! And honestly, I just don’t feel like sitting/laying on the concrete while I do the work. So, I’ll pay an expert who can pop it up on a lift and do the work in probably half the time I can.  I remind myself it’s why I work, so I can pay others to do work I don’t care to do, or that they can do better than I can.

Years back, at my last job as the Director, and later VP, of IT, I changed our email provider from one company to another (don’t ask me who they were, I can’t recall over a decade later). My team didn’t like this. They kept insisting they could run the mail servers themselves. I had no doubt that they could. Running an Exchange server isn’t the most difficult thing in the world. That said, I kept saying no, for a simple reason: we weren’t an email company! Providing reliable email for a business like ours meant, at the very least, strong spam protection and high availability, ideally with geographic redundancy. That meant at least two Exchange servers, spam filters, and more. That alone would have cost more than what we paid annually. Add in the time my team would have spent on email issues, and it just wasn’t going to be worth it.

And we’re seeing that more and more with services such as Azure. Sure, many businesses change their own brakes, err, run their own SQL Server, but more and more are outsourcing to platforms such as Azure. Still, there are a number of companies that for various reasons don’t do that. Fortunately for them, there are consultants like me to help them with their servers. SQL Server has always been easier to maintain than some of its competition, especially when it was first available, but that has never meant it needed no intervention. Someone still needs to make sure that jobs are being run, that backups are available, and more.

So, some days I change my own brakes because it’s fun and easy, some days I pay someone.  Some days my clients handle their own SQL Server issues, and other days, they pay me.

No real revelation here. No real advice on when to draw the line on outsourcing, just an observation. But in my case, my concrete is too cold, so I’ll pay someone else to do the dirty work.

How Much We Know

Last night I had the privilege of introducing Grant Fritchey as our speaker to our local user group. He works for Redgate, which was a sponsor. The topic was 10 Steps Towards Global Data Compliance. Between that and a discussion I had with several members during the informal food portion of our meeting, I was reminded of something that’s been on my mind for a while.

As I’ve mentioned in the past, I’ve worked with SQL Server since the 4.21a days. In other words, I’ve worked with SQL Server for a very long time. As a result, I recall when SQL Server was just a database engine. There was a lot to it, but I think it’s safe to say that one could justifiably consider themselves an expert in it with a sufficient amount of effort. And as a DBA, our jobs were fairly simple: tune a query here, set up an index update job there, do a restore from backups once in a while. It wasn’t hard, but there was definitely enough to keep a DBA busy.

But, things have changed. Yes, I still get called upon to tune a query now and then. Perhaps I’m making sure stats are updated instead of rerunning an index rebuild, and I still get called upon to restore a database now and then. But now my job includes so much more. Yesterday I was writing a PowerShell script for a client. This script calls an SFTP server, downloads a file, unzips it, and then calls a DTSX package to load it into the database. So now I’m expected to know enough PowerShell to get around. I need to know enough SSIS to write some simple ETL packages. And the reason I was rewriting the PowerShell script was to make it more robust and easier to deploy, so that when I build out the DR box for this client, I can more easily drop it in place and maintain it going forward. Oh, did I mention that we’re looking at setting up an Availability Group using an asynchronous replica in a different data center? And I should mention that before we even build that out, I need to consult with the VMware team to get a couple of quick and dirty VMs set up so I can do some testing.

And that was just Monday.  Today with another client I need to check out the latest build of their application, deploy a new stored procedure, and go over testing it with their main user. Oh, and call another potential client about some possible work with them. And tomorrow, I’ll be putting the finishing touches on another PowerShell article.

So what does this have to do with last night’s meeting on Global Data Compliance? Grant made a point that in a sense Data Compliance (global or otherwise) is a business problem. But guess who will get charged with solving it, or at least portions of it?  Us DBAs.

As I started out saying, years ago it was relatively easy to be an expert in SQL Server. It was basically a single product, and the lines between it and other work tended to be fairly distinct and well drawn. Today though, it’s no longer just a database engine. Microsoft correctly calls it a data platform. Even PASS has gone from being an acronym for the Professional Association for SQL Server to simply PASS.

Oh, there are still definitely experts in specific areas of the Microsoft Data Platform, but I’d say they’re probably rarer now than before. Many of us are generalists.

I mentioned above that I’d probably be more likely to update stats than rebuild an index these days. And while I still deal with backups, even just the change to having compression has made that less onerous, as I worry less about disk space, network speed, and the like. In many ways, the more mundane tasks of SQL Server have become automated, or at least simpler, and take up less of my time. But that’s not a problem for me; I’m busier than ever.
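
(For anyone who hasn’t flipped that switch yet, compression is a single option on the backup itself, or an instance-wide default. The database name and path below are just placeholders.)

-- Per backup:
BACKUP DATABASE AdventureWorks
TO DISK = N'D:\Backups\AdventureWorks.bak'
WITH COMPRESSION, CHECKSUM;

-- Or make compression the instance-wide default:
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;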

So, long gone are the days when knowing how to install SQL Server and run a few queries is sufficient. If one wants to work in the data platform, one has to up their game. And personally, I think that’s a good thing. What do you think? How has your job changed over the past decade or more? I’d love to hear your input.

Crossing the Threshold…

So it’s the usual story. You need to upgrade a machine, the IT group says, “no problem, we can virtualize it, it’ll be better! Don’t worry, yes, it’ll be fewer CPUs, but they’ll be much faster!”

So, you move forward with the upgrade. Twenty-three meetings, three late nights, one OS upgrade, and two new machines forming one new cluster later, you’re good. Things go live. And then Monday happens. Monday, of course, is the first full day of business and just so happens to be the busiest day of the week.

Users are complaining. You look at the CPU and it’s hitting 100% routinely. Things are NOT distinctly better.

You look at the CPUs and you notice something striking:

cpu not being used

CPU 8 is showing a problem

4 of the CPUs (several are missing from this graphic) are showing virtually no utilization while the other 8 are going like gangbusters. Then it hits you: the way the IT group set up the virtual CPUs was not what you needed. They set up 6 sockets with 2 cores each, for a total of 12 cores. This shouldn’t be a problem, except that SQL Server Standard Edition is limited to the lesser of 4 sockets or 24 cores. Because your VM has 6 sockets, SQL Server refuses to use two of them.

You confirm the problem by running the following query:

SELECT scheduler_id, cpu_id, status, is_online FROM sys.dm_os_schedulers

This shows only 8 of your 12 CPUs have a status of VISIBLE ONLINE.
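
If you want to see the split at a glance, a quick aggregation along the same lines works:

-- Group the schedulers by status; in the scenario above the VISIBLE
-- schedulers split 8 ONLINE and 4 OFFLINE.
SELECT  status,
        is_online,
        COUNT(*) AS scheduler_count
FROM    sys.dm_os_schedulers
GROUP BY status, is_online
ORDER BY status;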

This is fortunately an easy fix. A quick outage and your VM is reconfigured to 2 sockets with 6 cores apiece. Your CPU graphs now look like:

better CPU

A better CPU distribution

This is closer to what you want to see, but of course since you’re doing your work at night, you’re not seeing a full load. But you’re happier.

Then Monday happens again.  Things are better, but you’re still not happy. The CPUs are running on average at about 80% utilization. This is definitely better than 100%. But your client’s product manager knows they’ll need more processing power in coming months and running at 80% doesn’t give you much growth potential. The product manager would rather not have to buy more licenses.

So, you go to work. And since I’m tired of writing in the 2nd person, I’ll start writing in 1st person moving forward.

There are a lot of ways to approach a problem like this, but often when I see heavy CPU usage, I want to see what sort of wait stats I’m dealing with. They may not always give me the best answer, but I find them useful.
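
The exact script isn’t reproduced here, but as a rough sketch, even a bare look at the top waits in sys.dm_os_wait_stats points in the same direction (most full scripts also filter out benign waits and compute percentages):

-- Top waits since the last restart (or since wait stats were last cleared).
SELECT TOP (10)
        wait_type,
        waiting_tasks_count,
        wait_time_ms,
        signal_wait_time_ms
FROM    sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;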

Here are the results of one quick query:

Fortunately, this being a new box, it was running SQL Server 2016 with the latest service pack and CU. This meant that I had some more useful data (newer builds split the more benign CXCONSUMER waits out from CXPACKET).

CXPackets

CXPackets and CXConsumer telling the tale

Note one of the suggestions: Changing the default Cost Threshold for Parallelism based on observed query cost for your entire workload.

Given the load I had observed, I guessed the Cost Threshold was way too low. It was in fact set to 10. With that setting, during testing I saw a CPU graph that looked like this:

43 percent CPU

43.5% at Cost Threshold of 10

I decided to change the Cost Threshold to 100 and the graph quickly became:

25 percent CPU

25% at Cost Threshold of 100

Dropping from 43.5% to 25.6%. That’s a savings you can take to the bank!
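
For reference, the change itself is a quick one (it can also be done from Server Properties > Advanced in SSMS); Cost Threshold for Parallelism is an advanced option, so ‘show advanced options’ has to be on:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 100;
RECONFIGURE;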

Of course that could have been a fluke, so I ran several 5-minute snapshots: I would set the threshold to 10, collect some data for 5 minutes, then set it to 100 for 5 minutes and collect data again.
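
The snapshots were gathered with something along these lines (a sketch, not the exact script; the temp table and the fixed five-minute window are the assumptions here):

-- Baseline the two parallelism waits, wait five minutes, then take the difference.
SELECT  wait_type, waiting_tasks_count, wait_time_ms
INTO    #baseline
FROM    sys.dm_os_wait_stats
WHERE   wait_type IN (N'CXPACKET', N'CXCONSUMER');

WAITFOR DELAY '00:05:00';

SELECT  w.wait_type,
        w.waiting_tasks_count - b.waiting_tasks_count AS waits_in_window,
        w.wait_time_ms        - b.wait_time_ms        AS wait_time_ms_in_window
FROM    sys.dm_os_wait_stats AS w
JOIN    #baseline            AS b ON b.wait_type = w.wait_type;

DROP TABLE #baseline;

The deltas over three runs at each setting: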

CXPacket_10      CXPacket_10_Waittime_MS
635533           5611743
684578           4093190
674500           4428671

CXConsumer_10    CXConsumer_10_Waittime_MS
563830           3551016
595943           2661527
588635           2853673

CXPacket_100     CXPacket_100_Waittime_MS
0                0
41               22
1159             8156

CXConsumer_100   CXConsumer_100_Waittime_MS
0                0
13               29443
847              4328

You can see that over the 3 runs, a threshold of 100 versus 10 made a dramatic difference in the total time spent waiting during each 5-minute window.

The other setting that can play a role in how parallelism impacts performance is MAXDOP. In this case, testing didn’t show any real performance difference when changing that value.

At the end of the day though, I call this a good day. A few hours of my consulting time saved the client thousands of dollars by avoiding the wrong, and expensive, road of adding more CPUs and SQL licenses. There’s still room for improvement, but going from a box where only 8 of the 12 CPUs were being used and were running at 100% to a box where the average CPU usage is close to 25% is a good start.

What’s your tuning success story?

52

52 is an interesting number. It’s the number of weeks in the year. It’s the number of cards in a deck. It’s the number of Earths in the DC Multiverse. It’s an untouchable number, something I just learned. It’s the atomic number of tellurium. In fact it has a lot of interesting trivia associated with it according to Wikipedia: 52.

It also just happens to be the number of times I’ve been around the Sun, though strictly speaking that depends on whether you’re counting sidereal or tropical years, and on the fact that I was born at night. But I think we’re close enough.

And it just so happens my birthday falls on the day I usually blog. So rather than something technical (though if I can get permission from a client, I may have something fun and technical soon) I thought I’d post some reflections and thoughts.

For me birthdays are both interesting and boring. I’m glad I’ve reached another milestone. But honestly, after age 25 when my car insurance rates went down, I haven’t given individual birthdays much thought. That’s not strictly true, I sometimes think about the fact that I’ve passed the point where statistically I’m looking at fewer days ahead of me than behind me.

I grew up in a small town, Falls Village CT, and parts of me never have left it. When I stop to daydream, my thoughts take me to the town green where we often played, or the woods behind my dad’s house, or the sand quarry behind the depot I grew up in. It was a safe and quiet life. I watched the world move from bell-bottoms to Reagan power ties.

While just a teenager, I and some friends ran not one, but two Monopoly marathons to raise money for the Muscular Dystrophy Association. The first year we played for 100 hours (in teams) and the second, 150 hours. I was and still am quite proud of the organization it took and the money we raised.

Since then I’ve done a lot.  I got thinking about that last night at the Capital Area SQL Server User Group meeting. I’m proud to lead this group.  I really enjoy, as I’ve noted before, giving back to the community that has helped me so much.

I’m proud to be a Regional Coordinator for the National Cave Rescue Commission. I can literally say the work the NCRC does saves lives. It’s an honor, and humbling, when folks come up to me and tell me how their training has made a difference in the lives of others.

I’m proud of much of what I’ve done in my avocations and vocations, even if at times I’m a victim of imposter syndrome. There are still many times when someone will ask me a question and my first thought is, “why are they asking me, I’m just a kid and… oh wait… no, I am the expert here, and I’m far from being a kid.” This is especially true when people I look up to turn around and ask me for advice. This happens a lot in the SQL world.

I’m very proud of my family, especially my son and daughter who are growing up to be wonderful adults, capable of critical and deep thinking. They will make an impact on the world and I don’t think as a parent I could want for anything more.

And of course proud of my wife, but I can’t take credit there, she’s a wonderful person in her own right. I just married well.

One common thread I’ve realized in my life is that I enjoy teaching and sharing my knowledge. I also, as anyone who knows me can attest, love a good debate. Bring your facts to the table. Teach me something, change my mind, or be willing to have yours changed. I still recall a debate I had with someone once about a detail of the Constitution. She made one claim, I made another. We finally settled it by finding a copy of the text and seeing who was right. Afterwards there was no rancor or hurt. We both appreciated the intellectual exercise and the correction of fact. But even opinions can at times be changed. At Summit I had two pieces of white chocolate from New Zealand. They changed my mind about white chocolate! I enjoyed them.

People often ask me what I want for my birthday and the truth is, I rarely want material things. Honestly, unless it’s a new Tesla (and NOT the truck) I can and will probably buy it for myself.

But here goes:

  • Another 52 years – hey, why not? We’re making medical breakthroughs, and it’s possible I’m wrong and I’ve got more days ahead of me than behind me. Right now, I’d love that.
  • Learn something – challenge yourself in the next year to learn a new skill or a new topic. Don’t get stuck doing the same things all the time.  Earlier this year I finally took the time to start learning about Extended Events. Who knows what I’ll learn in this coming year.
  • Teach someone something – every one of you has a skill someone else doesn’t. Share.
  • Related to that: if you’re a caver, or have a friend or family member who is a caver, get them to take the 2020 National Cave Rescue Seminar.
  • Have a friendly debate with someone. Realize it’s not about winning or losing, but an exchange of ideas. Bring your facts to the table and recognize your opinions. Be open-minded.  Be prepared to say, “You know what, you’re right, I was wrong.” This is not losing a debate.  And be prepared to acknowledge someone saying the above to you. Accept the words graciously, don’t lord it over them. This is not about winning a debate.
  • Be kind.  If nothing else in the coming year, be kind.

And that’s it for turning 52.

P.S. Unrelated, but check out my latest article at Redgate’s Simple-talk: Building a Countdown Timer with PowerShell

Kids, get off my lawn!

Change can be hard. But sometimes it’s necessary. And a lot has happened this week. First, I want to congratulate my fellow #SQLFamily member Cathrine Wilhelmsen on one more orbit of the Sun. Apparently, in her honor, Microsoft decided to release SQL Server 2019 on her birthday! I’ve been using SQL Server since the 4.21a days. Every version has had new features and required learning something new. As I said recently, it’s easy to fall into the trap of being an old dog and not learning new tricks. This is something we have to avoid. Being trapped in the past can be limiting.

Besides SQL Server 2019 dropping this week, I recently upgraded my phone. I had been using a Windows Phone for about 5 years. I loved it. Especially when it first came out, it was top of the line and had a bright future. I eagerly downloaded apps and it became part of my life. But alas, we know how well Microsoft did in the Windows Phone market. Still, I doggedly held on, even as features were deprecated. I couldn’t use the Weather app. The Amtrak app went away. Eventually several features of Cortana stopped working as Microsoft dropped support for them. Slowly my phone was becoming a brick. I kept debating: do I upgrade to one more Windows Phone knowing it’s the end of the line, or what? I kept putting off the decision. After the mapping function failed me on my recent trip to the Hampton Roads User Group meeting, I decided it was finally time to replace it with an Android phone. Choosing from the plethora out there was not fun. It was very tempting to go with one of the top-of-the-line models, but spending $1000 or so wasn’t really a fun idea. I eventually ended up choosing a Samsung A50.

I’m mostly happy with it. Right now I’m struggling with which parts of my reaction are “get off my lawn” because I don’t like change, and which parts are “what the hell is the UI doing now?” Fortunately, my son has mentioned some of his own dislikes of certain UI functionality, so I think not all of it is me simply being an old curmudgeon (are there young ones?). I will say what I’m most happy with is that Microsoft has a number of tools, including the Windows Launcher and the Phone Companion, as well as the obvious apps like Outlook and other parts of Office.

A word about the Phone Companion. This alone has made the upgrade a win. One of the features, when I’m working at home (I have not yet enabled it on my Surface Pro), is that things like text messages pop up on my desktop screen. This actually makes life a LOT easier, since I can simply type a reply from a full-size keyboard or copy the numerous soft tokens I get for logging into various client sites without having to pick up my phone. It’s a small detail, but a wonderful one!

The Launcher helps me retain some of the features that I liked about my Windows Phone. Overall, it’s a win.

But the changes in my life aren’t complete. As I mentioned last week, I’m at PASS Summit again this week in Seattle. But alas, this is the last year that PASS Summit will be in Seattle. Next year it will be held in Houston. Just as I’ve figured out where the cheapest and most convenient parking for me is, where some decent food places are, and I’m feeling, if not at home in Seattle, at least comfortable, next year is a big change. I won’t be able to stay with my college friends or do our annual Thai potluck with a bunch of ROC alums.

But, I’ll get to explore another city. I’ve been to Houston only once, literally decades ago, to do a SQL Server install at Exxon. The server was literally the only Intel computer in a room full of mainframe equipment. I suspect that has changed since then. That was one of my early experiences installing SQL Server (4.21a, for the record).

So, this old dog is still learning and looking forward to new experiences: plus ça change, plus c’est la même chose.

 

Small Victories

Ask most DBAs and they’ll probably tell you they’re not a huge fan of triggers.  They can be useful, but hard to debug.  Events last week reminded me of that. Fortunately a little debugging made a huge difference.

Let me set the scene, though unfortunately, since this was work for a client, I can’t really use many screenshots (or rather, it would take far too long to sanitize them in the time I allocate to write my weekly blog posts).

The gist is, my client is working on a process to take data from one system and insert it into their Salesforce system.  To do so, we’re using a 3rd party tool called Pentaho. It’s similar to SSIS in some ways, but based on Java.

Anyway, the process I was debugging was fairly simple: take account information from the source and upsert it into Salesforce. If the account already exists in Salesforce, great, simply perform an update. If it’s new data, perform an insert. At the end of the process, Pentaho returns a record that contains the original account information and the Salesforce ID.

So far so good. Now, the original author of the system had set up a trigger so that when these records are returned, it can update the original source account record with the Salesforce ID if it didn’t exist previously. I should note that updating the accounts is just one of many possible transformations the entire process runs.

After working on the Pentaho ETL (extract, transform, load) for a bit and getting it stable, I decided to focus on performance. There appeared to be two main areas of slowness, the upsert to Salesforce and the handling of the returned records. Now, I had no insight into the Salesforce side of things, so I decided to focus on handling the returned records.

The problem of course was that Pentaho was sort of hiding what it was doing. I had to get some insight there. I knew it was doing an Insert into a master table of successful records and then a trigger to update the original account.

Now, being a 21st Century DBA and taking into account Grant Fritchey’s blog post on Extended Events, I had previously set up an Extended Events session on this database. I had to tweak it a bit, but I got what I wanted in short order.

CREATE EVENT SESSION [Pentaho Trace SalesForceData] ON SERVER
ADD EVENT sqlserver.existing_connection(
    ACTION(sqlserver.session_id)
    WHERE ([sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.login(SET collect_options_text=(1)
    ACTION(sqlserver.session_id)
    WHERE ([sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.logout(
    ACTION(sqlserver.session_id)
    WHERE ([sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlserver.session_id)
    WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND [sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.session_id)
    WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND [sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.sql_batch_starting(
    ACTION(sqlserver.session_id)
    WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND [sqlserver].[username]=N'TempPentaho'))
ADD TARGET package0.ring_buffer(SET max_memory=(1024000))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)
GO

It’s not much, but it lets me watch incoming transactions.
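
Since the session above is created with STARTUP_STATE=OFF, it has to be started before it captures anything; after that, the ring_buffer target can be queried directly (or watched via Watch Live Data in SSMS). A minimal sketch:

-- Start the session (it was created with STARTUP_STATE = OFF).
ALTER EVENT SESSION [Pentaho Trace SalesForceData] ON SERVER STATE = START;

-- Pull the ring buffer contents as XML to see what Pentaho is actually sending.
SELECT  CAST(t.target_data AS XML) AS captured_events
FROM    sys.dm_xe_sessions        AS s
JOIN    sys.dm_xe_session_targets AS t
        ON t.event_session_address = s.address
WHERE   s.name = N'Pentaho Trace SalesForceData'
  AND   t.target_name = N'ring_buffer';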

I could then fire off the ETL in question and capture some live data. A typical returned result looked like:

exec sp_execute 1,N'SourceData',N'GQF',N'Account',N'1962062',N'a6W4O00000064zbUAA','2019-10-11 13:07:22.8270000',N'neALaRggAlD/Y/T4ign0vOA==L',N'Upsert Success'

Now that’s not much, but I knew what the insert statement looked like, so I could build an insert statement wrapped in a BEGIN TRAN/ROLLBACK so I could test the insert without actually changing my data. I then tossed in SET STATISTICS IO ON and enabled Include Actual Execution Plan so I could see what was happening.
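
The harness looked roughly like the following. The table name and most of the column names are placeholders (I can’t share the real schema); the values are lifted from the captured sp_execute call above, and External_Id__c, Id, and Transformation are the fields the trigger actually cares about.

SET STATISTICS IO ON;    -- plus "Include Actual Execution Plan" turned on in SSMS

BEGIN TRAN;

    -- Placeholder table and extra columns standing in for the real
    -- "successful records" table the trigger is defined on.
    INSERT INTO dbo.SalesforceReturnedRecords
            (SourceSystem, Client, Transformation, External_Id__c, Id,
             LoadDate, RowHash, Status)
    VALUES  (N'SourceData', N'GQF', N'Account', N'1962062', N'a6W4O00000064zbUAA',
             '2019-10-11 13:07:22.827', N'neALaRggAlD/Y/T4ign0vOA==L', N'Upsert Success');

ROLLBACK TRAN;           -- nothing actually sticks, but STATISTICS IO and the plan
                         -- still show what the trigger did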

“Wait, what’s this? What’s this 300K rows read? And why is it doing a clustered index scan on this table?” This was disconcerting. The field I was comparing on was the clustered index key; it should be a seek!

So I looked more closely at the trigger. There were two changes I ended up making.

       -- Link Accounts
       --MERGE INTO GQF_AccountMaster T
       --USING Inserted S
       --ON (CAST(T.ClientId AS VARCHAR(255)) = S.External_Id__c
       --AND S.Transformation in ('Account'))
       --WHEN MATCHED THEN UPDATE
       --SET T.SFID = S.Id
       --;
       
       if (select transformation from Inserted) ='Account'
       begin
              MERGE INTO GQF_AccountMaster T
              USING Inserted S
              ON T.ClientId  = S.External_Id__c
              WHEN MATCHED THEN UPDATE
              SET T.SFID = S.Id
       end

An astute DBA will notice that CAST in there. Given the design, the Inserted table’s External_Id__c field is sort of a catch-all for all sorts of IDs, and some of them can in fact be up to 255 characters. However, in the case of an Account, it’s a varchar(10).

The original developer probably put the CAST in there since they didn’t want to blow up the MERGE statement if it compared a transformation other than an Account. (From what I can tell, T-SQL does not guarantee short-circuit evaluation; if I’m wrong, please let me know and point me to definitive documentation.) However, the minute you cast that column, you lose the ability to seek on the index; you have to use a scan.
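
The general pattern is worth calling out. Against the same table (the parameter value here is just the one from the earlier example), the only real difference between a scan and a seek is which side of the comparison gets converted:

-- GQF_AccountMaster is clustered on ClientId (varchar(10)).
DECLARE @External_Id varchar(255) = '1962062';

-- Non-sargable: the CAST has to be applied to ClientId on every row,
-- so the clustered index gets scanned.
SELECT SFID
FROM   GQF_AccountMaster
WHERE  CAST(ClientId AS VARCHAR(255)) = @External_Id;

-- Sargable: leave the column alone and the same lookup can seek the index.
SELECT SFID
FROM   GQF_AccountMaster
WHERE  ClientId = @External_Id;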

So I rewrote the commented-out section into an IF to guarantee we were only dealing with Account transformations, and then I stripped out the CAST.

Then I reran and watched. My index scan of 300K rows was down to a seek of 3 rows. The trigger now performed in subsecond time. Not bad for an hour or so of work. That and some other improvements meant that we could now handle a few thousand inserts and updates in the time it was previously taking to do 10 or so. It’s one of those days where I like to think my client got their money’s worth out of me.

Slight note: Next week I will be at PASS Summit, so I’m not sure if/when I’ll be blogging. But follow me on Twitter @stridergdm.