Small Victories

Ask most DBAs and they’ll probably tell you they’re not a huge fan of triggers.  They can be useful, but hard to debug.  Events last week reminded me of that. Fortunately a little debugging made a huge difference.

Let me set the scene, but unfortunately since this was work for a client, I can’t really use many screenshots. (or rather to do so would take far too long to sanitize them in the time I allocate to write my weekly blog posts.)

The gist is, my client is working on a process to take data from one system and insert it into their Salesforce system.  To do so, we’re using a 3rd party tool called Pentaho. It’s similar to SSIS in some ways, but based on Java.

Anyway, the process I was debugging was fairly simple. Take account information from the source and upsert it into Salesforce. If the account already existed in Salesforce, great, simply perform an update. If it’s new data, perform an insert.  At the end of the process Pentaho returns a record that contains the original account information and the Salesforce ID.

So far so good. Now, the original author of the system had setup a trigger so when these records are returned it can update the original source account record with the Salesforce ID if it didn’t exist previously. I should note that updating the accounts is just one of many possible transformations the entire process runs.

After working on the Pentaho ETL (extract, transform, load) for a bit and getting it stable, I decided to focus on performance. There appeared to be two main areas of slowness, the upsert to Salesforce and the handling of the returned records. Now, I had no insight into the Salesforce side of things, so I decided to focus on handling the returned records.

The problem of course was that Pentaho was sort of hiding what it was doing. I had to get some insight there. I knew it was doing an Insert into a master table of successful records and then a trigger to update the original account.

Now,  being a 21st Century DBA and taking into account Grant Fritchey’s blog post on Extended Events I had previously setup a Extended Events Session on this database. I had to tweak it a bit, but I got what I wanted in short order.

CREATE EVENT SESSION [Pentaho Trace SalesForceData] ON SERVER
ADD EVENT sqlserver.existing_connection(
    ACTION(sqlserver.session_id)
    WHERE ([sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.login(SET collect_options_text=(1)
    ACTION(sqlserver.session_id)
    WHERE ([sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.logout(
    ACTION(sqlserver.session_id)
    WHERE ([sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.rpc_starting(
    ACTION(sqlserver.session_id)
    WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND [sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.sql_batch_completed(
    ACTION(sqlserver.session_id)
    WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND [sqlserver].[username]=N'TempPentaho')),
ADD EVENT sqlserver.sql_batch_starting(
    ACTION(sqlserver.session_id)
    WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND [sqlserver].[username]=N'TempPentaho'))
ADD TARGET package0.ring_buffer(SET max_memory=(1024000))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)
GO

It’s not much, but it lets me watch incoming transactions.

I could then fire off the ETL in question and capture some live data. A typical returned result looked like

exec sp_execute 1,N'SourceData',N'GQF',N'Account',N'1962062',N'a6W4O00000064zbUAA','2019-10-11 13:07:22.8270000',N'neALaRggAlD/Y/T4ign0vOA==L',N'Upsert Success'

Now that’s not much, but I knew what the Insert statement looked like so I could build an insert statement wrapped with a begin tran/rollback around it so I could test the insert without actually changing my data.  I then tossed in some set statistics IO ON and enabled Include Actual Execution Plan so I could see what was happening.

“Wait, what’s this? What’s this 300K rows read? And why is it doing a clustered index scan on this table?”  This was a disconcerting. The field I was comparing was the clustered index, it should be a seek!

So I looked more closely at the trigger. There were two changes I ended up making.

       -- Link Accounts
       --MERGE INTO GQF_AccountMaster T
       --USING Inserted S
       --ON (CAST(T.ClientId AS VARCHAR(255)) = S.External_Id__c
       --AND S.Transformation in ('Account'))
       --WHEN MATCHED THEN UPDATE
       --SET T.SFID = S.Id
       --;
       
       if (select transformation from Inserted) ='Account'
       begin
              MERGE INTO GQF_AccountMaster T
              USING Inserted S
              ON T.ClientId  = S.External_Id__c
              WHEN MATCHED THEN UPDATE
              SET T.SFID = S.Id
       end

An astute DBA will notice that CAST in there.  Given the design, the Inserted table field External_Id__C is sort of a catch all for all sorts of various types of IDs and some in fact could be up to 255 characters. However, in the case of an Account it’s a varchar(10).

The original developer probably put the CAST in there since they didn’t want to blow up the Merge statement if it compared a transformation other than an Account. (From what I can tell, T-SQL does not guarantee short-circuit evaluation, if I’m wrong, please let me know and point me to definitive documentation.) However, the minute you cast that, you lose the ability to seek using the index, you have to use a scan.

So I rewrote the commented section into an IF to guarantee we were only dealing with Account transformations and then I stripped out the cast.

Then I reran and watched. My index scan of 300K rows was down to a seek of 3 rows. The trigger now performed in subsecond time. Not bad for an hour or so of work. That and some other improvements meant that now we could handle a few 1000 inserts and updates in the time it was previously taking to do 10 or so.  It’s one of those days where I like to think my client got their money’s worth out of me.

Slight note: Next week I will be at PASS Summit so not sure if/when I’ll be blogging. But follow me on Twitter @stridergdm.

IIS FTP 530 error

To my usual readers, you can ignore this. This is simply my way of making sure that the next time I have to google an issue, or someone else days, there’s a better chance of finding a solution. Note, there are other places posting the same solution.

But simply put, over the weekend client’s IT group rebooted a server and an FTP process started to fail. It took a lot of digging to solve it.

It appears that two things happened:

The IUSR to the FTP root directory got munged (it’s not 100% clear and this may not be a necessary step. But, things did NOT work for me until I did this, but then they continued to work when I removed the IUSR to see if I could recreate the problem.)

The part that was 100% necessary was this: https://manage.accuwebhosting.com/knowledgebase/941/FTP-Error-530-User-cannot-log-in-home-directory-inaccessible.html go to Step 8 and add all users. Why this was removed, I don’t know. But adding this after re-adding the IUSR step seems to have solved the issue.

That said, before anyone asks, “why in the world are you using FTP in the 21st century?” I won’t disagree other than to say this is purely an internal process that simply moves some non-PII data to a 2nd server.  Not a huge justification, but there it is.

Call 911, If You Can

Also known as “things have changed”

For one my of clients I monitor and maintain some of the jobs that run on their various servers. One of them had started to fail about two weeks ago. The goal of the job was basically to download a file from one server, transfer it to another and upload it.  Easy-peasy. However, sometimes the job fails because there’s no file to transfer (which really shouldn’t be a failure, but just a warning).  So, despite the fact that it had failed multiple days in a row, I hadn’t looked at it. And of course no one was complaining (though that’s not always a good reason to ignore a job failure!)

So yesterday I took a look and realized the error message was in fact incorrect. It wasn’t failing because of a lack of a new file, but because it could no longer log into the primary server. A quick test showed the password had been changed. This didn’t really surprise me as this client is going through and updating a number of accounts and passwords. This was simply obviously one and we had missed this one. (Yes, this is where better documentation would obviously be a good idea.  We’re working on that.)

So, I figured the fix would be easy, simply email the right person, get the new password and update the process.  I also was taking the time to update the script to that the password would be encrypted moving forward, right now it’s in plain text and to give the correct error in the event of login in failure.

Well, the person who should have the password wasn’t even aware of this process. As we exchanged emails, and the lead developer chimed in, the conclusion was that this process probably shouldn’t be using this account, and that perhaps even then, this process may no longer be necessary.

So, now my job is to track down the person who did or does rely on this process, find out if they still are and then finish updating the password.  Of course if they’re not, we’ll stop this process. In some ways that’s preferable since it’s one less place to worry about a password and one less place to maintain.

Now, the above details are somewhat specific to this particular job, but, I’m sure all of us have found a job running on a server someplace and wondered, “What is this doing?” Sometimes we find out it’s still important. Sometimes we discover that it’s no longer necessary. In a perfect world, our documentation would always be up to date and our procedures would be such that we immediately remove unnecessary jobs.

But the real world is far messier unfortunately.

(and since the full photo got cropped in the header, here it is again)

Call 911. If you can

Apparently not only can guest rooms can not be called from this phone

And as a reminder, if you enjoy my posts, please make sure to subscribe.

 

Marshmallows Part II

I’ll have to admit, I can rarely tell in advance when one of my posts will hit all the buttons and generate views and when it’ll fall flat. But as I don’t always write for my audience, sometimes I write for my own reasons, I can live with that.

So, how to follow-up on a post that didn’t receive many views, write a follow-up post. You can call me a slow learner.

Actually, it’s about learning. Last time I wrote about my microwave and doing a quick experiment with marshmallows to prove it was really dead.  After 2 days without a microwave it was time to get a new one. Of course I couldn’t get what I wanted because the space it had to fit into was limited in size.  That could have been resolved, but would have meant redoing the cabinet space it had to fit into. And if I were going to redo the cabinet space there, I might as well redo the rest of the cabinets. And if I’m going to redo the cabinets, I really need to redo the counters. And very quickly a replacement $100 microwave I can get in an hour would become a 3-week $10,000 kitchen remodel. I opted for the $100 microwave over the one I really wanted.

And the results are shown at the top of post (and below in case the top doesn’t appear)

melted marshmallow picture

10 seconds of marshmallows in the microwave

It’s quite interesting to me. The best heating was beyond the area of the rotating plate.  But this also shows the value of the rotating plate since if there’s a few sports, if I put a something to heat and everything was stationary, it would take forever to heat since there’s little to no microwave energy there. (This can get complex because of the size of the wave and the height of the material, etc.)

Now, I’d have done more experiments, but it seems a certain someone in the house enjoys marshmallows more than I do and had eaten a bunch and this was all I had.

But, I have a working microwave and I’ve proven how important the rotating plate can be (not that I had much doubt).

And that’s science to me; doing experiments and learning.

Oh and about the SQL query I was updating. It’s going into production this week hopefully. I was about to eek out about a 10-20% improvement. Beyond that, not much I could do because it really ends up scanning an entire table, on purpose.  Only so much you can do there.

One last thing: there may not be a post next week because I’ll be teaching at the NCRC weeklong cave training class in Indiana and will have limited internet and time.

Marshmallows

Though I attended RPI, which is generally considered an engineering school, my degree is a BS in Computer Science. I say that because I consider myself more of a scientist than an engineer at times. And honestly, we all start out as scientists, but many of us lose that along the way.

Anyone who has had a small child has observed a scientist in action. No, they’re not in a lab full of test tubes and beakers and flasks giving off noxious smells. But they are in the biggest lab there is, the world. They also don’t necessarily realize it. Nor do parents. But every time they drop a Cheerio, they’re testing gravity.  Fortunately (or unfortunately depending on your point of view) so far every time they’ve managed to prove that gravity works. This is the most obvious example, but when you stop to think about it, much of the first few years of life is all about experimenting. Most of the time it goes well, but sometimes, as a burnt hand will attest, the experiment has a less than ideal outcome.

And it’s the fear of burned hands that leads to parents to utter that common  refrain, “Don’t touch that!” or the variation “Don’t do that!”.  Soon, over time, our experimentation starts to get reined in until we do very little of it. This can be inhibiting.

Years ago I used to teach an “Introduction to Windows” adult education class. It was I believe a 6 week class and I taught several over the course of a couple of years. It didn’t take me long to realize the biggest constraint on the students ability to succeed in the class was that they had internalized “Don’t do that, you might break something.” Once I realized that, half my teaching pedagogy simply became, “Touch that, you won’t break it, and if you do, it’s not a big deal, and if it is, we’ll fix it anyway.” Seriously, more than anything else, I had to encourage most of my students to experiment with the computer.

More recently I realized I had stopped doing as many experiments in my life as I should be doing. About 1.5 weeks ago I attended a Wilderness Medicine Conference a friend of mine had told me about. At the end of the very wet, cold, rainy day, a bunch of us went outside and tried to start a fire. Starting a fire, let alone in such conditions was something most of the students had never done. I had, but not in years. With some effort, and experimentation, including using the outside box of a single serving size package of Fruit Loops, we finally managed to get the fire going.

But this got me thinking. When I go hiking, I carry a tiny ziplock back in my jacket with some firestarting materials. They’re there in case of an emergency. But, the thing is, I had never actually tried them and realized if I didn’t know how well they worked in practice, I couldn’t rely on them in emergency. So, I went outside, and started a fire. And I learned that yes, my materials ARE adequate, but the dryer lint needed to be pulled apart more than I realized. I tried again later in the week, and added the use of a toilet paper roll to form sort of a chimney so the starting fire would draft better. This, and the better pulling of the lint worked even better and a single match was sufficient this time.  This gave me more confidence that in an emergency, in less than ideal conditions I could get an actual fire going.

But, I wasn’t done! Our microwave broke this weekend. But, before I wrote it off, I wanted to make sure it wasn’t a fluke or something else. So, in this case I decided to get a bag of marshmallows and lay them out inside the microwave to see if I was getting ANY energy out of the magnatron. Turns out, nope, nada, nothing. So, today or tomorrow I will be buying a new microwave. But, it was a fun, and later tasty experiment.

Without delving deep into the scientific method here, I’ll say at a simple level, science is about having a hypothesis and testing it. The testing it is important.

To bring this back to SQL. First, you have a hypothesis that your backups will work. Have you tested that hypothesis? If not, do so immediately. Even if they do, you might learn something now that will be important when you have to do it for real. Perhaps you learn the volume your backups are on only has write access. Or perhaps you learn you need to retrieve your encryption keys and the person who controls access to them is on vacation. Or perhaps your RPO is 4 hours and the restore takes 6 hours.  So, experiment.

query plan

Capture of a random query plan

Recently for one client I’ve spent some time experimenting with various changes to help improve the performance of some queries. Not everything I tried worked, but some things did. So, again experiment.

I’m curious what recent experiments you may have done, SQL or otherwise. What were their outcomes?

 

Punditry

We’re all experts on everything. Don’t think so? Go to any middle school or high school soccer game and you’ll be amazed at how many parents are suddenly experts on soccer. It’s also amazing at how many parents are parents of future NCAA Division I scholarship soccer players.

Seriously though, we’re all guilty of this from time to time. I’ve done it and if you’re honest, you’ll admit you’ve done it.

Yesterday the world suffered a loss, the near destruction of Notre Dame.  Early during the fire our President tweeted:

“Perhaps flying water tankers could be used to put it out. Must act quickly!”

As many have pointed out, this was actually a terrible idea. The idea of dropping 100s of kilograms of water onto an already collapsing roof is most likely to do more damage than not. But, while I think it’s easy to mock the President for his tweet, I won’t. In some ways it reminds me of the various suggestions that were made last summer during the Thai Cave Rescue. We all want to help and often will blurt out the first idea that comes to mind.  I think it’s human nature to want to help.

But, here’s the thing: there really are experts in the field (or to use a term I see in my industry that I dislike at times: SME (it just sounds bad) Subject Matter Expert.)

And sometimes, being a SME does allow you to have some knowledge into other domains and you can give some useful insight. But, one thing I’ve found is that no matter how much I know on any subject, there’s probably someone who knows more. I’ve written about plane crashes and believe I have a more than passing familiarity in the area. Perhaps a lot more than the average person. But, there’s still a lot I don’t know and if I were asked to comment by a news organization on a recent plane crash, I’d probably demur to people with far more experience than I have.

Having done construction (from concrete work in basements to putting the cap of a roof on), I again, have more than a passing familiarity with construction techniques and how fire can have an impact. That said, I’ll leave the real building and fire fighting techniques to the experts.

And I will add another note: even experts can disagree at times. Whether it’s attending a SQL Saturday or the PASS Conference itself, or sitting in a room with my fellow cave rescue instructors, it can be quite enlightening to see the different takes people will have on a particular question. Often no one is wrong, but they bring different knowledge to the table or different experiences.

And finally, you know what, sometimes the non-expert CAN see the problem, or a solution in a way that an expert can’t. But that said, at the end of the day, I’ll tend to trust the experts.

And that’s the truth because I’m an expert on punditry.

Shouldn’t that be plugged in?

That was the question a friend of mine in 6th grade asked. As a result I developed what I call the Charlie M. rule after my friend. It was sort of Show and Tell day in 6th grade and we were supposed to talk about our hobbies. I brought in a circle of HO scale track (18″ radius for those interested) and my locomotive (a model GP-38) and some cars and of course the transformer to power it all.

I set it all up in front of the class and dutifully tried to demonstrate it. Nothing moved. I checked to make sure the engine was properly on the tracks: check. I made sure the wires were connected to the transformer: check. I made sure the wires were connected to the track: check.  I was stumped: check. Finally Charlie raised his hand and asked, “Shouldn’t that be plugged in?”  Ayup, in all my nervousness and being hurried, I had forgot the most basic step, of plugging in the transformer.

I try to keep this in mind when troubleshooting: check the obvious. I ran into this again over the weekend when trying to get my BMW Z3 running again. (Side note: no, consulting does not pay that well. This is one of the few tangible items I have left from my dad’s estate). It had stopped running late last fall and at the time I spent a little time trying to make it run, without much success. Finally, with the family’s help I pushed and pulled it into a shelter for the winter and then left it for the winter.

I wasn’t planning on worrying about it until later this month, but then… well let’s just say when I put the large box with metal corners into the rear of the Subaru, I forgot to check the obvious and slammed the rear hatch down on the box. Well, the box, realizing it didn’t have enough room, decided to take advantage of the metal corner and proceeded to make more room by punching out the rear window of the Subaru. Oops.  Such a simple mistake, but a large one.

So, while waiting for the Subaru to get fixed, I decided it was time to get the BMW on the road.

Now due to the symptoms, I knew it wasn’t a dead battery or bad gas. So taking advantage of what I call my extended brain, I asked others for help.  We had narrowed the problem down to either the clutch interlock switch or the starter. Neither looked like it would be an easy self-service and I was getting frustrated. I finally decided that perhaps checking the ODB-II codes might yield more information. Strangely though, the reader didn’t power up; there were no codes to read. That struck me as a strange. So here I did check the obvious: I took the reader to the Subaru and made sure the reader worked. And it worked fine on the Subaru. I went back to my extended brain and mentioned that.

“Oh, have you checked the fuses?”

“Nah I thought about it, but everything seems to have power.”

“You sure, sounds like the onboard computer fuse might be blown.”

So, I trudged out and took off the fuse cover.  Now, I don’t really believe in fate or signs from God, but it was weird, in the list of about 40 fuses, the first one my eyes fell on was Computer. “Nah, can’t be.”

I pulled it, and sure enough, it was burned out. I pulled it and replaced it. Got in the car and thought, “it can’t be that easy, can it?” A turn of the key and the next thing I knew, the 6 cylinders were purring.

All that work and frustration because I had overlooked the basics.

This is far from the first time I’ve overlooked the basics. And I bet you’ve done the same thing. I have a theory about why we do this, and it is in part because the basics ARE so fundamental that we assume it has to be something else. In my model train example, dirty track and loose wires, especially in an ad-hoc setup are arguably a more common issue than forgetting to plug in the transformer. In my BMW case, because literally everything else worked, I assumed the power was getting to the computer. And honestly, even now, thinking about it, I’m surprised the dash light startup didn’t change at all because of a lack of computer.

I’ve seen this in databases and elsewhere. I was recently trying to do a quick restore of a database from one machine to another and the obvious wasn’t working. It took me a bit to remember the client’s new security setup prevented this specific case for these two machines. Once I remembered that, the problem and subsequent solution were obvious.

This in part goes back to why I like using a rubber-duck at times. It can force you to review your assumptions and check the basics.

Having a problem? Employ the Charlie M. rule and check the basics.