Barriers

Years ago, I had my team building out our racks at our new datacenter. I was quite proud of it. It was going to be our first planned from the start build-out of 6 racks, as opposed to the hodge-podge build-out we had done of 5 cabinets we had previously rented. Instead of just cramming in equipment where it would fit, we could plan where every piece would go and where we’d leave room for future expansion. This was in 2001, so it was still during a big Internet boom.

One of the things I had decided on doing early on was color coding cables. Red was for anything in front of the firewall for example.  On the backside, every server had two network cards, one for outgoing traffic (the “front-net”) and the second for traffic between the servers (the “back-net”).  To help distinguish between the two, I had ordered a bunch of green cables for the front-net, since that data was “safe” and green is “safe”, and blue cables for the back-net, both start with “b”. Sure, somewhat silly mnemonics, but they worked.

Until, about a week after we finally completed our datacenter move, not one, but two members of my five person team commented, “oh, they were different colors? I couldn’t tell, I’m colorblind.”

“Doh!”  So much for my nice color-coded system.  It can be fairly easy to overlook barriers when you don’t see them. Sometimes it takes more thought and action on your part. Sometimes it takes asking questions, or observation.

Lately I’ve been trying to look for more barriers that I might not have seen before and looking into what I can do to remove them. I’ll be the first to admit, I’m not always successful and I’m still learning. But hopefully we call can.

One area I’ve been focusing on this is in my work for the Capital Area SQL Server User Group. Right now I’m looking at two possible barriers. I say possible because I honestly don’t know if they’re issues or not:

First, I’m trying to find someone who can provide ASL interpretation.  Here’s the thing: we have never had, as far as I know, a deaf person attend one of our events, or even express an interest. Is that because there are no deaf DBAs in the area or because they know if they do attend, they probably will face barriers an person with hearing won’t face?

But, that actually begs the question: if there are no deaf DBAs in the area, why? Perhaps there are deaf people who WANT to become a DBA, but can’t because the barriers that exist well before they even attempt to attend one of our events.  I don’t know, but I hope to explore this issue a bit more.

Another item I’ve started to look into, is whether some sort of child-care services at our SQL Saturday event would help encourage more people to attend. My initial thought is, “it’s Saturday, so ideally a spouse can watch kids” or a similar solution. But, that’s assuming every attendee has a spouse or the extra money to hire a babysitter for an entire day. In other words, it’s making a lot of assumptions.  There’s definitely some major logistical concerns that I have to continue to explore before we can even think about offering it. But I’m also simply trying to figure out if it would make a difference.  Unfortunately, currently for our user group meetings itself, it would not be practical. But even then it may be worth looking into.

On a personal note, I have a friend who had a service dog. She was interested in joining me on a caving trip.  So we actually discussed the logistics of it and determined that it was in fact possible to take her caving with her service dog.  There was some logistics we had to work out and I did have to get permission from the cave owner.  Unfortunately, our scheduling never quite synched up and we had to forego the trip. But the point is, barriers CAN be overcome if one works at them and is willing to be a bit flexible.

Today’s takeaway: What barriers have you looked for and tried to remove? They’re out there, even if you can’t see them.

 

Moving the Needle – Hard

One of the things I enjoy is problem solving or “debugging”.  I don’t necessarily mean debugging code, though I’ve done plenty of that.  One particular class of problems I like solving is when something isn’t working “right”.  I’m currently involved on one such issue.

Just before the holidays, the lead developer at one my of my clients put me in touch with a team in another division to help them solve some performance issues they were having with their SQL Server. This is the sort of issue I generally like to sink my teeth into.

I started poking around and asking questions. I was a bit crushed when in the initial review they listed all the things they had tried and I had to nod my head sagely (which, being a remote worker went unnoticed by them) because they had tried all the basic things. They had, fortunately for them, ruled out a lot of the easy fixes.

So now it came down to some digging. I won’t go into too many details, but will cover some of the things uncovered and tried. For one thing, they have 44 SQL jobs that run every 20 seconds and basically do a poll of a database to see if there’s any work to be done. So, every 20 seconds 44 SQL jobs would fire up, do a quick select and then go back to sleep.  On their new server, they were on average taking 6 seconds a piece.  In addition, the CPU would spike to 100% for about 5-6 seconds and then drop back down. We are also seeing a lot of wait states of the MSQL_XP variety (accounting for about 1/2 the time the system is waiting and averaging about 61.1 ms each time. [Thanks to Brent Ozar’s script here!])

We tried three things, two helped, one didn’t.

First, I asked them to spread the jobs out. So now, basically 2-3 jobs are started every second. This means over a 20 second period all 44 jobs are run, but not all at once.  This had an immediate impact, the jobs now were taking about 2-3 seconds. A small victory.

Secondly, we changed the MAXDOP settings from 0 to 4.  This appeared to have no impact on the jobs. In retrospect makes a lot of sense. Each job is a separate task and basically single-threaded, so SQL Agent won’t care about the MAXDOP.

For those who aren’t familiar with SQL Server, MAXDOP is short for “Maximum Degree of Parallelism” This controls how much SQL Server will try to spread out a task among its CPUs. So for example you had 100 tests to grade and sort into alphabetical order and you had 1 person to grade them. That one person would have to do all the work. You might decide that having 100 people is 100 times faster since every person can grade a test at the same time. But then you have to hand out the 100 tests and then collect the tests and resort them back into alphabetical order, and this takes longer than you think.  So by playing around, you realize it’s actually faster to only have 10 people grade them and sort them.  In other words, sometimes, the effort of spreading out the work itself takes longer than the time saved by spreading it out.)

But, one thing that didn’t change was the CPU spike. But, since the poll jobs were twice as fast, we were happy with that improvement.

However, the real goal of the poll jobs was to wake up ETL jobs to handle large amounts of data. These were running about 1/2 as fast as they’d like or expected.

Here, MAXDOP does seem to have changed things.  In most cases, the ETL jobs are running close to twice as fast.

But, here’s the funny thing. I didn’t really care. Yes, that was our goal, but I’d have been content if they had run twice as slow. Why? Because at the point we changed the MAXDOP settings, my goal wasn’t to improve performance, it was simply to move the needle, hard.  What I meant by that was, by changing the MAXDOP from 0 (use all 32 CPUs) to 4 I was fairly confident, for a variety of reasons, I’d impact performance.  And I did in fact expect performance to improve.  But, there were really 3 possible outcomes:

  1. It improved. Great, we know we’re on the right track, let’s tweak it some more.
  2. It got worse. Great, this is probably NOT the solution, but let’s try it the other way and instead of 4 CPUs, try say 16 or even a larger value. At least we know that the MAXDOP is having an impact.
  3. Nothing change. In this case, we can pretty much rule out parallelization being a factor at all.

In other words by forcing SQL Server to use only 4 CPUs instead of all 32, I expected a change. If I didn’t see a change, one way or the other, I could mostly rule out parallelization.

Finally, once we saw that a MAXDOP of 4, we started to play with the threshold of parallelization. In this case we ended up with option 3 above. We tried a fairly small value (5) and a fairly large value (100) and haven’t seen much of a difference. So the cost threshold doesn’t seem to have much of an impact.

So, we’re not fully there yet, there’s a number of other factors we need to consider.  But sometimes when you’re approaching the problem, don’t be afraid to move the needle, in any direction, hard, can tell you if you should continue to try that approach. In this case with MAXDOP it indicated we were on the right track, but with the cost threshold, we’re probably not.

We’ve got a lot more to do, including seeing if we can eliminate or speed up the MSQL_XP wait states, but we’re on our way. (For the record, I don’t expect much change on this one, it’s really SQL Server saying, “hey, I called out to an external procedure and am waiting to hear back” so we can’t tweak the query or do other things that would make much of a difference.”

 

 

 

 

 

Janus 2 – 2019

“All my life’s a circle” – Harry Chapin

The New Year is now upon us. It’s now January around the world.  For those who don’t know where the name of the month comes from, or why my previous blog post and today’s are named as they are, it comes from the Roman God Janus.  Janus looked backwards and forwards. I thought it was appropriate for posts bracketing the New Year. In addition, the name of the month January is often believed to come from the name of the god, but that appears to be a false etymology.

Yesterday I looked back. Today, I’ll look forward.  I’m not necessarily a fan of New Year’s Resolutions (other than resolving to live one more year, which I’ve been successful at so far every time) so call these goals:

  • Continue to blog at least once a week. Last year I think I missed a week while on vacation, but otherwise I pretty much succeeded.
  • Hit 2000 page views. Last year I hit 1907.  I think I can exceed that this year. Of course I’ll need your help!
  • Continue speaking at SQL Saturdays. I haven’t set my schedule, but I already have 3-5 in mind. I’m not sure I’ll do 6 again, but we’ll see.
    • Expand my “SQL Server for under $200” session
    • Expand my “SQL Server Backups” (perhaps into a full precon)
    • Add one more topic to my list of sessions (see current ones here)
    • Shoot for at least one overseas engagement
  • Shoot for speaking at SQL Summit!
  • Figure out how to get an MVP!
  • Publish at least 3 more articles for Redgate’s Simple Talk
  • Continue to promote and support Women in Tech as well as other minority groups
  • Continue to learn PowerShell
  • Continue to learn about SQL Server on Linux
  • Play with containers, just a bit. This is really a minor goal given all the others I have, but I figure I should learn a little.
  • Pick up at least 1-2 more decent sized customers
  • Continue teaching cave rescue
  • Cave more!
  • Hike more!
  • Bike more!
  • Travel
  • Have fun!

That last goal is important to me. If I’m not enjoying what I’m doing, why do it? Life is too short to hate what you do with life. If you can find a way to enjoy life, do it!

Most of the goals above are SQL related, but that doesn’t mean that’s the major focus of my life. It’s just the place this blog touches upon the most these days.

I have a number of personal goals, but that’s for me and I won’t be sharing here.

In any event, I wish everyone in my biological family, #SQLFamily, Caving family, and other chosen families a wonderful and amazing New Year and hope that the new year brings you peace and happiness.