Copying a Large File

It was a pretty simple request actually. “Can you copy over the Panama database from FOO\WAS_21 to server BAR\LAX_45?”

“Sure, no problem.”

Of course it was a problem.  Here’s the issue. This is at one of my clients. They have a couple of datacenters and have hundreds of servers in each.  In addition, they have servers in different AD domains.  This helps them partition functionality and security requirements. Normally copying files between servers within a datacenter isn’t an issue. Even copying files between the different domains in the same datacenter isn’t normally too bad. To be clear, it’s not great.  Between servers in the same domain, it appears they have 1GB connections, between the domains, the firewall seems to throttle stuff down to 100MB.

The problem is when copying between different domains in different datacenters. This can be abysmally slow. That was my problem this week.  WAS_21 and LAX_45 were in different datacenters, and in different domains.

Now, for small files, I can use the cut and paste functionality built into RDP and simply cut and paste. This doesn’t work for large files. The file in this case was 19GB.  So this was out.

Fortunately, through the Citrix VDI they provide, I have a temp folder I can use. So, easily enough, I could copy the 19GB file from FOO\WAS_21 to that. That took just a few minutes.  Then I tried to copy it from there to BAR\LAX_45. This was slow, but looked like it would work.  It was going to take 4-5 hours, but they didn’t need the file for a week.

After about 4.5 hours, my RDP session locked up. I logged out and back in and saw the copy had failed. I tried again. This time at just under 4.5 hours I noticed an out of memory error. And then my session locked up.

So, apparently this wasn’t going to work. The obvious solution was to split the file (it was already compressed) into multiple files; except I’m not allowed to install most software on the servers. So that wasn’t a great option. I probably could have installed something like 7zip and then uninstalled it, but I didn’t want to deal with that and the paperwork that would result.

So I fell back to an old friend: Robocopy.  This appeared to be working great. Up until about 4.5 hours.  And guess what… another out of memory error.

But I LIKE challenges like this.

So I looked more closely. Robocopy has a lot of options. There are two that stuck out: /Z – restartable mode. That looked good. I figured worst case, I’d start my backup, let it fail at about 85% done and then resume it.

But then the holy grail: /J :: copy using unbuffered I/O (recommended for large files). 

Wow… unbuffered… that looks good. Might use less memory.

So I gambled and tried both.  And low and behold, 4:19 later… the file was copied!

So, it was an annoying problem but… I had solved it.  I like that!

So the take-away: Don’t give up. There’s always a way if you’re creative enough!

Wet Paint

There’s an old saw that if you tell someone there’s 1 billion people in China, they’ll believe you, but if you put up a wet paint sign, they’ll have to touch the paint to be sure. I find it interesting that we believe some things easily and others not so easily.  Or even sometimes, we may think we believe something until we actually experience it. Then we somehow believe it even more.

Two incidents of this nature occurred to me within the last year.  Last year I drove to my uncle’s house in South Carolina in order to observe the total eclipse. Prior to totality, one of the phenomena that I knew would happen was seeing crescent shaped “shadows”. (Really it’s sort of the reverse since you’re seeing crescent shaped light.) I mean I had read about it, I had seen pictures, and I basically understood why they would occur.  And yet, at one point I was walking back into the house to do something when I looked down and lo and behold… I saw crescent shaped “shadows”.  My reaction was one of “Holy Cow, this really DOES happen!”  Now, I had no doubt intellectually that it should happen, or that I would most likely see it. But, actually seeing it was still amazing and while I really had no reason to have to verify the phenomenon, the fact that I personally had experienced it was incredible. I understood it at a visceral level, not just an intellectual one.

The second incident occurred just the other day. As a long-time sufferer of allergies, spring time has sometimes been a bit less than comfortable for me. And over the years, I had seen videos of pine trees releasing their pollen in a massive cloud (I’m thankfully NOT allergic to pine pollen it appears) but I had never actually experienced it myself. So again, I knew intellectually it happens.

So two days ago I was sitting outside looking at a pine tree when suddenly I saw this puff of what looked like smoke, and then a cloud of pollen waft away from the tree. Again, I had experienced that Holy Cow moment when the visceral experience matched the intellectual one. It was pretty cool.

That all said, I have stopped touching wet paint at this point in my life. But I still love these sort of confirming experiences (though I’m not eager to start counting heads in China at this time).

What facts did you know that when you finally experienced them first hand, had an impact upon you. I’d love to hear them.

Teamwork

I spent the majority of last week with some of the greatest people I know: my fellow NCRC (National Cave Rescue Commission) instructors and students.  Let’s start by the obvious, it takes a strange twist of mind to be the sort of person that actually enjoys crawling through dark holes in the ground. Now take that same group of people and add  a sense of altruism and you’ve got folks willing to rescue others trapped in caves.

Let me interject in here by the way, that caving is actually a fantastically safe sport. When an accident in a cave makes the news, it’s because it’s so rare and generally so unique as to attract attention. The NSS puts out a report every 2 years detailing the accidents across the US. It’s well worth the read if you can find a copy. Now back to the rest of my blog post.

I have been one of the many instructors who teach the “Level 2” class. I love this level because students have gotten beyond the basics and are starting to learn the “why” we do things more than simply the “how”.  And we start to really work on their team leadership and teamwork skills (as I mentioned last week in my post about the lost Sked).

Good teamwork isn’t just a bunch of people working to solve a task. It’s them working together and often anticipating the needs.  Two events last week illustrated a failure and a success.  At one point, I was a mock patient and the class had been broken into two teams. The first one had to get me out of a tight crawl-way (packaged in a Sked) and up to an open area. From there another team had set up a haul system to get me to the top of a 60′ or so (I’ve never really measured it) block of rock.  This was towards the end of the week when the students (and instructors) are a bit tired and overwhelmed with everything they had learned.  We had allocated 90 minutes for the exercise.

During the first part, I just felt like something was off. Nothing serious, nothing I could put my finger on. But the magic the students had shown all week just wasn’t quite there.

And then there I was, 2 hours into the exercise, laying on top of the tight area, waiting to be hauled up. Someone on the lower team said that I was ready to go.  Someone at the top of the block said they were ready to go.  And then I sat for 2-3 minutes until finally things started happening again.  But for a few minutes the days of teamwork had fallen apart. They weren’t trying to anticipate needs or even work cooperatively.  It wasn’t a huge issue, but it did get a reaction from our oldest, most irascible Level 2 instructor.  I think the word ‘disappointed’ was used at least twice.  It might have been a bit harsh, but… it worked.

As the final exercise of the day, the instructors decided to have a bit of fun with the students. We hid the previously lost (and then found) Sked.  The instructor in charge then informed the students that their lost “patient” for this particular exercise was about 4′ tall, last seen wearing bright orange, and was afraid of the dark. As details were added, it suddenly dawned on them what we had in mind.

But they took the task seriously. They did an excellent search and when they were then told the “patient” wasn’t responsive to stimuli, and they couldn’t rule out a c-spine injury, therefore the Sked had to be packaged in another litter, they did so with gusto and honestly, one of the best packaging jobs I had seen all week. All this while laughing.

They took an absolutely silly scenario, laughed while doing it, yet exhibited amazing teamwork. They were back on their game!

Sometimes teams can start to fall apart, but reminding them of how good they can be and providing a bit of levity can help elevate them back to their best!

 

A Lost Sked

Not much time to write this week. I’m off in Alabama crawling around in the bowels of the Earth teaching cave rescue to a bunch of enthusiastic students. The level I teach focuses on teamwork. And sometimes you find teams forming in the most interesting ways.

Yesterday our focus was on some activities in a cave (this one known as Pettyjohn’s) that included a type of a litter known as a Sked. When packaged it’s about 9″ in diameter and 4′ tall. It’s packaged in a bright orange carrier. It’s hard to miss.

And yet, at dinner, the students were a bit frantic; they could not account for the Sked. After some discussion they determined it was most likely left in the cave.

As an instructor, I wasn’t overly concerned, I figured it would be found and if not, it’s part of the reason our organization has a budget for lost or broken equipment, even if it’s expensive.

That said, what was quite reassuring was that the students completely gelled as a team. There was no finger pointing, no casting blame. Instead, they figured out a plan, determined who would go back to look for it and when. In the end, the Sked was found and everyone was happy.

The moral is, sometimes an incident like this can turn into a group of individuals who are blaming everyone else, or it can turn a group into a team where everyone is sharing responsibility. In this case it was it was the latter and I’m quite pleased.

Legacy

“…the good is oft interred with their bones.”

Or these days, lives on in the Internet. I never quite agreed with Shakespeare in this line. I think the good lives on beyond the grave.

In the book, Lies my Teacher Taught Me author James Loewen talks about how certain African tribes divide people into three categories: those alive, the sasha or living-dead, and zamani or the dead.

This was brought to mind yesterday when I was trying to debug an issue which turned out to be a bug in SQL Server 2016 SP2.  While trying to debug it, I needed to add a user to an SSIS setup. This has been a problem in the past, but I recalled I had used #SQLHELP on Twitter to ask the question and gotten a great answer. So, a quick search later found the response I was looking for. The fully correct answer (since MSFT’s page leaves out a step) was available at: http://sqlsoldier.net/wp/sqlserver/howdoigrantaccesspermissionsforssistousers

Now, many of my readers won’t recognize the name, but some will: @SQLSoldier, a member of the #SQLFamily that passed away recently. At the time of his passing I had forgotten that he had reached out to help me last year. The search yesterday though brought it back to me. I never had the honor of meeting Mr. Davis in person, but I know many others spoke highly of him. It was comforting to me to know that even months later his legacy was still helping me (and presumably others).

After thinking about that, I got thinking about my dad.  Soon after he passed in 2015, I picked up the hefty Milwaukee right-angle drill that had been his and was now mine. I was working on the addition (that he had helped design before his death) that has since become my office. I had always liked that particular tool. It has a certain heft and power to it.  At the time, with his death so close at hand, it was a form of grief therapy for me. I had often used this in my youth, helping him out with various construction projects. To this day I’ll pick up one of the tools I inherited, or start a house project using the skills he taught me and I realize, he’s sasha, living-dead. He’s still lives on in me.

SQLSoldier is also living-dead, he’s very real in the hearts and minds of those who knew him and his legacy lives on, still helping others, such as myself.

As my age is now entering its 2nd half-century,  I wonder more and more what my legacy will be. I hope that when I’m sasha, my legacy can still help and aid others.

And with that, I will conclude with a scene from one of my favorite actors in one of my favorite movies:  “What will your verse be?”

 

RCA or “get it running!”

How often have any of us resorted to fixing a server issue by simply rebooting the server?  Yes, we’re all friends here, you can raise your hands. Don’t be shy. We all know we’ve done it at some point.

I ask the question because of a recent tweet I saw with the hashtag #sqlhelp where Allan Hirt made a great comment:

Finding root cause is nice, but my goal first and foremost is to get back up and running quickly. Uptime > root cause more often than not.

This got me thinking, when is this true versus when is it not? And I think the answer ends up being the classic DBA answer, “it depends”.

I’m going to pick two well studied disasters that we’re probably all familiar with. But we need some criteria.  In my book IT Disaster Response: Lessons Learned in the Field I used the definition:

Disaster: An unplanned interruption in business that has an adverse impact on finances or other resources.

Let’s go with that.  It’s pretty broad, but it’s a starting point. Now let’s ignore minor disasters like I mention in the book, like the check printer running out of toner or paper on payroll day. Let’s stick with the big ones; the ones that bring production to a halt and cost us real money.  And we’re not going to restrict ourselves to IT or databases, but we’ll come back to that.

The first example I’m going to use is the Challenger Disaster. I would highly recommend folks read Diane Vaughen’s seminal work: The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA. That said, we all know that when this occurred, NASA did a complete stand-down of all shuttle flights until a full RCA was complete and many changes were made to the program.

On the other hand, in the famous Miracle on the Hudson, airlines did not stop flying after the water landing. But this doesn’t mean a RCA wasn’t done. It in fact was; just well after the incident.

So, back to making that decision.  Here, it was an easy decision. Shuttle flights were occurring every few months and other than delaying some satellite launches (which ironically may have led to issues with the Galileo probe’s antenna) there wasn’t much reason to fly immediately afterwards.  Also, while the largest points were known, i.e. something caused a burn-thru of the SRB, it took months to determine all the details. So, in this case, NASA could and did stand-down for as long as it took to rectify the issues.

In the event of the Miracle on the Hudson, the cause was known immediately.  That said, even then an RCA was done to determine the degree of the damage, if Sullenberg and Skiles had done the right thing, and what procedural changes needed to be made.  For example one item that came out of the post-landing analysis was that the engine restart checklist wasn’t really designed for low altitude failures such as they experienced.

Doing a full RCA of the bird strike on US Airways 1549 and stopping all over flights would have been an economic catastrophe.  But it was more than simply that. It was clear, based on the millions of flights per year, that this was a very isolated incident. The exact scenario was unlikely to happen again.  With Challenger, there had only been 24 previous flights, and ALL of them had experienced various issues, including blow-bys of the primary O-ring and other issues with the SRBs.

So back to our servers.  When can we just “get it running” versus taking downtime to do a  complete RCA vs other options?

I’d suggest one criteria is, “how often has this happened compared to our uptime?”

If we’ve just brought a database online and within the first week it has crashed, I’m probably going to want to do more of an immediate RCA.  If it’s been running for years and this is first time this issue has come up, I’m probably going to just get it running again and not be as adamant about an immediate RCA. I will most likely try to do an RCA afterwards, but again, I my not push for it as hard.

If the problem starts to repeat itself, I’m more likely to push for some sort of immediate RCA the next time the problem occurs.

What about the seriousness of the problem? If I have a server that’s consistently running at 20% CPU and every once in awhile it leaps up to 100% CPU for a few seconds and then goes back to 20% will I respond the same way as if it crashes and it takes me 10 minutes to get it back up? Maybe.  Is it a web-server for cat videos that I make a few hundred off of every month? Probably not. Is it a stock-trading server where those few seconds costing me thousands of dollars?  Yes, then I almost certainly will be attempting an RCA of some short.

Another factor would be, what’s involved in an RCA? Is it just a matter of copying some logs to someplace for later analysis and that will simply take a few seconds or minutes, or am I going to have to run a bunch of queries, collect data and do other items that may keep the server off-line for 30 minutes or more?

Ultimately, in most cases, it’s going to come down to balancing money and in the most extreme cases, lives.  Determining the RCA now, may save money later, but cost money now. On the other hand, not doing an RCA now might save money now, but might cost money later.  Some of it is a judgement call, some of it depends on factors you use to make your decision.

And yes, before anyone objects, I’m only very briefly touching upon the fact that often an RCA can still be done after getting things working again. I’m just touching upon the cases where it has to be done immediately or evidence may be lost.

So, are your criteria for when you do an RCA immediately vs. getting things running as soon as you can? I’d love to hear them.

And credit for the Photo by j zamora on Unsplash

SQL Saturday Philly Followup

So last week I visited a client I have near King of Prussia, PA and then went to SQL Saturday.

This particular client I’ve worked with for over 5 years now and it’s been quite an interesting time. What started out as a 3-6 month project turned into a multi-year, basically full-time engagement and now it’s down to some piecemeal work. But that too is unfortunately slowly ending as they bring their new in-house DBA up to speed. I spent about 1/2 my time there doing a data-dump to him and my manager.

But, I’m not here to talk about that, I’m here to talk about SQL Saturday, customer service and a bit more.

But first, a joke:

“How many DBAs does it take to solve a hardware problem?”

By the count of it, at least a 1/2 dozen.

I got there and for my first session decided to attend Kathi (aka Aunt Kathi) Kellenberger’s session on windowing functions. Fortunately she showed up early because it turns out she could not get her laptop to talk to the monitor. We tried one fix using an existing cable until we realized we had the wrong end plugged in (basically the monitor end we stole from a monitor).  This is one of the big fears of any presenter, showing up and not being able to project ones screen!  So, over the next 30 minutes several of us tried to help with a bit of everything including the “reboot the projector advice”.

Finally after one of the organizers (with permission of the hosting organization) pried off the back of the podium was I able to realize “oh, THIS cable will work”. I handed it up to Kathi and she plugged in her laptop and was able to project. And it was, as I expected a great, informative presentation.  I definitely learned a few things.

I have Kathi to thank (or to blame!) for inspiring me to write my book. So I was more than glad to help her out.

My talk on presenting was well received with a good turnout and a number of questions from audience members. This was in contrast to when I gave it in DC where I had only had a few audience members. And it was in definite contrast to my experience in Colorado Springs where I had no one show up for my presentation. I’ll admit, it was nice to get back on the horse and have such a successful presentation.

Later, I made a point of attending a session by Sarah Hutchins on how to Ace your Job Interview. It was her first time presenting at SQL Saturday and besides being interested in the topic, wanted to support her. She did great.  It did turn out that she needed help with her clicker for PowerPoint so I loaned her mine. I in fact have a slide in my presentation about clickers and helping out fellow speakers, etc.

So, it was with a bit of a laugh that I saw Grant Fritchey’s blog post this week on Presentation Tools. Grant was one of the first speakers I ever saw at a SQL Saturday, back in Boston, I believe 4 years ago.  Besides being a great speaker, I’ve appreciated he’s felt a need to “give back” to the community and in part he does that by supporting and encouraging up and coming speakers and writing informative posts like his most recent one cited here.

So a lot of this weekend was about how #SQLFamily helps each other. Kathi encouraged me to write a book, I was able to help her and Sarah with their hardware issues, Grant funny enough this week follows up on advice on hardware for speakers and so the circle continues.

Contrast that to my stay at Extended Stay America. There’s an adage in business:

It takes months to find a customer and only seconds to lose one.

ESA certainly lost one this weekend. After arriving at SQL Saturday, I realized I had left my shoes in my room at the hotel.  As soon as I got an opportunity I emailed them. I didn’t hear back right away, so I later called.  The response was less than stellar. First, they’d have to check with the housekeeper in question and they’d call me back. But additionally their policy was not to mail items to customers and in the event they did, they expected the customer to pay for shipping. Not the most customer friendly response, but I could deal with the shipping if they did in fact find my shoes.

No more response that day and I wasn’t about to drive 20 minutes in the opposite direction on the off-chance they had found my shoes because it wasn’t even clear the front desk would have access to them (since they couldn’t confirm anything until they spoke to the housekeeper in question.)

Sunday morning I woke up to an email which I will quote in its entirety:

We are unable to send these to you as our mail delivery does not pick up packages unless it is addressed for ESA business.

So, now at least the way I read this, it still doesn’t answer my question if they had even found them.

Finally last evening I spoke on the phone with the manager who kept reiterating their policy, but never said they had actually found them. I finally had to stop her and ask, “Do you even have them? You’ve never actually said that.” “Oh yes we do, but we can’t ship them to you.” “What if I pay for the shipping.” “We don’t do that.” Meanwhile she says repeatedly, “I’m doing everything I can help you.”

I’m still not sure how, “I can’t ship them to you” and “I’m doing everything I can to help you” jives.

But let’s just say, this whole experience has left a sour taste in my mouth.

Again a little effort can go a long way.

So, that’s my experience this weekend.  Some great people who will help each other and others who are willing to write off paying customers.

But, despite not being a very code heavy blog, I’m going to toss out this tidbit for future reference:

$sourceserver = ‘Myserver\sqlexpress’
$sourcedb = ‘Adventurework2014’
$outputdirectory = ‘c:\temp\’

 

$tables = invoke-sqlcmd -server $sourceserver -Database $sourcedb ‘select ss.name as schema_name, so.name as table_name, ss.name+”.”+so.name as full_name from sysobjects so inner join sys.schemas ss on ss.schema_id=so.uid where type=”u”’

ForEach ($table in $tables)
{
$bcpstring=”bcp $($sourcedb).$($table.full_name) out $outputdirectory[$($table.schema_name)].[$($table.table_name)].bcp -S $sourceserver -T -E -n”
#Write-Host $bcpstring
Invoke-Expression $bcpstring

}

It’s not much, but I had a recent need to dump out every table of a particular database for a client. So I wrote this.  BTW, by including the [] in the filenames, when I go to load this data, the QUOTENAME version of the schema.table is automatically used.