Too Secure 2

A quick followup to my blog post from the other day.

So, today I tried to update a service at the client. But of course, with IE locked down and cookies not allowed, I can’t update the service. Hmm. Tell me how that’s more secure?

And my wife just came back from work last night, talking about how she’s no longer able to get to a website critical for her job because the firewall rules changed. All this in the name of security.

Yes, we can be too secure!

Too Secure

There’s an old joke in IT that the Security Office’s job isn’t done until you can’t do yours.

Unfortunately, at times there’s some truth to that. And it can be a bigger problem than you might initially think.

A recent example comes to mind. I have one client that has set up fairly strict security precautions. I’m generally in favor of most of them, even if at times they’re inconvenient. But recently, they made some changes that were, to say the least, frustrating and potentially problematic. Let me explain.

Basically, at times I have to transfer a file created on a secured VM I control to one of their servers (that in theory is a sandbox in their environment that I can play in). Now, I obviously can’t just cut and paste it. Or perhaps that’s not so obvious, but yeah, for various reasons, through their VDI, they have C&P disabled. I’m ok with that. It does lessen the chance of someone accidentally cutting and pasting the wrong file to the wrong machine.

So what I previously did was something that seemed strange, but worked. I’d email the file to myself, then open a browser session on said machine and get the file there. Not ideal, and I’ll admit there are security implications, but it does mean the file gets virus-scanned in at least two places, and I’m very unlikely to send myself a dangerous file.

Now, for my web client on this machine, I tended to use Firefox. It was kept up to date and, as far as I know, up until recently it was on their approved list of programs. Great. This worked for well over a year.

Then, one day last week, I go to the server in question and there’s no Firefox. I realized this was related to an email I had seen earlier in the week about their security team removing Firefox from a different server, “for security reasons”. Now arguably that server didn’t need Firefox, but still, my server was technically MY sandbox. So, I was stuck with IE. Yes, their security team thinks IE is more secure than Firefox. Ok, no problem, I’ll use IE.

I go ahead, enter my userid and supersecret password. Nothing happens. Try a few times since maybe I got the password wrong. Nope. Nothing.  So I tried something different to confirm my theory and get the dreaded, “Your browser does not support cookies” error. Aha, now I’m on to something.

I jump into the settings and try several different things to enable cookies completely. I figure I can return things to the way they want them after I get my file. No joy. Despite enabling every applicable option, it wouldn’t override the domain settings, and cookies remained disabled. ARGH.

So, next I figured I’d re-download FF and use that. It’s my box after all (in theory).

I get the install downloaded, click on it, and it starts to install. Great! What was supposed to be a 5-minute task of getting the file I needed to the server is about done. It’s only taken me an hour or two, but I can smell success.

Well, it turns out what I was smelling was more frustration. Halfway through the install it locks up. I kill the process and go back to the file I downloaded to try again. BUT, the file isn’t there. I realize after some digging that their security software is automatically deleting certain downloads, such as the Firefox installer.

So I’m back to dead in the water.

I know, I’ll try to use Dropbox or OneDrive. But… both require cookies to get set up. So much for that.

I’ve now spent close to 3 hours trying to get this file to their server. I was at a loss as to how to solve this. So I did what I often do in situations like this: I jumped in the shower to think.

Now, I finally DID manage to find a way, but I’m actually not going to mention it here. The how isn’t important (though keeping the details private is probably at least a bit important).

Anyway, here’s the thing. I agree with trying to make servers secure. We in IT have too many data breaches as it is. BUT, there is definitely a problem with making things TOO secure. Actually, two problems. The first is the old joke about how a computer encased in cement at the bottom of the ocean is extremely secure, but also unusable. Their security measures almost got us to the state of making an extremely secure but useless computer.

But the other problem is more subtle. If you make things too secure, your users are going to do what they can to bypass your security in order to get their job done. They’re not trying to be malicious, but they may end up making things MORE risky by enabling services that shouldn’t be installed or by installing software you didn’t authorize, thus leaving you in an unknown security state (for the record, I didn’t do either of the above.)

Also, I find it frustrating when steps like the above are taken, but some of the servers in their environment don’t have the latest service packs or security fixes. So, they’re fixing surface issues but ignoring deeper problems. While I was “nice” in what I did, i.e. I technically didn’t violate any of their security measures in the end, I did work to bypass them. A true hacker most likely isn’t going to be nice. They’re going to go for the gold and go through one of at least a dozen unpatched security holes to gain control of the system in question. So as much as I can live with their security precautions of locking down certain software, I’d also like to see them actually patch the machines.

So, security is important, but let’s not make it so tight that people go to extremes to bypass it.


She’s smart and good looking.

Now, if you work from home like I do, this exercise won’t really work, but if you work in an office, look around at your coworkers and start to notice what gender they present as. Most likely you’ll notice a lot of men and a few women.

Sexism is alive and well in the tech world. Unfortunately.

We hear a lot about efforts (which I support by the way) like Girls and Data and Girls Who Code. These are great attempts at addressing some of the gender issues in the industry.  We’ve probably all heard about the “Google Manifesto” (and no, I’m not linking to it, since most of the “science” in it is complete crap and I don’t want to give it any more viewership than it has had. But here’s a link to the problems with it.)

We know that grammar school and middle school girls have a strong interest in STEM fields. And yet, by the time college graduation rolls around, we have a disproportionately smaller number of them in computer science, for example. So the above attempts to keep them interested help, but honestly only address part of the problem.

The other side is us men. Yes, us. We can tell our daughters all day long, “you’re smart, you can program.” “You too can be a DBA!” and more. But what do we tell our sons? We need to tell them that women can program. We should be telling them about Ada Lovelace and Admiral Grace Hopper. We should be making sure they realize that boys aren’t inherently better at STEM than girls. We should be making sure they recognize their own language and actions have an impact.

What do we do ourselves when it comes to the office environment? Do we talk too much? Evidence suggests we do.

Do we subconsciously ignore the suggestions of our female coworkers, or perhaps subconsciously give more support or credence to the suggestions of our male coworkers? While I can’t find a citation right now, again, evidence suggests we do.

Who is represented at meetings?  Are they a good ol’ boys network?  Who do we lunch with, both at work and when we network?

If you’re a member of a user group that has speakers, what does the ratio of speakers look like to you? Does it reflect the group’s ratio? Does it reflect the ratio of the industry?

I think it’s great that we have programs such as Girls who Code and Girls and Data, but we as men have to work on ourselves and work on our actions and reactions.

Some suggestions: “Sometimes, simply shut up.” I’ve started to do this more, especially if I’m in a group of women. LISTEN. And you know what, if you’re thinking right now, “well duh… because women talk so much I’d never get a word in anyway” you’re falling victim to the cliches and perpetuating the problem.

Support the women you work with. If they have a good idea, make sure it gets the same discussion as other ideas. And if one of your coworkers tries to co-opt it as their own, call them on it.  If you have a coworker (and I’ve had these) that is continually cutting off women in meetings, call them on it.

Seek out women speakers for your user groups. I’d suggest for example Rie Irish and her talk “Let her Finish”.  I asked Rie to speak at our local user group. Partly because of serendipity (I contacted one of our women members to let her know about the talk) we got the local Women in Technology group to advertise our meeting and ended up with a number of new members.

And finally, the title. Watch your language. Unless you’re working at a modelling agency or similar, you probably should never be introducing a coworker as “She’s smart and good looking.” Think about it: would you ever introduce a male coworker as “He’s a great DBA and handsome to boot!”? Your coworkers, male or female, are just that, coworkers in a professional setting; treat them as such.

Two final thoughts:

  1. If somehow this blog post has impacted you more than the brilliant posts of Rie Irish, Mindy Curnutt, or others who have spoken on sexism in the industry, I’d suggest you examine your biases, not give credit to my writing.
  2. If you have suggestions for women speakers for my local user group, especially local ones who can make the second Monday of the month, please let me know.


Comfort Zone

Humans are, by nature, creatures of habit and familiarity. We’ll often go to the same restaurant time after time, not necessarily because it’s the best, but because we’re most familiar with it. One reason McDonald’s is so popular is NOT that they serve the best hamburgers, but that, no matter where you go, you’re pretty comfortable knowing you’ll get exactly the same hamburger every time.

However, if you never have anything other than McDonald’s you can miss out on some wonderful food.

I often try to get out of my comfort zone. Sometimes we have to do so to grow. Of course everyone’s comfort zone is different. I love to crawl through holes in the ground (and please, keep it simple, we call it caving, not spelunking.) To me, that’s a comfortable environment.

But recently I’ve been doing something outside of my comfort zone; I’ve been taking a sales training class. The truth is, being a consultant, as much as I love the tech side, I really need to sell myself. Sales IS part of what I need to do. And I’m not comfortable doing it.

But, to expand I have to learn how. And I have to admit, I’ve learned a lot. It’s been worth it.

Another thing I’m doing to step a wee bit out of my comfort zone is to schedule a weekly blog post. Rather than do it hit or miss, I’m going to try to make it more formal.

So, what have you done to step out of your comfort zone lately?  How has it worked for you?

Oh and if you’re ever in upstate New York and want to go caving, let me know.

Riffing on a theme

As I’ve mentioned in the past, I often write these as inspiration strikes and today it did.

Specifically I’m inspired by Grant Fritchey’s latest blog post THERE IS A MAGIC BUTTON, A RANT.

First, I couldn’t resist doing this simply because I could put a pun in the title. (Read his entire article to see what I mean.)

But more so, because it’s part of a theme I’ve heard for decades. “This new technology will put people out of jobs!”

The truth is, sometimes it’s true.  I mean, how many buggy-whip manufacturers do you see these days? How many do you think existed before Ford started rolling the Model T off of the assembly line? How many after?

Yes, the Model T put many buggy-whip makers out of jobs. BUT, it created far more jobs than it eliminated.

Automated elevators put elevator operators out of a job. But you know what, for the most part, that’s a good thing. Let’s use our human capital in a better, wiser way.

For decades I’ve heard that one technology or another will make SQL useless or pointless. At one point it was object-oriented databases. Now it’s NoSQL. But you know what, SQL is not only still here, it’s thriving, and now when folks use the term NoSQL, instead of meaning NO SQL they’re often using it as shorthand for Not Only SQL.

So, performance tuning automation? Great. I love it. Bring it on. It WILL in fact mean less work for me on that front in many cases. But you know what, it won’t fix the situation I’m in right now, where a customer has a server they use for “database conversions”. The problems include databases still in FULL recovery model but with no log backups, DBCC checks not having been run in weeks or months on some of them, and the tempdb log filling up the drive on Saturday while I was sitting in a talk at the Albany SQL Saturday.
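
As a rough illustration, a query along these lines will surface that first problem; this is a minimal sketch against msdb’s backup history, and the 24-hour threshold is purely an example:

-- Databases in FULL recovery with no log backup in the last 24 hours.
-- The 24-hour window is an arbitrary example; pick one that fits your RPO.
SELECT d.name, MAX(b.backup_finish_date) AS last_log_backup
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b
    ON b.database_name = d.name AND b.type = 'L'
WHERE d.recovery_model_desc = 'FULL'
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());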

Yes, at some point most of those issues will probably be handled automatically, but until they are, I’ll be busy. And when those tasks DO get automated, I’ll have moved on to new issues.

There’s always a place for trained people.

Since I like to link my topics to other ideas like plane crashes, I’ll point out that autopilots on modern commercial airliners are amazing things. They can handle pretty much any part of normal flight operations, including, on suitably equipped aircraft, the landing. But what they can’t handle is the unexpected. And this is actually an issue in two ways. First, of course, is the fact that it took human ingenuity to safely pull off the Miracle on the Hudson. There’s no autopilot out there that could make that decision and pull it off.

The second, and this is actually a bigger issue for self-driving cars, is that with an autopilot or self-driving car, 99.99% of the time operations become SO mundane that the pilot or “driver” ends up out of the loop. They end up reading, falling asleep, or simply not paying attention. This means that when they ARE required to interact, there can be a several-second delay before they’re fully aware of the situation and can react. In a plane this may or may not be an issue, depending on the altitude at which the situation occurs.

In a self-driving car, we’re already seeing situations where the “driver” can’t get back into the control loop fast enough and an accident occurs.

So, while automation can eliminate a lot of the drudgery and “take away jobs,” we still need humans in the loop, and there is no foreseeable end to the jobs we’ll be needed for.

So don’t despair about automation.


Don’t Break the Chain!

If one backup is good, two is better right?

Not always.

Let me start by saying I’ve often been very skeptical of SQL Server backups done by 3rd party tools. There are really two reasons. For one, many years ago (when I first started working with SQL Server) they often simply weren’t good. They had issues with consistency and the like. Over time, and with the advent of services like VSS, that issue is now moot (though I’ll admit old habits die hard).

The second reason is that I hate to rely on things I don’t have complete control over. As a DBA, I feel it’s my responsibility to make sure backups are done correctly AND are usable. If I’m not completely in the loop, I get nervous.

Recently, a friend had a problem that brought this issue to light. He was asked to go through their SQL Server backups to find the time period when a particular record was deleted, so they could develop a plan for restoring the data deleted in the primary table and in the subsequent cascaded deletes. Nothing too out of the ordinary. A bit tedious, but nothing terrible.

So, he did what any DBA would do: he restored the full backup of the database for the date in question. Then he found the first transaction log backup and restored that. Then he tried to restore the second transaction log.
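
For those playing along at home, the sequence looks roughly like this (a sketch; the database and file names are made up for illustration):

-- A sketch of the restore sequence; names are hypothetical.
RESTORE DATABASE SomeDB FROM DISK = N'D:\Restore\SomeDB_Full.bak' WITH NORECOVERY, REPLACE;
RESTORE LOG SomeDB FROM DISK = N'D:\Restore\SomeDB_Log_1.trn' WITH NORECOVERY;
RESTORE LOG SomeDB FROM DISK = N'D:\Restore\SomeDB_Log_2.trn' WITH NORECOVERY;  -- this second log is where things went wrong
-- ...continue through the logs, then bring the database online:
-- RESTORE DATABASE SomeDB WITH RECOVERY;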

The log in this backup set begins at LSN 90800000000023300001, which is too recent to apply to the database. An earlier log backup that includes LSN 90800000000016600001 can be restored.

Huh? Yeah, apparently there’s a missing log.  He looks at his scheduled tasks. Nope, nothing scheduled. He looks at the filesystem.  Nope, no files there.

He tries a couple of different things, but nope, there’s definitely a missing file. Anyone who knows anything about SQL Server backups knows that you can’t break the chain. If you do, you can’t restore past the gap. This can work both ways. I once heard of a situation where the recent FULL backups weren’t recoverable, but they were able to go back to the original, essentially empty full backup and apply five years’ worth of transaction logs. Yes, 5 years’ worth.

This was the opposite case. They had the full backup they wanted, but couldn’t restore even 5 hours’ worth of logs.

So where was that missing transaction log backup?

My friend did some more digging in the backup history tables in msdb and found this tidbit:

backup_start_date  backup_finish_date  first_lsn             last_lsn              physical_device_name
11/9/2016 0:34     11/9/2016 0:34      90800000000016600000  90800000000023300000  NUL

There was the missing transaction log backup. It was a few minutes after the full backup, and definitely not part of the scheduled backups he had set up. The best he can figure is that the sysadmin had set the SAN snapshot software to take a full backup at midnight and then, for some reason, a transaction log backup just minutes later.

That would have been fine, except for one critical detail. See that rightmost column? Yes, ‘physical_device_name’. It’s set to NUL. So the missing backup wasn’t made to tape or another spot on the disk or anyplace like that. It was sent to the great bit-bucket in the sky. In other words, my friend was SOL, simply out of luck.
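
If you want to check your own servers for this, a query along these lines against the msdb history tables will surface any backups written to the NUL device:

-- Find backups whose "destination" was the NUL device, i.e. thrown away.
SELECT bs.database_name, bs.type, bs.backup_start_date, bmf.physical_device_name
FROM msdb.dbo.backupset bs
JOIN msdb.dbo.backupmediafamily bmf ON bmf.media_set_id = bs.media_set_id
WHERE bmf.physical_device_name IN ('NUL', 'NUL:')  -- with and without the colon, just in case
ORDER BY bs.backup_start_date;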

Now, fortunately, the original incident, while a problem for his office, wasn’t a major business-stopping incident. And while he can’t fix the original problem he was facing, he discovered the issues with his backup procedures long before a major incident did occur.

I’m writing about this incident for a couple of reasons. For one, it emphasizes why I feel so strongly about realistic DR tests. Don’t just write your plan down. Do it once in a while. Make it as realistic as it can be.

BTW, one of my favorite tricks, which I use for multiple reasons, is to set up log shipping to a 2nd server. Even if the 2nd server can never be used for production because it may lack the performance, you’ll know very quickly if your chain is broken.
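
If you use the built-in log shipping, the monitor tables make that check easy; a minimal sketch, run on the secondary (or monitor) server:

-- A growing restore lag suggests a broken or stalled chain.
SELECT secondary_database, last_restored_file, last_restored_date
FROM msdb.dbo.log_shipping_monitor_secondary;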

Also, I thought this was a great example of how doing things twice doesn’t necessarily make you more resistant to disaster. Yes, had this been set up properly, it would have resulted in two separate full backups being taken in two separate places. That would have been better. But because of a very simple mistake, the setup was worse than if only one backup had been written.

I’d like to plug my book: IT Disaster Response due out in a bit over a month. Pre-order now!

Deep Drilling

I was reviewing the job history on one of the DR servers of a client of mine. I noticed something funny. The last entry in the job history table (msdb.dbo.sysjobhistory for those playing along at home) was recorded in January of this year.

But jobs were still running. It took me a while to track it down, but through some sleuthing I solved the problem. First, I thought the msdb database might have filled up (though that event should have generated an error I’d have seen). Nope.

Then I thought perhaps the table itself was full somehow. Nope, only about 32,000 records.  No luck.

I finally tried to run sp_sqlagent_log_jobhistory manually with some made up job information.

Msg 8115, Level 16, State 1, Procedure sp_sqlagent_log_jobhistory, Line 99
Arithmetic overflow error converting IDENTITY to data type int.
Arithmetic overflow occurred.

Now we’re getting someplace. After a minor diversion of my own making, I then ran

DBCC CheckIDENT('sysjobhistory',NORESEED)

This returned a value of 2147483647. Hmm, that number looks VERY suspicious. A quick check of Books Online confirmed that’s the max value of a signed int.
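
Incidentally, if you want to see this failure mode for yourself, a throwaway repro looks something like this (the table name is made up; don’t run anything like this against msdb):

-- Throwaway repro of an IDENTITY overflow.
CREATE TABLE dbo.IdentityDemo (id INT IDENTITY(1,1) PRIMARY KEY, filler CHAR(1));
INSERT INTO dbo.IdentityDemo (filler) VALUES ('a');       -- id 1
DBCC CHECKIDENT('dbo.IdentityDemo', RESEED, 2147483646);  -- push the seed to the edge
INSERT INTO dbo.IdentityDemo (filler) VALUES ('b');       -- id 2147483647, the last valid value
INSERT INTO dbo.IdentityDemo (filler) VALUES ('c');       -- fails: Msg 8115, arithmetic overflow
DROP TABLE dbo.IdentityDemo;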

Now, the simple solution, which worked for me in this case, was to issue a

truncate table sysjobhistory

This removed all the rows in the table AND reset the IDENTITY value. Normally I’d hate to lose history information, but since this was 6 months old and seriously out of date, it was acceptable. I could have merely reset the IDENTITY seed value, but then there would have been no guarantee of avoiding collisions within the table later on. So this was the safest solution.
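
For the record, the rejected alternative would have been the single line below. The danger is that new rows would start numbering from 1 again and could eventually collide with the roughly 32,000 rows still in the table, since instance_id must be unique:

-- The alternative I rejected: keep the history, reset only the seed.
DBCC CHECKIDENT('sysjobhistory', RESEED, 0);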

But wait, there was more. It kept bugging me that I had somehow hit the 2 billion IDENTITY limit on this table. Sure, it handles log shipping for about a dozen databases and as a result runs about 48 jobs an hour, plus other jobs. But for a year that should generate less than 1 million rows. This database server hasn’t been running for 2 thousand years.

So, I decided to monitor things a bit and wait for a few jobs to run.

Then, I executed the following query.

select max(instance_id) from sysjobhistory

This returned a value along the lines of 232031. Somehow, in the space of an hour or less, my sysjobhistory IDENTITY column had increased by over 232,000. This made no sense. But it did explain how I’d hit the 2 billion mark!

So I started looking at the sysjobhistory table in detail. And I noticed gaps. Some make sense: if a job has multiple steps, it may temporarily insert a row and then roll it back once the job is done and put in a job completion record, and since IDENTITY values aren’t reused after a rollback, that explains some small gaps. For example, there was a gap in instance_id from 868 to 875. Ok, that didn’t bother me. BUT, the next value after 875 was 6,602. That was a huge gap! Then I saw a gap from 6,819 to 56,692. Another huge gap. As the movie says, “Something strange was going on in the neighborhood”.
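
If you want to hunt for gaps like these yourself, a sketch along these lines works on SQL Server 2012 and later (LAG isn’t available before that; a self-join would do the same job):

-- Find large jumps between consecutive instance_id values.
WITH history AS (
    SELECT instance_id,
           LAG(instance_id) OVER (ORDER BY instance_id) AS prev_id
    FROM msdb.dbo.sysjobhistory
)
SELECT prev_id, instance_id, instance_id - prev_id AS gap
FROM history
WHERE instance_id - prev_id > 100  -- the threshold is arbitrary
ORDER BY gap DESC;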

I did a bit more drilling and found that 3 jobs handling log shipping from a particular server were showing HUGE amounts of history. Drilling deeper, I found they were generating errors: “Could not delete log file….” Sure enough, I went to the directories where the files were stored and there were log files going back to November. Each directory had close to 22,000 log files that should have been deleted and weren’t.
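
That first bit of drilling is nothing fancy; an aggregate along these lines will surface the offenders:

-- Which jobs are generating the most history rows?
SELECT j.name, COUNT(*) AS history_rows
FROM msdb.dbo.sysjobhistory h
JOIN msdb.dbo.sysjobs j ON j.job_id = h.job_id
GROUP BY j.name
ORDER BY history_rows DESC;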

Now I was closer to an answer. Back in November we’d had issues with this server and I had to do a partial rebuild of it. Back then I’d also had some issues related to log shipping and permissions. I first checked permissions, but everything seemed fine.

I then decided to check file attributes, and sure enough, all these files (based on the subdirectory’s attribute setting) had the R (read-only) attribute set. No wonder they couldn’t be deleted.

Now I’m trying to figure out how they got their attributes set to R. (This is a non-traditional log-shipping setup, so it doesn’t use the built-in SQL Server tools to copy the files; it uses rsync to copy files through an SSH tunnel.)

So the mystery isn’t fully solved. It won’t be until I understand why those files had the R attribute and whether it will happen again. That particular issue I’m still drilling into. But at least now I know why I hit the 2 billion IDENTITY limit in my history table.

But, this is a good example of why it’s necessary to follow through an error to its root cause. All too often as an IT manager I’ve seen people who reported to me fix the final issue, but not the root cause. Had I done that here, i.e. simply cleared the history and reset the IDENTITY value, I’d have faced the same problem again a few weeks or months from now.

Moral of the story: When troubleshooting, it’s almost always worth taking the time to figure out not just what happened and fixing that, but WHY it happened and preventing it from happening again.