Security: Close isn’t good enough!

I was going to write about something else and just happened to see a tweet from Grant Fritchey that prompted a change in topics.

I've written in the past about good and bad password and security policies. And yes, often bad security can be worse than no security, but generally no security is the worst option of all.

Grant’s comment reminded me of two incidents I’ve been involved with over the years that didn’t end well for others.

In the first case, during the first dot-com bubble, I was asked to partake in the due diligence of a company we were looking to acquire. I expected to spend a lot of time on the project, but literally spent about 30 minutes before I sent an email saying it wasn’t worth going further.

Like all dot-com companies, they had a website. That is, after all, sort of a requirement for being a dot-com. And it was obvious it was backed by a database server (which I knew was SQL Server, which sped up my process, but only by a few minutes). So, I did the obvious thing and got the IP address of the website. Then I simply tried to connect to SQL Server from my desktop, first at that IP address and then at one or two on either side of it. On my second attempt, the IP address right before the website's replied to my connection attempt. That was not a good sign. The reply meant there was no effective firewall in place. Note that had they not been using SQL Server but some other technology, it might have taken me another 10-15 minutes to find the right client to connect with. So knowing it was SQL Server wasn't overly important.

But of course at least they had a password, right? Well, back then the latest and greatest version of SQL Server was 2000, which still did not require an sa password when you set it up. I asked myself, "It couldn't be that easy, could it?"

Sure enough, it was. Within minutes I had logged in as sa without a password. I now had complete control of their SQL Server. But even more, back then SQL Server allowed unfettered access to xp_cmdshell. In theory, at that point I could have done anything I wanted on the box, including installing remote-access software and creating an account with administrator access for myself. I didn't. But my email to my boss was short and sweet. I explained that there was absolutely no way we could acquire their platform without a complete top-to-bottom review for any signs of malware. If it took me 30 minutes or less to get in, I was almost certain their system was already owned.
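
(As an aside: on any modern SQL Server you inherit, this sort of thing is quick to audit from T-SQL. A minimal sketch, assuming SQL Server 2005 or later; it obviously wouldn't have run against that SQL Server 2000 box:)

-- Flag any SQL logins that still have a blank password.
SELECT name AS LoginWithBlankPassword
FROM sys.sql_logins
WHERE PWDCOMPARE('', password_hash) = 1;

-- Check whether xp_cmdshell is enabled (value_in_use: 0 = disabled, 1 = enabled).
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'xp_cmdshell';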

We never acquired that company. I've wondered since then what happened to them. My guess is that, like many dot-com companies, they folded. I can't say it was because of their lack of security, but I can say that the lack of security was a huge factor in us NOT acquiring them. (And for the record, the company I worked for at the time ended up acquiring 1-2 other companies, merging with a 3rd and finally being acquired by a 4th, which is still around. So we were doing something mostly right.)

The second incident that comes to mind was about 8 years later at another start-up. I was asked by the COO to do some due diligence on another division's datacenter setup. Again, I didn't do anything fancy. I knew they weren't running SQL Server, but I figured I could still do some probing. This time what I found was a bit different. It wasn't software per se, but rather their iSCSI switch. Sure enough, not only did it have a public-facing IP address, but the CTO of that division had failed to change the default password. I was very tempted at the time to give the IP address to my 8-year-old son, without any other details, and ask him to try to log in. Given his skills, even at that age, I'm 99% sure he'd have figured out how to Google the required information and get in. But I figured I didn't really need to do that to make my point.

That and other factors later led to the CTO leaving the company.

Moral of the story: Make sure your sensitive information is under some form of lock and key, and don't use blank or factory-default passwords, lest you or your company end up in a headline like this one: Evisort Data Exposed.

Design Thoughts

Ever look at a product and wonder, “why did they design it that way?” I know I have, and I have some examples I want to bring up.

Years ago, over dinner, I had a programmer from our Wisconsin office basically ask, "why the hell is the file system for your web servers set up the way it is?" It was a fair question. It wasn't something one would normally see. But before I explain that…

Like any modern American, I'm physically incapable of being more than 10′ from a flat-screen TV in my house. We have several, including one in my office and one in the kitchen. I couldn't tell you the brand of the one in the kitchen (well, I could, but I'm too lazy to go downstairs and find out), and the only reason I can tell you the brand of the one in my office is because I can see it from here. It's an Insignia.

Both serve the same function: they allow me to watch TV. But both have design quirks.

Their button layouts are a bit different (note the layout of the numbers and the volume/channel control buttons.)

Kitchen TV Remote

Office TV Remote

The kitchen TV also has a built-in DVD player, so it has additional controls for that.

So obviously, there are different design philosophies and requirements here. But I want to go a step deeper and talk a bit about functionality.

On the kitchen TV remote, if you mistype a number, you can hit the Vol– button and it will essentially backspace and delete the digit. Actually a handy feature. The office remote has no such functionality, though hitting EXIT will clear the entire channel you've already entered. Score one for Dynex. (OK, I did go downstairs so I could grab the remote and take a photo.)

But the Dynex has one annoying quirk I've never figured out. When I hit the OFF button, there's a noticeable delay of 1-2 seconds before it actually turns off. For the life of me, I have NO idea why. I mean, I'm turning off a TV. It's not like I'm shutting down a computer where it has to write the contents of memory to disk and perform other tasks. Sure, maybe it has to save the last channel I was tuned to, but it could do that right after I tuned to that station. Same with the volume. Every other TV in the house, including the one in my office, turns off instantly when you hit the off button.

I'm reminded a bit of early computers that had the big red switch. There was something satisfying about turning off an early PC. You knew it was instantly off. There was no question about it. Now, shutting off a PC is a far more complex operation and can take some time. But a TV? I'd love to know why the kitchen TV takes so long to turn off.

Now back to the file design the programmer was asking me about. Essentially we had 5-6 web front ends, each with a virtual directory in IIS pointing to a NAS. Not an entirely awful setup, but uncommon at the time. We were offering a web platform to newspapers so they could publish their content. Originally we tried using a 3rd-party package to make sure the content on all the servers was always in sync (since a newspaper could upload content at any time to any of the servers and wanted it available instantly). What we found was that sometimes we'd get into race conditions where files could actually end up erasing themselves. The 3rd-party company kept assuring us they had the solution. Well, after a desperate 4:00 AM call from my on-duty NOC person, I drove into the office, scrambling to figure out a better solution. On the drive, the idea of using virtual directories pointing to the NAS occurred to me. We implemented it in about 30 minutes and solved our problems. It was supposed to be a temporary solution until we came up with a more robust, permanent one. But 18 months later it was still in place, working great, and I was explaining it to our out-of-town programmer. He went from "that's nuts" to "hey, that makes a lot of sense."

So, I like to think that when there’s a design I don’t understand, the designers at the time had their reasons. But, to be honest, I’m not always sure.

For example, take the photo that should be heading up this article: a shampoo bottle and a bottle of conditioner, both from the same manufacturer, both designed to sit cap down, yet printed the opposite way from each other. The only reason I can think of that makes sense is so that in a befuddled, sleep-deprived state, I can more easily determine which is which. But even if that is the answer, why this way and not the other? Inquiring minds want to know!

And yes, the shampoo bottle can be placed cap up, but the conditioner bottle can't be. Again, why? The viscosity of the two isn't that different. Again, inquiring minds want to know.

Shampoo/Conditioner bottles

One of these is upside down!

 

Fail-safes

Dam it Jim, I’m a Doctor, not a civil engineer

I grew up near a small hydroelectric dam in CT. I was fascinated by it (and still am in many ways). One of the details I found interesting was that on top of this concrete structure they had what I later learned are often called flashboards. These were 2x8s (perhaps a bit wider) running the length of the top of the dam, held in place by wooden supports. The general idea was that they increased the pooling depth by 8″ or so, but in the event of a very heavy water flow or flood, they could be easily removed (in many cases removed simply by the force of the water itself). They safely provided more water, but were in fact designed to fail (i.e. give way) in a safe and predictable manner.

This is an important detail that some designers of systems often don't think about: how to fail. They spend so much time trying to PREVENT a failure that they don't think about how the system will react in the EVENT of a failure. Properly designed systems assume that at some point failure is not only an option, it's inevitable.

When I was first taught rigging for cave rescue, we were always taught "Have a mainline and a belay." The assumption is that the system may fail. So we spent a lot of time learning how to design a good belay system. The thinking has changed a bit these days; often we're as likely to have TWO "mainlines" and switch between them. But the general concept is still the same: in the event of a failure, EITHER line should be able to catch the load safely and allow the operation to recover (i.e. simply catching the fall but not being able to resume operations is insufficient).

So, your systems. Do you think about failures and recovery?

Let me tell you about the one that prompted this post. Years ago I built a log-shipping backup system for a client. It uses SSH and other tools to get the files from their office to the corporate datacenter. Because of the network setup, I can't use the built-in SQL Server log-shipping copy commands.

But that’s not the real point. The real point is… “stuff happens”. Sometimes the network connection dies. Sometimes the copy hangs, or they reboot the server in the office in the middle of a copy, etc. Basically “things break”.

And there's another problem I have NOT been able to fix, one that only started about 2 years ago (so for about 5 years it was not a problem). Basically, the SQL Server in the datacenter develops a memory leak, applying the log files starts to fail, and I start to get errors.

Now, I HATE error emails. When this system fails, I can easily get like 60 an hour (every database, 4 times an hour plus a few other error emails). That’s annoying.

AND it was costing the customer money every time I had to go in and fix things.

So, on the receiving side I set up a job to restart SQL Server and Agent every 12 hours. (If we ever go into production we'll have to solve the memory leak, but at this time we've decided it's such a low priority as to not bother; and since it's related to the log-shipping, and if we failed over we'd be turning off log-shipping anyway, it's considered even less of an issue.) This job comes in handy a bit later in the story.

Now, on the SENDING side, as I've said, sometimes the network would fail, they'd reboot in the middle of a copy, or something random would make the copy job get "stuck." This meant that rather than simply failing, the job would keep running but not do anything.

So, I eventually enabled a “deadman’s switch” in this job. If it runs for more than 12 hours, it will kill itself so that it can run normally again at the next scheduled time.
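
If you're curious, a deadman's switch like this doesn't have to be anything fancy. Here's a minimal sketch of the idea as a separate watchdog job step, using only msdb; the job name is made up for illustration, and the real implementation is specific to that client's setup:

-- Stop the copy job if its current run has been going for more than 12 hours.
DECLARE @jobname sysname = 'LogShipping - Copy to Datacenter';  -- hypothetical name

IF EXISTS (
    SELECT 1
    FROM msdb.dbo.sysjobactivity ja
    JOIN msdb.dbo.sysjobs j ON j.job_id = ja.job_id
    WHERE j.name = @jobname
      AND ja.start_execution_date IS NOT NULL
      AND ja.stop_execution_date IS NULL          -- still running
      AND ja.start_execution_date < DATEADD(hour, -12, GETDATE())
)
BEGIN
    EXEC msdb.dbo.sp_stop_job @job_name = @jobname;
END

A production version would also restrict the check to the most recent Agent session in sysjobactivity, since a crashed session can leave stale rows behind.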

Now, here's what often happens. The job will get stuck. I'll start to get email alerts from the datacenter that it has been too long since log files were applied. I'll go into the office server, kill the job and then manually run it. Then I'll go into the datacenter and make sure the jobs there are running. It works and doesn't take long. But it still takes time, and I have to charge the customer for it.

So, this weekend…

the job on the office server got stuck. So I decided to test my fail-safes/deadman's switches.

I turned off SQL Agent in the datacenter, knowing that later that night my “cycle” job would turn it back on. This was simply so I wouldn’t get flooded with emails.

And, I left the stuck job in the office as is. I wanted to confirm the deadman’s switch would kick in and kill it and then restart it.

Sure enough later that day, the log files started flowing to the datacenter as expected.

Then a few hours later the SQL Agent in the datacenter started up again and log-shipping picked up where it left off.

So, basically I had an end-to-end test showing that when something breaks, on either end, the system can recover without human intervention. That's pretty reassuring. I like knowing it's that robust.

Failures Happen

And in this case… I’ve tested the system and it can handle them. That lets me sleep better at night.

Can your systems handle failure robustly?

 

 

Hours for the week

Like I've said, I don't generally post SQL-specific stuff because, well, there are so many blogs out there that do. But what the heck.

I had a problem the other day. I needed to return the hours worked per day, over a given date range, for a specific employee. And if they worked no hours on a given day, return 0. So basically I had to deal with gaps.

There are lots of solutions out there; this is mine:

ALTER PROCEDURE GetEmployeeHoursByDate @startdate date, @enddate date, @userID varchar(25)
AS

-- Usage: exec GetEmployeeHoursByDate '2018-01-07', '2018-01-13', 'gmoore'

-- Author: Greg D. Moore
-- Date: 2018-02-12
-- Version: 1.0

-- Get the totals for the days in question

SET NOCOUNT ON

-- First let's create a simple table that just has the range of dates we want
;WITH daterange AS (
SELECT @startdate AS WorkDate
UNION ALL
SELECT DATEADD(dd, 1, WorkDate)
FROM daterange s
WHERE DATEADD(dd, 1, WorkDate) <= @enddate
)

SELECT dr.WorkDate AS WorkDate, COALESCE(a.DailyHours, 0) AS DailyHours
FROM
(
-- Here we get the hours worked and sum them up for that person.
SELECT ph.WorkDate, SUM(ph.Hours) AS DailyHours
FROM ProjectHours ph
WHERE ph.UserID = @userID
AND ph.WorkDate >= @startdate AND ph.WorkDate <= @enddate
GROUP BY ph.WorkDate
) AS a
-- Now join our table of dates to our hours and put in 0 for dates we don't have hours for
RIGHT OUTER JOIN daterange dr ON dr.WorkDate = a.WorkDate
ORDER BY dr.WorkDate
-- Recursive CTEs default to 100 levels of recursion; this allows ranges longer than 100 days
OPTION (MAXRECURSION 0)

GO

There are probably better ways, but this worked for me. What's your solution?
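
For what it's worth, here's one common alternative, sketched out: it avoids the recursive CTE (and its MAXRECURSION limit) by generating the date list from a numbers source. It assumes the same ProjectHours(UserID, WorkDate, Hours) table as above.

DECLARE @startdate date = '2018-01-07', @enddate date = '2018-01-13', @userID varchar(25) = 'gmoore';

SELECT dr.WorkDate, COALESCE(SUM(ph.Hours), 0) AS DailyHours
FROM (
    -- spt_values type 'P' supplies the numbers 0-2047, so this covers ranges up to ~5.5 years
    SELECT DATEADD(dd, n.number, @startdate) AS WorkDate
    FROM master.dbo.spt_values n
    WHERE n.type = 'P'
      AND n.number <= DATEDIFF(dd, @startdate, @enddate)
) dr
LEFT OUTER JOIN ProjectHours ph
    ON ph.WorkDate = dr.WorkDate
   AND ph.UserID = @userID
GROUP BY dr.WorkDate
ORDER BY dr.WorkDate;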

This is secure, right?

So, over the weekend, while I was waiting for my wife's hockey game to begin, I was exploring the facility. I saw an upper level and was curious if it was open. But alas, no, it was locked. And I mean locked. Above is a photo of the chain and lock used to keep people out.

There's a saying that locks only keep out honest people. That's not entirely true, but there's some truth to it. Security, especially in my field of IT and databases, is extremely important. I am often working with data that contains private information about people. I have to take steps to keep it safe. So, as I mentioned in Too Secure, I don't have a problem with security per se. There I talked about security that prevented me from doing my job.

Here we have the opposite: making something that looks secure, but really isn't. I mean, this has a heavy chain and a lock. I'm not sure who did this or why, but I'm sure they felt the lock was important. It's not a cheap one either.

 

This is a case where a simple string and a sign reading "Balcony closed" would probably have sufficed. The only purpose the lock really served was to make sure the chain wasn't stolen. The purpose of the chain appeared to be to give the lock something to attach to.

Now, sure, someone could have removed the sign and string and said, "oh, we never saw it," but again, what's the ultimate risk if someone got upstairs? It was simply an observation balcony. They weren't really securing much.

But I’m sure someone felt good about their security.

So, when you're making something secure, stop and make sure you're spending your time wisely. Not everything needs the most secure system available. Sometimes a "keep out" sign might be enough, and easier.

Starting off the New Year… wrong

Ok, I had plans for a great blog post on time and how we measure it and how it really is arbitrary. But… that will have to wait. (I suppose that brings its own irony about time and waiting and all that. Hmm, now I may need a post about that!)

But, circumstances suggested a different post.

This one started with an email from my boss last year (who will remain nameless, for his sake 🙂): "Greg, I hacked together this web page so my team members can track their hours. I need you to make a few tweaks to it. And don't worry, we'll replace it by the end of the year." Yes, that sound you hear is your head hitting the desk like mine, because you KNOW what is coming next. Here we are in January of 2018 and guess what, the web page is still in place.

And of course, it's not just HIS (now former; he's moved on) team that is using it. Apparently his boss liked what it could do, so his boss (my boss^2) declared that ALL his teams would use it. So instead of just 4-5 people using it, there are close to 30-40.

So, remember the first rule of development: "Oh, don't worry about this, we'll replace it soon" never happens. His original code made a lot of assumptions (which at the time I suppose seemed reasonable). For example, since many schedules are formed around work weeks, he was originally storing hours by the week of the year (and then day of the week) they were worked in. For example, if someone worked for 6 hours on January 5th of last year, the table stored that as "WEEK 1, day 5".

Now, before anyone thinks this is completely nuts, do keep in mind that many places, including my last job at GE, did everything by week of the year. There are some advantages to this (that I won't go into here).

For the most part the code worked and I didn't care. But at one point I was asked to do a report based not on work weeks, but on an actual calendar month. So I had to figure out, for example, that February 1st occurred during Work Week 5, on the 4th day, and go from there.
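
For the curious, the translation itself isn't hard once you know the week-numbering assumptions. A quick sketch (purely to illustrate the math; I'm not claiming the original page did it exactly this way), assuming SQL Server's default DATEFIRST of 7, i.e. weeks starting on Sunday, which is what made 2017 line up so nicely:

SET DATEFIRST 7;  -- Sunday is day 1 (the U.S. default)

SELECT DATEPART(week,    CAST('2017-02-01' AS date)) AS WorkWeek,   -- returns 5
       DATEPART(weekday, CAST('2017-02-01' AS date)) AS DayOfWeek;  -- returns 4 (Wednesday)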

So, I hacked that together. Then there was a request for another report, and another. Pretty soon it was getting to be a pain to keep track of all this. And I realized another problem. My boss had gotten lucky in 2017: January 1st fell on a Sunday, which meant that Work Week 1, Day 1 was a natural starting point.

I started to worry about what would happen when 2018 rolled around. His code would probably break.  So I finally took the plunge and started to refactor the code.  I also added new features as they were requested.  Things were going great.

Until yesterday. So now we get to another good rule of development: "Use version control." So simple. Even for a one-person shop like me it's a good idea. I had put it off for a while, but finally downloaded the git plug-in for Visual Studio and started to use it.

So yesterday, a user reported that they couldn’t enter hours for last week. I pulled up the app and realized that yes, in fact it was coded in such a way you couldn’t go back into a previous year to enter hours. No problem, I can fix that!

Well, let me add a rule I had sort of missed: "Understand HOW to use version control." I won't go into details, but let's just say I wasn't committing like I should have been, or thought I was. So, in an attempt to get a clean base, I merged things… "poorly." I had the branches I wanted, but had not properly committed my work to them.

The work of the last few weeks was GONE. I know, you’re saying, “just go to your backups Greg, because of course a person who writes about DR has proper backups, right?”  Yeah, right! Let’s not go there!

Anyway, I spent 4 hours last night and 2 this morning recreating the code. Fortunately, it dawned on me that being .NET code, it was really just IL running on the CLR, and that perhaps with a good decompiler I could get most of the code back without too much effort. I want to give props to RedGate's .NET Reflector. Of the two decompilers I tried, it was clearly the better one. While I lost my variable names and the decompiled code isn't quite what I'd call human-written, it was close enough that I could in short order recreate hours and hours of work.

In the meantime, I also talked to some of my programming buddies on an RPI chat server (ask me about it sometime) about git and better procedures.

And here's where I realized my fundamental mistake. It wasn't just misunderstanding how Visual Studio was interacting with git and the branches I was creating; it was that somewhere in the back of my head I had decided commits should only be made when major steps were completed. It was almost like I was afraid I'd run out, or perhaps complicate the history with too many of them. But you know what? Commits aren't scarce resources. And I'm the only one reading the history. I don't really care if I've got 10 commits or 100. The more the better, in many ways. Had I been committing much more frequently, I'd have saved myself a lot of work. (And I can discuss having a remote store at some point, etc.)

So really, the point of this hastily written, sleep-deprived blog post isn't so much to talk about dates and apps that never get replaced, but about a much deeper problem I was having. It wasn't just failing to fully understand git commands; it was failing to understand how I should be using git in the first place.

So, to riff on an old phrase, “Commit Early, Commit often”.

Oh, and I did hack together a way for that user to enter hours for last year. It's good enough. I'm sure I won't need it a year from now. The code will all be replaced or refactored by then, I'm sure!

 

Too Secure

There’s an old joke in IT that the Security Office’s job isn’t done until you can’t do yours.

There’s unfortunately at times some truth to that.  And it can be a bigger problem than you might initially think.

A recent example comes to mind. I have one client that has set up fairly strict security precautions. I'm generally in favor of most of them, even if at times they're inconvenient. But recently they made some changes that were frustrating, to say the least, and potentially problematic. Let me explain.

Basically, at times I have to transfer a file created on a secured VM I control to one of their servers (that in theory is a sandbox in their environment that I can play in). Now, I obviously can’t just cut and paste it. Or perhaps that’s not so obvious, but yeah, for various reasons, through their VDI, they have C&P disabled. I’m ok with that. It does lessen the chance of someone accidentally cutting and pasting the wrong file to the wrong machine.

So what I previously did was something that seemed strange, but worked. I'd email the file to myself and then open a browser session on said machine and download it there. Not ideal, and I'll admit there are security implications, but it does cause the file to get virus-scanned in at least two places, and I'm very unlikely to send myself a dangerous file.

Now, for my web client on this machine, I tended to use Firefox. It was kept up to date and, as far as I know, up until recently was on their approved list of programs. Great. This worked for well over a year.

Then one day last week I go to the server in question and there's no Firefox. I realized this was related to an email I had seen earlier in the week about their security team removing Firefox from a different server "for security reasons." Now, arguably that server didn't need Firefox, but still, my server was technically MY sandbox. So, I was stuck with IE. Yes, their security team thinks IE is more secure than Firefox. OK, no problem, I'll use IE.

I go ahead, enter my userid and supersecret password. Nothing happens. Try a few times since maybe I got the password wrong. Nope. Nothing.  So I tried something different to confirm my theory and get the dreaded, “Your browser does not support cookies” error. Aha, now I’m on to something.

I jump into the settings and try several different things to enable cookies completely. I figure I can return things to the way they want after I get my file. No joy. Despite enabling every applicable option, it wouldn't override the domain settings and cookies remained disabled. ARGH.

So, next I figured I’d re-download FF and use that. It’s my box after all (in theory).

I get the install downloaded, click on it, and it starts to install. Great! What was supposed to be a 5-minute problem of getting the file I needed to the server is about done. It's only taken me an hour or two, but I can smell success.

Well, it turns out what I was smelling was more frustration. Halfway through, the install locks up. I kill the process and go back to the file I downloaded and try again. BUT the file isn't there. I realize after some digging that their security software is automatically deleting certain downloads, such as the Firefox installer.

So I’m back to dead in the water.

I know, I'll try to use Dropbox or OneDrive. But… both require cookies to get set up. So much for that.

I had now spent close to 3 hours trying to get this file onto their server and was at a loss as to how to solve it. So I did what I often do in situations like this: I jumped in the shower to think.

Now, I finally DID manage to find a way, but I'm not going to mention it here. The how isn't important (though keeping the details private is probably at least a bit important).

Anyway, here's the thing. I agree with trying to make servers secure. We in IT have too many data breaches as it is. BUT there is definitely a problem with making things TOO secure. Actually, two problems. The first is the old joke that a computer encased in cement at the bottom of the ocean is extremely secure, but also unusable. Their security measures almost got us to the state of an extremely secure but useless computer.

But the other problem is more subtle. If you make things too secure, your users are going to do what they can to bypass your security in order to get their jobs done. They're not trying to be malicious, but they may end up making things MORE risky by enabling services that shouldn't be enabled or by installing software you didn't authorize, thus leaving you in an unknown security state. (For the record, I didn't do either of the above.)

I also find it frustrating when steps like the above are taken, but some of the servers in their environment don't have the latest service packs or security fixes. They're fixing surface issues while ignoring deeper problems. While I was "nice" in what I did (i.e. I technically didn't violate any of their security measures in the end), I did work to bypass them. A true hacker most likely isn't going to be nice. They're going to go for the gold and go through one of at least a dozen unpatched security holes to gain control of the system in question. So as much as I can live with their security precaution of locking down certain software, I'd also like to see them actually patch the machines.

So, security is important, but let's not make it so tight that people go to extremes to bypass it.