Ok, perhaps this isn’t Buckland, but alerts can be important.
I’ve been wanting to write something on alerts for quite a while, and this isn’t quite it. Rather, I’ll reference another URL on alerting. It sums up, very well, much of what I’ve been wanting to say.
To single out one important rule: if you’re going to get an alert, be prepared to act upon it!
Knowing that you pegged your server CPU at 100% every once in a while might be useful, but it’s probably not something to wake people up about. And if it only hits 100% infrequently, there’s probably nothing worth doing. On the other hand, if it’s routinely hitting 100% CPU, perhaps your action plan is to spin up another web server, or move load to a different database. Or perhaps your plan is even to do nothing. But planning to do nothing, and accepting that, is very different from not planning and simply doing nothing because you have no idea what to do.
Note that alerting is very different from monitoring and logging. If my CPU is hitting 100% once a week for 5 seconds, then twice a week for 6 seconds, and then 4-5 times a day for 10 seconds, I want to start making plans. But again, I probably don’t want to wake someone up.
Monitor, yes. Alert: maybe.
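To make the monitor-versus-alert split a bit more concrete, here’s a minimal Python sketch of one possible policy, using the CPU example above. Everything in it is hypothetical: the `Spike` record, the `classify` function, the 300-second paging threshold, and the 3x trend factor are made up for illustration and don’t come from any real monitoring tool. Every spike gets recorded; a rising spike rate only produces a “start planning” result; and only long, sustained saturation pages someone, and only if there’s a runbook saying what to do about it.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    LOG = "log"    # monitor only: record it, wake nobody
    PLAN = "plan"  # rising trend: start capacity planning during the day
    PAGE = "page"  # actionable now: wake someone who knows what to do


@dataclass
class Spike:
    day: int        # day number since monitoring started
    seconds: float  # how long the CPU stayed pegged at 100%


# Hypothetical thresholds, chosen only for illustration.
PAGE_AFTER_SECONDS = 300.0  # saturation long enough to need immediate action
TREND_FACTOR = 3.0          # recent spike rate vs. the preceding window


def rate(spikes, first_day, last_day):
    """Average spikes per day over the inclusive day range [first_day, last_day]."""
    count = sum(1 for s in spikes if first_day <= s.day <= last_day)
    return count / (last_day - first_day + 1)


def classify(spikes, today, window=7, has_runbook=True):
    """Decide how to respond to the CPU spikes recorded up to `today`."""
    # Page only for sustained saturation, and only if someone can act on it.
    if has_runbook and any(
        s.day == today and s.seconds >= PAGE_AFTER_SECONDS for s in spikes
    ):
        return Response.PAGE
    recent = rate(spikes, today - window + 1, today)
    older = rate(spikes, today - 2 * window + 1, today - window)
    # Treat an empty older window as one spike per window, so a single
    # isolated spike never looks like a trend on its own.
    baseline = max(older, 1 / window)
    if recent >= TREND_FACTOR * baseline:
        return Response.PLAN
    return Response.LOG


# The scenario from the post: once a week, then twice a week, then daily.
week1 = [Spike(0, 5)]                                            # 5 seconds
week2 = [Spike(7, 6), Spike(10, 6)]                              # 6 seconds
week3 = [Spike(d, 10) for d in (14, 15, 16) for _ in range(4)]   # 10 seconds

print(classify(week1 + week2, today=13))          # Response.LOG
print(classify(week1 + week2 + week3, today=16))  # Response.PLAN
```

The design choice worth noting is that the trend check never returns `PAGE`: a rising spike rate is a reason to plan during working hours, and waking someone is reserved for the case where there’s both a real problem and a known action.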
That’s it for tonight.