Art of the DBA Rotating Header Image

Survival Monitoring

The IT world is a jungle. Countless threats lurk like predators, ready to devour us if we’re not careful. Seemingly benign events can quickly turn in to raging panic fests and danger lives everywhere that we’re not looking. To survive you need to be prepared and proactive. Unfortunately, many of us are thrown into this with little more than a pat on the back and a smile, as if we just got dropped out of a helicopter into the African jungle with little more than a pack of chewing gum and a pocket knife. Yet, we need to survive not just the next day, but the weeks and months ahead of us.

Because we need to survive, there’s some basic stuff we need to focus on in order to ensure our survival. It’s not everything we need to live happy, contented careers, but the minimum elements we need to watch in our environments to make sure we live to see the next day. If we were trapped in the wilderness, we’d first focus on shelter, fire, and food. In database terms, we need to first keep an eye on backups, services, and disk space if we want to make it to the next day.

Shelter from the storm

The most important item in a DBA’s life is backups. We can have screaming disk, tons of CPU, and all sorts of clustering, but that means nothing if our files get corrupted or the building burns down. Just like shelter in the wilderness is a place where we can always find protection, our database backups will always give us something to recover to.

Keep an eye on three things when it comes to your backups. First, make sure they’re actually occurring. Look to the backupset table in msdb for this, because it will tell you exactly when your backups are occurring, whether they’re log backups, fulls, or differentials. Next, where are your backups located? Backups won’t do much for you if they’re stored to the local computer and then that computer’s hard drive burns up or gets corrupted. Make sure that your backup files get to another location. Finally, make sure your backups work. Just because you take a backup doesn’t always mean that backup is reliable. Perform restores when it’s not an emergency to validate your backups, so you’ll know things will work when it’s an emergency.

Backups are your safety net. No matter what else happens, you should always have them to fall back on. It may not be pretty, but you’ll be glad they’re there when you need them.

Give me fuel, give me fire

Fire gives us the energy to get things done, whether it’s keeping us warm or being used to cook food. This is the same with your SQL Server services. If these aren’t running, your databases are down and your company is losing money. We can’t always prevent the interruption, but we need to be ready to respond when that interruption occurs. As DBAs we need to be proactive and watch our services.

Also, we can’t limit this just to the SQL Server database service. How many of you run SQL Agent jobs to perform your backups and maintenance? I know I do. If the Agent service is down, the databases will be working fine, but none of that other work is getting done. To boot, you probably won’t be getting any notifications about these jobs not running, so this will be one big blind spot.

We can’t take on faith that services will start automatically. Sometimes they stop for completely legitimate reasons. It’s our job to make sure they’re up and running and very few things are worse than that surprise call about something not running because a SQL related service is down (one thing that is worse is that we don’t have a backup, see above). Watch your services and you’ll sleep better and warmer.

How can you have any pudding if you don’t eat your meat?!

People got to eat. Once we have a place to sleep and fire to keep us warm, this is the next thing that we need to keep us going. For databases, this food is disk space. We could expand it out to CPU and RAM, but I’ve seen many a server that will limp along when these are consumed and stop stone cold when a file can’t grow anymore because the server ran out of space. If we want a happy database, we need to keep our database fed.

Primarily, watch the free space on your drives (wherever your files are stored), but also keep an eye on the free space within your files. You need to know when your files are going to grow and consume your space. The immediate survival goal is to make sure your server has enough disk to keep running, but you also need to monitor how that disk is getting so that you can be ready to add disk as necessary.

Getting by

Please note, doing all of the above doesn’t guarantee that your server will hum along happily. This isn’t happiness, this is survival. This is the bare minimum you want to do to ensure your company’s service and data. That’s the thing about monitoring: there are hundreds of counters and statistics you can watch, it’s up to you to figure out which of those are important. That’s why you want to start with the fundamentals first, or you could be putting your data, your job, and your company at risk.

I wanted to start with the overview of this strategy. Stay tuned my next post (might be next week or next Thursday, depends on my schedule), I will actually cover some technical solutions to this monitoring, some SQL and PowerShell scripts you can use to keep an eye on all of the above. If you want to get a head start, take a look at my post on backupset or look at Brent Ozar’s(b|t) sp_blitz.

 

Leave a Reply

Your email address will not be published. Required fields are marked *