Art of the DBA


Monitoring – Where to start?

It’s really easy in the tech industry to jump in and start slinging code or writing configs.  Most of us learn this way, throwing technological spaghetti up against the wall and seeing what sticks.  It’s not what we want to do, though, when we’re looking to put something long-lasting in place.  We need a plan for understanding and measuring the health of our SQL Servers, something we can take to the check writers to show them how we’re doing and give them measurable indicators of success.  This is where Service Level Agreements (SLAs) come in.

Ah, you were afraid I was going to say that?  I know for some people, “SLA” is as bad as any other four-letter word.  I hear it all the time: “I’ve been doing this for a long time, I know what I need to do.”  And it’s true, much of our work comes naturally: how much space do we have on disk, what’s our index fragmentation, is SQL Server up or down, and so on.  The SLA, however, is what we can use not only to communicate to our users what we’re watching, but also to help control things so we’re not waking up in the middle of the night because a developer dropped a table.

SQL Server high (level)

We need to keep in mind that most of our management doesn’t care about DBCC checks and backup strategies; they just want to know that the database is up and working.  We’ll take care of the details.  Our SLAs provide a framework that helps our users define operational standards without getting lost in the weeds.  My approach is to break my SLA structure into three general areas:

Availability: This is the easy one: are your databases and servers up and available?  When we define this SLA, we’re telling our users that their data will be available when they need it.  Some of the elements that fall under this are:

  • Is a database available 24/7 or only during business hours?
  • Do we need high availability?  What sort of service interruption can the database support?
  • In the event of a disaster, how much data will we lose (Recovery Point Objective) and how long will we be down (Recovery Time Objective)?
  • What sort of resource overhead, such as free space on disk, is necessary to keep the database up and running?  (There’s a sample query for this just after the list.)
  • When will the database be down for maintenance and how long can that maintenance be?
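
As a starting point for that free space bullet, here’s a rough sketch that checks how full each data and log file is in the current database. It uses FILEPROPERTY, so it only reports free space inside the files themselves, not free space on the drive; treat it as a sketch to adapt, not a finished monitor.

--Rough check of size and free space for each file in the current database
--(size is stored in 8KB pages, so dividing by 128 gives megabytes)
SELECT
      name AS fileName
    , size/128.0 AS sizeMB
    , size/128.0 - FILEPROPERTY(name, 'SpaceUsed')/128.0 AS freeMB
FROM sys.database_files;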

Performance: Now we get into a little grey area, because we’re telling our users how well the system will perform.  It’s tough to put any parameters around “the database is slow”, but we can put some hard numbers around this, primarily by looking at server health metrics (there’s a sample query for one of these just after the list).  Some things to consider are:

  • Acceptable query response times for standardized processes.
  • Thresholds for CPU, memory, and disk I/O.
  • Blocking process monitoring and resolution.
  • Index and Statistics maintenance.
  • Batch requests per second.
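
As an example of putting a hard number behind one of these, here’s a quick sketch for sampling batch requests per second from sys.dm_os_performance_counters. The counter is cumulative, so you take two readings and divide the difference by the interval; the ten-second delay here is just an arbitrary choice.

--Batch Requests/sec is a cumulative counter, so sample it twice and diff the values
DECLARE @first BIGINT, @second BIGINT;

SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec';

WAITFOR DELAY '00:00:10';

SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec';

SELECT (@second - @first) / 10.0 AS batchRequestsPerSec;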

Support: And now it gets even fuzzier.  There are always going to be drive-bys and break/fix requests.  Whether you use a service ticket system or some other method for tracking database team requests, you’ll want to define your response and resolution times, as well as work with your customers to determine what’s acceptable from a customer service perspective.  Also, if you have custom code or application procedures, you’ll probably want to document what level of response items like failed SQL Agent jobs, ETL loads, and similar processes will get.

Not being all things

The next thing to define is your tiers of service.  Just like you shouldn’t be all things to all users, you won’t be able to support all your applications in the same way.  By defining your different application types, you can set reasonable expectations for your support.  This area is fairly open, because you might have only two types of applications, or you might have four or five.  A sample breakdown might look like this:

  • Critical 24X7 applications – Applications in this category must be up 24 hours a day, 7 days a week and be able to handle continuous transaction load.  Interruption or degradation of service is considered a priority and requires immediate response.
  • High demand business applications – Applications in this category need to be available 24 hours a day during the business week, midnight Monday to midnight Friday.  Interruption of service requires quick response.
  • Standard business applications – Applications in this category should be available during standard business hours, Monday through Friday.  Interruption of service should be limited during business hours.
  • Lower environments – All non-production environments should be maintained and supported so that development resources can complete their projects.  Downtime should be kept to a reasonable minimum during working hours.

Your tiers will likely be driven by availability needs first.  The key is being able to categorize your application needs so that you can respond appropriately.  This is also to protect you, because if you’re burned out trying to keep development boxes running, you won’t be able to handle it when the web server handling your company’s sales goes down.

Agreeing on something

The key with creating these frameworks is to remember it’s an agreement between you and your users or customers.  You want to work with them to create reasonable expectations so that if something does go haywire, both sides will know how the issue will be handled and what sort of time frames are in place for resolution.

The other piece of this being an agreement is your users or customers will need to meet you part way.  If you have SLAs defined for a highly available 24X7 application, you want your developers to make sure that the application can meet those needs.  It’s key that your users understand their stake in providing good service, such as calling to report errors or ensuring that an application can handle a database failing underneath it.

It’s a start

I’ll be honest, I’m still working on fleshing much of the above out as I roll out an SLA framework at my company.  What you see here will undergo some tweaking and fine tuning over the next few months as I get some more definition around it, but the intent is to follow the basic structure I’ve laid out here.  There will be more specifics under each of the three areas that will be shared as they develop.  If you have anything to contribute or are curious about, please feel free to leave your input or questions in the comments and I will respond as I can.

Database Oil Changes – Part 3

Captain’s Log

The last bit of maintenance is one I think most folks know about: transaction log backups. The reason I consider this maintenance is because it’s how you manage the overall size and transactional state of your database when it’s not in SIMPLE recovery mode. Without the proper jobs set up, you could be impacting your overall space usage and recovery time on your servers.

SQL Server supports three recovery models:

  • Simple – The most basic recovery model: SQL Server regularly cleans up the transaction log. Transactional history is not maintained, so no point-in-time recovery is available. NOTE: SQL Server still uses the transaction log in the same way as in FULL mode.
  • Full – Transactional history is maintained in the log, as SQL Server only performs clean up on the transaction log when a transaction log backup occurs. Since transactional history is maintained, point-in-time recovery is possible.
  • Bulk logged – Similar to FULL, transactional history is retained in the log, but certain actions within the database are minimally logged, meaning that you don’t get the transactional detail, but must recover the entire action that occurred.

What all this means is that in FULL and BULK-LOGGED modes, our transaction logs are going to fill up and grow unless we back up our transactions and signal to SQL Server that it can clean up the log. If you neglect your log backups, you could easily fill up whatever drives you have your log files on. Also, when you have to bring a database online, SQL Server will have to go through your log file to reconcile the data in your database with all of the transactions in the log.  The larger your log file, the longer this recovery phase will take.
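
If you’re not sure which model a given database is in, it only takes a second to check. The ALTER DATABASE line below is just an example of switching the AdventureWorks sample database to FULL, so don’t run it blindly against your own systems.

--Check the recovery model for every database on the instance
SELECT name, recovery_model_desc
FROM sys.databases;

--Example: switch a database to FULL recovery
ALTER DATABASE AdventureWorks SET RECOVERY FULL;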

Backing up your log is pretty simple:

BACKUP LOG AdventureWorks TO DISK='X:\Backups\logs.bak'

Another option is to use Maintenance Plans. Whichever method you choose, you should be making transaction log backups on a regular basis, with the frequency dependent (again) on your particular situation. For example, we have a pretty active OLTP system and I have regular log backups scheduled to occur every 15 minutes. Your server might be active enough to only need them every hour or so. The factors to keep in mind when setting your schedule are:

  • How active is your database? The more transactions you have, the more information is written to the log, meaning your log files are apt to grow more quickly under a heavy transaction load.
  • How much data are you willing to lose? You’ll only be able to restore back to the point of your last transaction log backup, so keep that in mind. If your service level agreements (SLAs) allow you to lose up to an hour of data, then an hour is the longest you should go between log backups.
  • Performance considerations. Remember, this IS a backup job, so your server will take a hit whenever you run it. Not a tremendous one, but if you make your backup jobs too frequent you could be causing unnecessary contention.

Also remember that log backups are dependent on your full backups. Transaction log backups need to be done in conjunction with full database backups to maintain your disaster recovery information.
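
To see why that chain matters, here’s a rough sketch of a point-in-time restore. The file paths and the STOPAT time are made up for the example: the full backup goes on first with NORECOVERY, then the log backups are applied in order until you reach the point you want.

--Restore the most recent full backup, leaving the database ready for log restores
RESTORE DATABASE AdventureWorks
    FROM DISK = 'X:\Backups\AdventureWorks_full.bak'
    WITH NORECOVERY;

--Apply the log backup(s) in sequence, stopping at a specific point in time
RESTORE LOG AdventureWorks
    FROM DISK = 'X:\Backups\logs.bak'
    WITH STOPAT = '2011-08-15 12:30:00', RECOVERY;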

Engage!

As you’ve probably figured out, our maintenance revolves around managing changes in our databases. This, above all, should be the defining rule of thumb when creating and scheduling maintenance tasks. Again, if we think of it in terms of taking care of a car, you (hopefully) change your oil after a certain amount of use. If your car sits in the garage a lot, you probably won’t have to do much to keep it in shape, but if you’re out on the road every day, you definitely can’t let it slide.

It’s not at all different with your databases, so when you sit down to set up your jobs and schedules, ask yourself the following questions:

  • What’s my environment like? Is it an OLTP system with many, many, many small changes to the data? Is it a data warehouse where changes are few and far between, but significant when they occur?
  • What kind of activity does my server see? Are there periods of “down time” when I can run these kind of jobs uninterrupted or is that time minimal or non-existent?

Just as a database needs proper planning and design when you build out your tables and indexes, the upkeep of a database requires a similar amount of planning. If you do it right, though, you drastically reduce the factors that could impede your server’s performance. Hopefully, this also reduces the number of phone calls you get from users asking why the database is so slow!

Database Oil Changes – Part 2

She’s comin’ apart at the seams!

The other big maintenance task that should be done regularly on your databases is managing index fragmentation. For those wondering, index fragmentation is what happens when page splits cause the data pages that make up your indexes to become logically disorganized, so that when the SQL Server engine goes to use the indexes, it has to make more calls to the physical disk to get the appropriate index pages. That means more work, which ultimately means your queries will perform slower.

For the full skinny on what index fragmentation is, take an hour and watch Paul Randal’s(b|t) MCM training video.

In order to address index fragmentation, you have two options:

  • REORGANIZE – This will reorganize the data pages in your index at the leaf level. While it will compact and reorder the data at that level, the upper (non-leaf) levels of the index aren’t adjusted, meaning you might still have some fragmentation.
  • REBUILD – This is exactly what it says it is, a complete rebuild of the index. This is pretty time and resource intensive (especially if your table is large), but once it’s done you will have a clean and shiny new index with little or no fragmentation.

The general rule of thumb on these two options is to REORGANIZE when your fragmentation is low, REBUILD when it gets higher. The numbers typically used are to REORGANIZE when your fragmentation is between 10% and 30%, REBUILD when you exceed 30%, but your mileage may vary.  As with many aspects of SQL Server, there are no real hard and fast rules, so make sure you evaluate your environment when deciding between these two options.
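
Both options run through ALTER INDEX. Here’s what that looks like against a made-up index and table name, following the thresholds above:

--Fragmentation between 10% and 30%: reorganize the index
ALTER INDEX IX_Customer_LastName ON dbo.customer REORGANIZE;

--Fragmentation over 30%: rebuild the index
ALTER INDEX IX_Customer_LastName ON dbo.customer REBUILD;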

A Level 3 Diagnostic

So how do we find fragmentation? Fortunately, there’s a dynamic management object just for this purpose: sys.dm_db_index_physical_stats.   CAUTION: This is one of the few super-costly DMOs out there, as it will scan all your index pages to find out the exact fragmentation. I recommend you read up on the different options before using it and do not run it on production servers during high activity times.
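
If you just want to spot-check a single table while you get familiar with it, you can pass the object ID in and keep the scan small (dbo.customer here is just a stand-in for one of your tables):

--Check fragmentation for one table only, using the lightweight LIMITED mode
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.customer'), NULL, NULL, 'LIMITED');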

To get you started with this view, here’s a basic query to get your fragmented indexes in the current database:

SELECT
      db.name AS databaseName
    , ps.object_id AS objectID
    , OBJECT_NAME(ps.object_id, db.database_id) AS objectName
    , ps.index_id AS indexID
    , idx.name AS indexName
    , ps.partition_number AS partitionNumber
    , ps.avg_fragmentation_in_percent AS fragmentation
    , ps.page_count/128.0 AS index_size_MB
FROM sys.databases db
  JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, N'LIMITED') ps
      ON db.database_id = ps.database_id
  LEFT JOIN sys.indexes idx
      ON ps.object_id = idx.object_id AND ps.index_id = idx.index_id
WHERE ps.index_id > 0 --Skip heaps
  AND ps.avg_fragmentation_in_percent >= 30 --Fragmentation >= 30%
ORDER BY fragmentation DESC
OPTION (MAXDOP 1);

Make it so

So armed with our tools, now we need to set up a regular job to find and defrag our indexes. The frequency with which you do this is completely dependent on your environment, so it might be once a day, once a week, or even once a month. The key, as with the statistics maintenance, is to do it in relation to how frequently your data changes. I try to rebuild my indexes daily (well, only the ones that need it) in my OLTP systems. In a data warehouse, I’ll set up my job to run whenever a major ETL load completes, as that will be just after my data has changed significantly.

And, because the SQL Server community is so awesome, you don’t even have to write your own script! Go take a look at Michelle Ufford’s(b|t) Index Defrag script. It covers many of the possible scenarios for index rebuilding and, with all its parameters, it’s very configurable. You can easily take this stored procedure, implement it into a SQL Agent job, and *BAM*, instant maintenance!

What this will do, as with statistics maintenance, is keep your data in the best shape possible for querying. By reducing the fragmentation in your indexes, you reduce the I/O hit you’ll take whenever you query against those indexes and have one less thing to worry about when troubleshooting performance problems.

One more post to go! We’ll wrap up with me talking (well, writing, really) a little bit about transaction log backups and then your overall general maintenance plan.

Database Oil Changes

Everyone knows you need to change the oil in your car. Also, we know that we need to change the filter in our furnace. Maintenance of our homes and many of the things in our lives is a matter of course. But do you think of it with your databases? More and more often, I’ve run into people who think that databases “just take care of themselves” and don’t understand why their performance degrades over time. Yes, Virginia, databases need to be maintained too.

Your regular checkup

Over the next few posts, I’ll dig into the two maintenance tasks I perform regularly on my databases, as well as cover a third that I do sometimes depending on the database. What are these tasks, you ask? Glad you asked!

  • Update statistics
  • Defragment/rebuild indexes
  • Perform transaction log backups (sometimes)

While your databases will continue to function if you neglect any one of these, it’s a lot like skipping cleanings with your dentist. Slowly your performance will degrade and queries will take longer and longer to complete. It’s important that you set up a regular schedule for these, though what that schedule is will be up to your specific environment.

So let’s start talking about each piece of maintenance.

I love it when a plan comes together

Statistics are probably the easiest and most overlooked item of maintenance. They contain information about the data values in each index and are used by the query optimizer whenever a query plan is generated. The problem is people either completely neglect statistics or assume that since they have autoupdate and autocreate statistics turned on, the server just takes care of itself. If you have a frequently changing dataset or you let enough time go by, you can see statistics get stale even with these options enabled.

You can always check to see if your statistics are current by using the STATS_DATE function. The tricky part to remember is that even though a date may be old, your statistics may still be fine. If your table hasn’t changed much, then everything might be just fine. This is where you check the next piece of data, the sys.sysindexes view, comparing rowcnt (the count of rows for the index) against rowmodctr (the count of rows modified since the last time statistics were updated). Compare the two to see if the change is enough to warrant an update. The autoupdate routine uses 500 rows plus 20% of the row count as its baseline, but I prefer to use 15% of the total table size when running a manual statistics update. Evaluate your environment to see what works for you.
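
Here’s a rough version of that check, using my 15% rule of thumb; adjust the threshold to whatever makes sense for your environment.

--Find indexes where the modified row count has passed 15% of the table’s rows
SELECT
      OBJECT_NAME(i.object_id) AS tableName
    , i.name AS indexName
    , STATS_DATE(i.object_id, i.index_id) AS lastStatsUpdate
    , si.rowcnt
    , si.rowmodctr
FROM sys.indexes i
  JOIN sys.sysindexes si
      ON i.object_id = si.id AND i.index_id = si.indid
WHERE i.index_id > 0 --Skip heaps
  AND si.rowcnt > 0
  AND si.rowmodctr > si.rowcnt * 0.15
ORDER BY si.rowmodctr DESC;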

Updating your statistics is easy enough: use the UPDATE STATISTICS command:

--UPDATE STATS on all the indexes for the customer table
UPDATE STATISTICS customer

--UPDATE STATS on the primary key (PK_CID) of the customer table
UPDATE STATISTICS customer PK_CID

Either of these commands will perform a basic statistics update. SQL Server will recompute your statistics using a calculated sample size, determined by an internal algorithm. You also have the option of declaring your own sample size, as either a number of rows or a percentage, or you can simply declare FULLSCAN and have the engine scan the entire table to build the statistics. The FULLSCAN option will provide the most accurate statistics, but will take the most time, since the engine performs a full table scan. Which option you use comes down to how much time you have to perform your statistics update and how big your tables are.
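
For reference, the sampling options look like this (using the customer table again, just as an example):

--Scan the whole table for the most accurate statistics (takes the longest)
UPDATE STATISTICS customer WITH FULLSCAN;

--Or trade some accuracy for speed by sampling a percentage of the table
UPDATE STATISTICS customer WITH SAMPLE 25 PERCENT;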

Now, how often you manage your statistics is really dependent on your situation. It isn’t a huge performance hit, but it can take a fair amount of time. Personally, I run it as often as I can get away with, aiming for once a day on frequently changing data. You also want to try to set it up to run after you’ve had major changes to your data. So, for example, if I were setting this up on a data warehouse with a daily ETL load, I would build it in to run after the ETL load finished. If my ETL load ran once a week, I would set it up to run once a week, after the ETL load completes. If it’s like most data warehouses, there isn’t much point to running it more often than that, as the data isn’t changing very often.

Want to know more?

Obviously I’ve only scratched the surface here. To really understand statistics and what’s behind them, here are some links to follow up with:

Donabel Santos (t|b) – Understanding SQL Server Statistics

Elisabeth Redei (b) – Lies, Damned Lies And Statistics (part 1 of 3)

Both of these blog articles are chock full of the ins and outs of stats. What I want to stress to you with THIS article is that you want to make this an automated, regular part of your server maintenance. Your users may not thank you (if all goes well, the database will just truck along without them complaining), but you should sleep better having one less thing to worry about.

Hang tight for a couple days and I’ll be back to talk about even MORE maintenance you should be doing to your servers!
