Art of the DBA Rotating Header Image

statistics

Database Oil Changes

Everyone knows you need to change the oil in your car. Also, we know that we need to change the filter in our furnace. Maintenance of our homes and many of the things in our lives is a matter of course. But do you think of it with your databases? More and more often, I’ve run in to people who think that databases “just take care of themselves” and don’t understand why their performance degrades over time. Yes, Virginia, databases need to be maintained too.

Your regular checkup

Over the next few posts, I’ll dig into the two maintenance tasks I perform regularly on my databases, as well as cover a third that I do sometimes depending on the database. What are these tasks, you ask? Glad you asked!

  • Update Statisics
  • Defragment/rebuild indexes
  • Perform transaction log backups (sometimes)

While your databases will continue to function if you neglect any one of these, it’s a lot like skipping cleanings with your dentist. Slowly your performance will degrade and queries will take longer and longer to complete. It’s important that you setup a regular schedule for these, though what that schedule will be is up to your specific environment.

So let’s start talking about each piece of maintenance.

I love it when a plan comes together

Statistics is probably one of the easiest and most overlooked item for maintenance. They contain information for each index on data values and are used by the query optimizer whenever a query plan is generated. The problem is people either completely neglect statistics or assume that since they have autoupdate and autocreate statistics turned on, the server just takes care of itself. If you have a frequently changing dataset or you let enough time go by, you can see statistics get stale even with these options enabled.

You can always check to see if your statistics are current by using the STATS_DATE. The tricky part to remember is that even though a date may be old, your statistics may be fine. If you’re table hasn’t changed much, then everything might be just fine. This is where you check the next piece of data, the sys.sysindexes view where you want to look at rowcnt (the count of rows for the index) versus rowmodctr (the count of modified rows since the last time statistics were updated). Compare the two to see if the change is enough to warrant an update. The autoupdate routine uses 500+20% of the row count as a baseline, but I prefer to use 15% of the total table size when running a manual update stats. Evaluate your environment to see what works for you.

Updating your statistics is easy enough: use the UPDATE STATISTICS command:

--UPDATE STATS on all the indexes for the customer table
UPDATE STATISTICS customer

--UPDATE STATS on the primary key (PK_CID) of the customer table
UPDATE STATISTICS customer PK_CID

Either of these commands will perform a basic statistics update. SQL Server will recompute your statistics using a calculated sample value, determined by an internal algorithm. You also have the option of declaring your own sample size, as either a row or a percentage, or you can simply declare FULLSCAN and have the engine scan the entire table to build the statistics. The FULLSCAN option will provide the most accurate statistics, but will take the most time as the engine will perform a full table scan. Which option you use comes down to how much time you have to perform your statistics update and how big your tables are.

Now, how often you mange your statistics is really dependent on your situation. It isn’t a huge performance hit, but can take a fair amount of time. Personally, I try to get away with it as much as I can, aiming for once a day on frequently changing data. Also, you want to try and set it up to run after you’ve had major changes to your data. So, for example, if I were setting this up on a data warehouse with a daily ETL load, I would build this in to run after the ETL load finished. If my ETL load was once a week, I would set it up then to run once a week, after the ETL load completes. If it’s like most data warehouses, there isn’t much point to running it more than that as the data isn’t changing that often.

Want to know more?

Obviously I’ve only scratched the surface here. To really understand statistics and what’s behind them, here’s some links to follow up with:

Donabel Santos (t|b) – Understanding SQL Server Statistics

Elisabeth Redei (b) – Lies, Damned Lies And Statistics (part 1 of 3)

Both of these blog articles are chock full of the ins and outs of stats. What I want to stress to you with THIS article is that you want to make this an automated, regular part of your server maintenance. Your users may not thank you (if all goes well, the database will just truck along without them complaining), but you should sleep better having one less thing to worry about.

Hang tight for a couple days and I’ll be back to talk about even MORE maintenance you should be doing to your servers!

Database Oil Changes

 

Everyone knows you need to change the oil in your car. Also, we know that we need to change the filter in our furnace. Maintenance of our homes and many of the things in our lives is a matter of course. But do you think of it with your databases? More and more often, I’ve run in to people who think that databases “just take care of themselves” and don’t understand why their performance degrades over time. Yes, Virginia, databases need to be maintained too.

 

Your regular checkup

 

Over the next few posts, I’ll dig into the two maintenance tasks I perform regularly on my databases, as well as cover a third that I do sometimes depending on the database. What are these tasks, you ask? Glad you asked!

  • Update Statisics

  • Defragment/rebuild indexes

  • Perform transaction log backups (sometimes)

 

While your databases will continue to function if you neglect any one of these, it’s a lot like skipping cleanings with your dentist. Slowly your performance will degrade and queries will take longer and longer to complete. It’s important that you setup a regular schedule for these, though what that schedule will be is up to your specific environment.

 

So let’s start talking about each piece of maintenance.

 

I love it when a plan comes together

 

Statistics is probably one of the easiest and most overlooked item for maintenance. They contain information for each index on data values and are used by the query optimizer whenever a query plan is generated. The problem is people either completely neglect statistics or assume that since they have autoupdate and autocreate statistics turned on, the server just takes care of itself. If you have a frequently changing dataset or you let enough time go by, you can see statistics get stale even with these options enabled.

 

You can always check to see if your statistics are current by using the STATS_DATE(http://msdn.microsoft.com/en-us/library/ms190330.aspx). The tricky part to remember is that even though a date may be old, your statistics may be fine. If you’re table hasn’t changed much, then everything might be just fine. This is where you check the next piece of data, the sys.sysindexes(http://msdn.microsoft.com/en-us/library/ms190283.aspx). What you want to look at is rowcnt (the count of rows for the index) versus rowmodctr (the count of modified rows since the last time statistics were updated). Compare the two to see if the change is enough to warrant an update. The autoupdate routine uses 500+20% of the row count as a baseline, but I prefer to use 15% of the total table size. Evaluate your environment to see what works for you.

 

Keeping current

 

To update your statistics is easy enough: use the UPDATE STATISTICS(http://msdn.microsoft.com/en-us/library/ms187348.aspx) command! Done, right?

 

Actually, it’s a a little more detailed than that. You can definitely just use the command without any additional options. When you do this, SQL Server will recompute your statistics using a calculated sample value. You also have the option of declaring your own sample size, as either a row or a percentage. Finally, you can simply declare FULLSCAN and have the engine scan the entire table to build the statistics. The FULLSCAN option will provide the most accurate statistics, but will take the most time as the engine will perform a full table scan. Which option you use comes down to how much time you have to perform your statistics update and how big your tables are.

 

Now, how often you mange your statistics is really dependent on your situation. It isn’t a huge performance hit, but can take a fair amount of time. Personally, I try to get away with it as much as I can, aiming for once a day on frequently changing data. Also, you want to try and set it up to run after you’ve had major changes to your data. So, for example, if I were setting this up on a data warehouse with a daily ETL load, I would build this in to run after the ETL load finished. If my ETL load was once a week, I would set it up then to run once a week, after the ETL load completes. If it’s like most data warehouses, there isn’t much point to running it more than that as the data isn’t changing that often.

 

Want to know more?

Obviously I’ve only scratched the surface here. To really understand statistics and what’s behind them, here’s some links to follow up with:

 

 

Both of these blog articles are chock full of the ins and outs of stats. What I want to stress to you with THIS article is that you want to make this an automated, regular part of your server maintenance. Your users may not thank you (if all goes well, the database will just truck along without them complaining), but you should sleep better having one less thing to worry about.

 

 

 

Database Oil Changes

Everyone knows you need to change the oil in your car. Also, we know that we need to change the filter in our furnace. Maintenance of our homes and many of the things in our lives is a matter of course. But do you think of it with your databases? More and more often, I’ve run in to people who think that databases “just take care of themselves” and don’t understand why their performance degrades over time. Yes, Virginia, databases need to be maintained too.

Your regular checkup

Over the next few posts, I’ll dig into the two maintenance tasks I perform regularly on my databases, as well as cover a third that I do sometimes depending on the database. What are these tasks, you ask? Glad you asked!

Update Statisics

Defragment/rebuild indexes

Perform transaction log backups (sometimes)

While your databases will continue to function if you neglect any one of these, it’s a lot like skipping cleanings with your dentist. Slowly your performance will degrade and queries will take longer and longer to complete. It’s important that you setup a regular schedule for these, though what that schedule will be is up to your specific environment.

So let’s start talking about each piece of maintenance.

I love it when a plan comes together

Statistics is probably one of the easiest and most overlooked item for maintenance. They contain information for each index on data values and are used by the query optimizer whenever a query plan is generated. The problem is people either completely neglect statistics or assume that since they have autoupdate and autocreate statistics turned on, the server just takes care of itself. If you have a frequently changing dataset or you let enough time go by, you can see statistics get stale even with these options enabled.

You can always check to see if your statistics are current by using the STATS_DATE(http://msdn.microsoft.com/en-us/library/ms190330.aspx). The tricky part to remember is that even though a date may be old, your statistics may be fine. If you’re table hasn’t changed much, then everything might be just fine. This is where you check the next piece of data, the sys.sysindexes(http://msdn.microsoft.com/en-us/library/ms190283.aspx). What you want to look at is rowcnt (the count of rows for the index) versus rowmodctr (the count of modified rows since the last time statistics were updated). Compare the two to see if the change is enough to warrant an update. The autoupdate routine uses 500+20% of the row count as a baseline, but I prefer to use 15% of the total table size. Evaluate your environment to see what works for you.

Keeping current

To update your statistics is easy enough: use the UPDATE STATISTICS(http://msdn.microsoft.com/en-us/library/ms187348.aspx) command! Done, right?

Actually, it’s a a little more detailed than that. You can definitely just use the command without any additional options. When you do this, SQL Server will recompute your statistics using a calculated sample value. You also have the option of declaring your own sample size, as either a row or a percentage. Finally, you can simply declare FULLSCAN and have the engine scan the entire table to build the statistics. The FULLSCAN option will provide the most accurate statistics, but will take the most time as the engine will perform a full table scan. Which option you use comes down to how much time you have to perform your statistics update and how big your tables are.

Now, how often you mange your statistics is really dependent on your situation. It isn’t a huge performance hit, but can take a fair amount of time. Personally, I try to get away with it as much as I can, aiming for once a day on frequently changing data. Also, you want to try and set it up to run after you’ve had major changes to your data. So, for example, if I were setting this up on a data warehouse with a daily ETL load, I would build this in to run after the ETL load finished. If my ETL load was once a week, I would set it up then to run once a week, after the ETL load completes. If it’s like most data warehouses, there isn’t much point to running it more than that as the data isn’t changing that often.

Want to know more?

Obviously I’ve only scratched the surface here. To really understand statistics and what’s behind them, here’s some links to follow up with:

Donabel Santos (twitter (@sqlbelle) | blog) – Understanding SQL Server Statistics (http://sqlserverperformance.idera.com/tsql-optimization/understanding-sql-server-statistics/)

Elisabeth Redei (http://sqlblog.com/blogs/elisabeth_redei/default.aspx) – (http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/lies-damned-lies-and-statistics-part-i.aspx)

Both of these blog articles are chock full of the ins and outs of stats. What I want to stress to you with THIS article is that you want to make this an automated, regular part of your server maintenance. Your users may not thank you (if all goes well, the database will just truck along without them complaining), but you should sleep better having one less thing to worry about.