Art of the DBA

May, 2011:

Database Oil Changes

Everyone knows you need to change the oil in your car. We also know we need to change the filter in the furnace. Maintaining our homes and many of the things in our lives is a matter of course. But do you think of it with your databases? More and more often, I’ve run into people who think that databases “just take care of themselves” and don’t understand why their performance degrades over time. Yes, Virginia, databases need to be maintained too.

Your regular checkup

Over the next few posts, I’ll dig into the two maintenance tasks I perform regularly on my databases, as well as cover a third that I do sometimes depending on the database. What are these tasks, you ask? Glad you asked!

  • Update statistics
  • Defragment/rebuild indexes
  • Perform transaction log backups (sometimes)

While your databases will continue to function if you neglect any one of these, it’s a lot like skipping cleanings with your dentist. Slowly your performance will degrade and queries will take longer and longer to complete. It’s important that you set up a regular schedule for these, though what that schedule will be is up to your specific environment.

So let’s start talking about each piece of maintenance.

I love it when a plan comes together

Statistics are probably the easiest and most overlooked piece of maintenance. They contain information about the distribution of data values for each index and are used by the query optimizer whenever a query plan is generated. The problem is that people either completely neglect statistics or assume that, since they have autoupdate and autocreate statistics turned on, the server just takes care of itself. If you have a frequently changing dataset or you let enough time go by, you can see statistics get stale even with these options enabled.
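
If you want to confirm how those options are set, a quick look at sys.databases will tell you (a minimal sketch; filter it down to the databases you care about):

--Check whether auto create/update statistics are enabled for each database
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on
FROM sys.databases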

You can always check to see if your statistics are current by using the STATS_DATE function (http://msdn.microsoft.com/en-us/library/ms190330.aspx). The tricky part to remember is that even though a date may be old, your statistics may be fine. If your table hasn’t changed much, then everything might be just fine. This is where you check the next piece of data, the sys.sysindexes view (http://msdn.microsoft.com/en-us/library/ms190283.aspx), where you want to look at rowcnt (the count of rows for the index) versus rowmodctr (the count of rows modified since the last time statistics were updated). Compare the two to see if the change is enough to warrant an update. The autoupdate routine uses 500 rows plus 20% of the row count as a baseline, but I prefer to use 15% of the total table size when running a manual update stats. Evaluate your environment to see what works for you.
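
Here’s a rough sketch of that rowcnt versus rowmodctr check against a hypothetical dbo.customer table (pick whatever threshold fits your environment):

--Compare statistics age and churn for one table
SELECT si.name AS index_name,
       STATS_DATE(si.id, si.indid) AS stats_last_updated,
       si.rowcnt AS rows_in_index,
       si.rowmodctr AS rows_modified_since_update
FROM sys.sysindexes si
WHERE si.id = OBJECT_ID('dbo.customer')
  AND si.indid > 0 --skip the heap entry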

Updating your statistics is easy enough: use the UPDATE STATISTICS command (http://msdn.microsoft.com/en-us/library/ms187348.aspx):

--UPDATE STATS on all the indexes for the customer table
UPDATE STATISTICS customer

--UPDATE STATS on the primary key (PK_CID) of the customer table
UPDATE STATISTICS customer PK_CID

Either of these commands will perform a basic statistics update. SQL Server will recompute your statistics using a calculated sample size, determined by an internal algorithm. You also have the option of declaring your own sample size, as either a row count or a percentage, or you can simply declare FULLSCAN and have the engine scan the entire table to build the statistics. The FULLSCAN option will provide the most accurate statistics, but will take the most time as the engine performs a full table scan. Which option you use comes down to how much time you have for the statistics update and how big your tables are.
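
For reference, here’s roughly what those options look like against the same customer table (the sample sizes below are arbitrary examples, not recommendations):

--Sample a fixed number of rows
UPDATE STATISTICS customer WITH SAMPLE 100000 ROWS

--Sample a percentage of the table
UPDATE STATISTICS customer WITH SAMPLE 25 PERCENT

--Scan every row for the most accurate statistics
UPDATE STATISTICS customer WITH FULLSCAN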

Now, how often you manage your statistics is really dependent on your situation. It isn’t a huge performance hit, but it can take a fair amount of time. Personally, I try to run it as often as I can get away with, aiming for once a day on frequently changing data. You also want to set it up to run after you’ve had major changes to your data. So, for example, if I were setting this up on a data warehouse with a daily ETL load, I would build it in to run after the ETL load finished. If my ETL load ran once a week, I would schedule it to run once a week, after the ETL load completes. If it’s like most data warehouses, there isn’t much point in running it more often than that, as the data isn’t changing that frequently.
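
As a sketch of that idea, the statistics update simply becomes the last step of the load (the load procedure name here is made up for illustration):

--Hypothetical nightly warehouse load
EXEC dbo.LoadDailyWarehouse

--Refresh statistics once the new data is in place
--(sp_updatestats only touches tables whose rows have changed)
EXEC sp_updatestats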

Want to know more?

Obviously I’ve only scratched the surface here. To really understand statistics and what’s behind them, here are some links to follow up with:

Donabel Santos (t|b) – Understanding SQL Server Statistics (http://sqlserverperformance.idera.com/tsql-optimization/understanding-sql-server-statistics/)

Elisabeth Redei (b) – Lies, Damned Lies And Statistics (part 1 of 3) (http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/lies-damned-lies-and-statistics-part-i.aspx)

Both of these blog articles are chock full of the ins and outs of stats. What I want to stress to you with THIS article is that you want to make this an automated, regular part of your server maintenance. Your users may not thank you (if all goes well, the database will just truck along without them complaining), but you should sleep better having one less thing to worry about.

Hang tight for a couple days and I’ll be back to talk about even MORE maintenance you should be doing to your servers!

Standing on the Shoulders of Giants

When I went to SQL Rally last week, I fully expected to be learning from many of the presenters there. The excitement about being immersed in all that awesomesauce is infectious. What I didn’t expect is that I would take my turn at teaching, handing out information like candy to twelve-year-olds. It was one of those times where I was talking shop with some other folks while we were all taking a session off. The group started chatting about performance tuning (something I geek out about) and I shared some of the fundamental links, blogs, and tools I use when I get the inevitable “My query is slow!” call.

Now, just to be clear, none of the information I shared was “mine”. The beauty of the SQL Server community is that we have so much great talent and everyone is willing to help educate. What follows here is a list of my core tools and information links, resources that I use on almost a daily basis. I also see them as the start of a journey, because none of this is the last word on a particular subject. If you’re like me you’ll continue to browse the Google ‘verse with what you pick up here.

Brent Ozar:

Full disclosure, I’m a Brent Ozar fanboy. I started reading his blog about a year or so ago and he’s made me laugh and taught me a whole heck of a lot. It’s because of him that I started blogging and getting involved in the community.

Glenn Berry:

Glenn Berry is a local Denver MVP and I see him from time to time at the Denver SQL User’s Group meetings. Not only has he provided these great DMV queries, but he really understands hardware and has some great posts on the nitty-gritty of your machines.

General Community Resources

These two sites are invaluable resources for articles and tutorials on getting stuff done. When I was figuring out how to implement partitioning, I did a lot of reading on both of these sites.

Documents

Both of these docs live on my laptop and are constantly getting referenced. My only complaint about the waits and queues white paper is that it’s a little stale and I hope Microsoft does an updated one for Denali when it’s released.

If you don’t have a cheat sheet with your go-to resources, hopefully this list gives you a good start. Or you can add it to your existing list. I have found each one of these links invaluable to my day-to-day work, not only enriching my general knowledge but giving me the tools to troubleshoot many of the performance problems that come my way.

SQL Rally Orlando 2011

Or, how I got my brain to full tank.

I’m still pretty new to the Professional Association for SQL Server (PASS) and the SQL Server community. It’s been great, attending my local user groups along with a SQL Saturday, but I kept hearing about how great Summit was. When I heard about SQL Rally in Orlando, advertised as a “mini-Summit” that is only 3 days long (with the pre-convention talks) and at a much lower cost, I figured I had a great opportunity to get in on the next level. What followed was one of the best 4 days of my career, where I was able to learn and connect with my SQL Server peers across the country.

What it is!

So SQL Rally was created to fill the gap in training and networking between SQL Saturday events and the 5-day PASS Summit. With SQL Saturday, you get a dozen or so sessions and a chance to connect with your local peers, something of a shot-glass taste of the SQL community. The Summit, on the other hand, is where you have over a hundred sessions to choose from and thousands of people to meet from around the world. It’s like getting the firehose turned on and you just hope to catch what you can.

This SQL Rally struck a great balance between the two events. There were close to 50 sessions over two days, along with 4 pre-convention sessions that gave you 7-hour deep dives into a couple of different topics. I think the final attendee count came in around 450, which meant that there were plenty of new people to meet, but you knew you’d see them again throughout the week. In fact, by the end of Friday, I was checking in with one of my new friends every couple of minutes in the halls. Lunch was particularly dangerous, as I would get wrapped up in conversation on my way to getting food or a drink, or talking shop with someone while my sandwich sat untouched in front of me.

In general, it was awesome! :D

It’s time for your session, Mr. Fal

The main function of SQL Rally was to provide training, with a myriad of sessions across the Business Intelligence, Database Development, Database Administration, and Professional Development tracks. I won’t get too deep into it, but here were the high points for me:
Gold medal winners:

  • Grant Fritchey’s (b|t) session on query tuning was solid gold. I learned an incredible amount about reading and fixing query plans. I also learned that he hates query hints as much as I do! But beyond that, I loved Grant’s enthusiasm (seriously, the man was jumping around about query plans) and his technical knowledge runs deep. To boot, I learned at least 4 things that I can take back to work on Monday and apply.
  • Wes Brown (b|t) is a gem of a man and his session on storage was fan-bloody-tastic. I don’t think I’ve ever learned and laughed so much within the space of an hour and a half. He really dug into the internals of storage hardware and, while I’ve always had cursory knowledge of some of the info, he really filled in the gaps. Best of all, though, he did it in a way where you weren’t overwhelmed or intimidated. I can’t stress enough that if you get the chance to hear Wes speak, grab hold with both hands and don’t let go.

Silver medal recipients:

  • John Sterrett (b|t) had a great session on Policy Based Management and Central Management Server. While it was fairly technical and had a lot of good information, it did jump around a bit. Everything was worthwhile, but you had to keep your head on a swivel as John drilled through all the different aspects of PBM/CMS with live demos. Which, by the way, makes him a far braver man than I, since many of the demos were ad hoc responses to questions instead of fully prepared examples. To his credit, they didn’t fail.
  • Jen McCown (b|t) had a great presentation on T-SQL Sins. Most of the material is stuff we all know, like documenting your code, enforcing constraints, and the like. However, half the session was a wonderfully cathartic period where members of the audience shared coding horror stories. We all have the same struggles, but as a community we can work to share the burden of promoting good practices while commiserating over the hurdles of ignorance we face.

Connection=SQL Community; Provider=SQLRALLY

For me, the biggest benefit of this event was the networking. I mean, I know the local guys, but I’m a firm believer that you’re only as strong as your network. We can’t know everything in this business, so if I have friends I can call on for help, then I’m that much stronger as a professional.

Plus, ya’ gotta have friends to drink with at SQL Karaoke!

The social events were all pretty good, but essentially boiled down to a SQL happy hour each night. Now, me being a weird style of social butterfly, I would always make sure I wandered around the room to chat with people I had never met before. The result was amazing. I love how the people of the SQL Server community are so open and friendly. I’ve seen it online, with the number of people willing to share scripts and information to help each other out, but in person it’s even more evident as people are just willing to have a good time. It goes back to the commiseration element, where we all know that we share the same struggles and problems day to day.

Some shout-outs to some of my new community friends:

  • Stacia Misner (b|t) tried her best to convert me to a BI developer during the Rally. Sorry to say to her, but I just don’t get passionate about cubes and MDX queries. Still, Stacia was open and friendly when I walked up at one of the events on Tuesday night and introduced myself. This led to a lot of fun over the next few days, even though I missed her session (I beg forgiveness!!!).
  • Eric Wisdahl (b|t) ended up being my roommate in one of those “Well, I’ve never met you before, but I don’t think you’ll stab me to death in my sleep” situations. He’s whip-smart and very relaxed (a hoopy frood, you might say), and thanks to him I now know the glory of Chick-fil-A.
  • Jen McCown gifted me with awesome conversation about Wil Wheaton, Battlestar Galactica, and a new hat. I regret that I’ve never caught her DBAs at Midnight podcast, but I will definitely add that to my weekly to-do list.
  • Garreth Swanepoel (b|t) was just plain awesome. He was volunteering as well as attending, and was always around with a smile and a laugh. I mean, you can’t help but enjoy yourself when Garreth’s in the room. Too bad he’s moving into BI development. (I keed! I keed!)

There were so many more folks I met, so this is by no means an exhaustive list. Everyone I talked with at SQL Rally made me a better professional, in one way or another. The best part of the socialization was that I never felt like there was a “clique” or any sort of elitism. Everyone, from the big names to the folks who were at their first national event (like me!) was open, friendly, and helpful.

That’s a wrap!

SQL Rally was awesome. There’s really no other way to put it. I had a great time learning and meeting folks, realizing that there are so many more people like me in this business. The SQL community is amazingly giving, and nothing showed that to me more than this event. If you haven’t gone to anything before, start making your plans for this. The next SQL Rally is in Sweden in November, so that might be a little tough, but I know that PASS is working on hammering out the next US Rally location as we speak. Keep your eyes peeled.

To wrap up, I want to thank Kendal van Dyke(b|t), Andy Warren(b|t), Jack Corbett(b|t), and all the other SQL Rally Organizers. Thanks to the blood, sweat, and tears of these fine people, SQL Rally was a smashing success and got at least one person fired up to do more with and for the SQL Community. Thanks guys and see you all soon!

Meme Monday – No, it’s not really the disk

Tom LaRock asked for 9 of your biggest non-disk problems for Meme Monday, so here we go:

1) Index Fragmentation
2) Memory over-consumption (stop running your apps on the SQL Server!!!!)
3) Runaway queries
4) Network communication
5) ODBC/OLEDB drivers (Oracle, I’m looking at you)
6) Over-indexing (yeah, that table with 18 columns and 12 indexes….)
7) Lack of partitioning
8) Stale statistics
9) Locking (did you really have to use that exclusive table lock hint?????)

5 of these I’ve had to resolve in the last week. All of them within the last month. The good news is that most of these are pretty easy to identify and solve, but the bad news is most of these issues are the result of bad coding/querying practices and can be prevented with a little forethought. Which means I have more work to do with my developers…
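
For example, problem #1 on the list is easy to spot with the index physical stats DMF (a quick sketch; the 30% threshold is just a common rule of thumb, not a hard rule):

--Find indexes in the current database with heavy fragmentation
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30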

Thanks for joining me on Meme Monday!

————————————

Edit: After a brief discussion with Jeff Smith (b|t), I thought I should clarify the “Lack of partitioning” item. I know most people use partitioning to spread the I/O load across data files, but I found it to be a HUGE boost for performance because it allowed me to segment my table into manageable parts for maintaining indexes and improving select performance. Take a look at my partitioning post for more info.