Art of the DBA

Meme Monday – Dumb Questions

Here I am, jumping in again for this month’s Meme Monday.  For June, Tom LaRock(t|b) is asking for our dumb SQL questions.

Can you disable the transaction log?

The answer is, emphatically, “NO”.  Most people I’ve run into ask this because they struggle with delays caused by locking and deadlocks, or they’re looking for ways to speed up inserts.  The facts are that without the transaction log and everything that comes with it, you lose the ACIDity (Atomicity, Consistency, Isolation, Durability) within the database.  In other words, you lose all the elements that control and maintain the integrity of your data while thousands of different statements are trying to get at it.

So the next time someone comes to you asking about just disabling this key feature, send ’em to this great article by Steve Jones.

Database Oil Changes – Part 3

Captain’s Log

The last bit of maintenance is one I think most folks know about, and that is transaction log backups. I consider this maintenance because it’s how you manage the overall size and transactional state of your database when it’s not in SIMPLE recovery mode. Without the proper jobs set up, you could be impacting your overall space usage and recovery time on your servers.

SQL Server supports three recovery models:

  • Simple – The most basic recovery model. SQL Server regularly cleans up the transaction log. Transactional history is not maintained, so no point-in-time recovery is available. NOTE: SQL Server still uses the transaction log in the same way as in FULL mode.
  • Full – Transactional history is maintained in the log, as SQL Server only performs clean up on the transaction log when a transaction log backup occurs. Since transactional history is maintained, point-in-time recovery is possible.
  • Bulk logged – Similar to FULL, transactional history is retained in the log, but certain actions within the database are minimally logged, meaning that you don’t get the transactional detail, but must recover the entire action that occurred.
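Before you plan log backups, it helps to confirm which recovery model each database is actually in. Here’s a minimal sketch of that check and change, reusing the AdventureWorks database from the backup example below; swap in your own database name.

--See the recovery model for every database on the instance
SELECT name, recovery_model_desc
FROM sys.databases;

--Switch a database to FULL recovery (take a full backup afterward to start the log chain)
ALTER DATABASE AdventureWorks SET RECOVERY FULL;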

What all this means is that in FULL and BULK-LOGGED modes, our transaction logs are going to fill up and grow unless we back up our transactions and signal to SQL Server that it can clean up the log. If you neglect your log backups, you could easily fill up whatever drives your log files are on. Also, when SQL Server has to bring a database online, it will have to go through your log files to reconcile the data in your database with all of the transactions. The larger your log file, the longer this recovery phase will take.

Backing up your log is pretty simple:

BACKUP LOG AdventureWorks TO DISK='X:\Backups\logs.bak'

Another option is to use the Maintenance Plans. Whichever method you use, you should be making transaction log backups on a regular basis in your databases, dependent (again) on your particular situation. For example, we have a pretty active OLTP system and I have regular log backups scheduled to occur every 15 minutes. Your server might be active enough for only every hour or so. The factors to keep in mind when setting your schedule are:

  • How active is your database? The more transactions you have, the more information gets written to the log, so it’s apt to grow faster under a heavy transaction load.
  • How much time are you willing to lose? You’ll only be able to restore back to the point of your last transaction log backup, so keep that in mind. If your service level agreements (SLAs) allow you to lose up to an hour, then an hour is the longest you should go between log backups.
  • Performance considerations. Remember, this IS a backup job, so your server will take a hit whenever you run it. Not a tremendous one, but if you run your backup jobs too frequently you could be causing unnecessary contention.

Also remember that log backups are dependent on your full backups. Transaction log backups need to be done in conjunction with full database backups to maintain your disaster recovery information.
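To show how the pieces fit together, here’s a sketch of a simple backup chain and a point-in-time restore. The file names and the STOPAT time are placeholders; the WITH NORECOVERY/RECOVERY options are what keep the database in a restoring state until the last log is applied.

--Full backup establishes the base of the chain
BACKUP DATABASE AdventureWorks TO DISK='X:\Backups\AdventureWorks_full.bak'

--Scheduled log backups capture everything since the previous log backup
BACKUP LOG AdventureWorks TO DISK='X:\Backups\AdventureWorks_log1.trn'
BACKUP LOG AdventureWorks TO DISK='X:\Backups\AdventureWorks_log2.trn'

--To recover, restore the full backup, then roll the logs forward to the point in time you need
RESTORE DATABASE AdventureWorks FROM DISK='X:\Backups\AdventureWorks_full.bak' WITH NORECOVERY
RESTORE LOG AdventureWorks FROM DISK='X:\Backups\AdventureWorks_log1.trn' WITH NORECOVERY
RESTORE LOG AdventureWorks FROM DISK='X:\Backups\AdventureWorks_log2.trn'
	WITH STOPAT = '2011-06-01 12:30:00', RECOVERY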

Engage!

As you’ve probably figured out, our maintenance revolves around managing changes in our databases. This, above all, should be the defining rule of thumb when creating and scheduling maintenance tasks. Again, if we think of it in terms of taking care of a car, you (hopefully) change your oil after a certain amount of use. If your car sits in the garage a lot, you probably won’t have to do much to keep it in shape, but if you’re out on the road every day, you definitely don’t want to let that upkeep slide.

It’s not at all different with your databases, so when you sit down to set up your jobs and schedules, ask yourself the following questions:

  • What’s my environment like? Is it an OLTP system with many, many, many small changes to the data? Is it a data warehouse where changes are few and far between, but significant when they occur?
  • What kind of activity does my server see? Are there periods of “down time” when I can run these kind of jobs uninterrupted or is that time minimal or non-existent?

Just as a database needs proper planning and design when you build out your tables and indexes, the upkeep of a database requires a similar amount of planning. If you do it right, though, you drastically reduce the factors that could impede your server’s performance. Hopefully, this also reduces the number of phone calls you get from users asking why the database is so slow!

Database Oil Changes – Part 2

She’s comin’ apart at the seams!

The other big maintenance task that should be done regularly on your databases is managing index fragmentation. For those wondering, index fragmentation is what happens when page splits leave the data pages that make up your indexes logically disorganized. When the SQL Server engine goes to use those indexes, it has to make more calls to the physical disk to get the appropriate index pages, which means more work (and ultimately slower queries).

For the full skinny on what index fragmentation is, take an hour and watch Paul Randal’s(b|t) MCM training video.

In order to address index fragmentation, you have two options:

  • REORGANIZE – This will reorganize the data pages in your index on the leaf level. While it will compact and organize data on that leaf level, header records won’t be adjusted, meaning you might still have index fragmentation.
  • REBUILD – This is exactly what it says it is, a complete rebuild of the index. This is pretty time and resource intensive (especially if your table is large), but once it’s done you will have a clean and shiny new index with little or no fragmentation.

The general rule of thumb on these two options is to REORGANIZE when your fragmentation is low, REBUILD when it gets higher. The numbers typically used are to REORGANIZE when your fragmentation is between 10% and 30%, REBUILD when you exceed 30%, but your mileage may vary.  As with many aspects of SQL Server, there are no real hard and fast rules, so make sure you evaluate your environment when deciding between these two options.
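For reference, the syntax for the two options looks like this; the index and table names here (IX_Customer_LastName on dbo.customer) are just placeholders for your own objects.

--Fragmentation roughly 10-30%: reorganize the leaf level in place
ALTER INDEX IX_Customer_LastName ON dbo.customer REORGANIZE;

--Fragmentation over ~30%: rebuild the index from scratch
--(add WITH (ONLINE = ON) on Enterprise Edition if the table needs to stay available)
ALTER INDEX IX_Customer_LastName ON dbo.customer REBUILD;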

A Level 3 Diagnostic

So how do we find fragmentation? Fortunately, there’s a dynamic management object just for this purpose: sys.dm_db_index_physical_stats.   CAUTION: This is one of the few super-costly DMOs out there, as it will scan all your index pages to find out the exact fragmentation. I recommend you read up on the different options before using it and do not run it on production servers during high activity times.

To get you started with this view, here’s a basic query to get your fragmented indexes in the current database:

SELECT
      db.name AS databaseName
    , ps.object_id AS objectID
    , OBJECT_NAME(ps.object_id, db.database_id) AS objectName
    , ps.index_id AS indexID
    , idx.name AS indexName
    , ps.partition_number AS partitionNumber
    , ps.avg_fragmentation_in_percent AS fragmentation
    , ps.page_count/128.0 AS index_size --Page count / 128 = size in MB
FROM sys.databases db
  JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, N'LIMITED') ps
      ON db.database_id = ps.database_id
  LEFT JOIN sys.indexes idx
      ON ps.object_id = idx.object_id AND ps.index_id = idx.index_id
WHERE ps.index_id > 0 --Skip heap tables
  AND ps.avg_fragmentation_in_percent >= 30 --Fragmentation >= 30%
ORDER BY fragmentation DESC
OPTION (MAXDOP 1);

Make it so

So armed with our tools, now we need to set up a regular job to find and defrag our indexes. The frequency with which you do this is completely dependent on your environment, so it might be once a day, once a week, or even once a month. The key, as with the statistics maintenance, is to do it in relation to how frequently your data changes. I try to rebuild my indexes daily (well, only the ones that need it) in my OLTP systems. In a data warehouse, I’ll set up my job to run whenever a major ETL load completes, as that will be just after my data has changed significantly.

And, because the SQL Server community is so awesome, you don’t even have to write your own script! Go take a look at Michelle Ufford’s(b|t) Index Defrag script. It covers many of the possible scenarios for index rebuilding and, with all its parameters, it’s very configurable. You can easily take this stored procedure, implement it into a SQL Agent job, and *BAM*, instant maintenance!

What this will do, as with statistics maintenance, is keep your data in the best shape possible for querying. By reducing the fragmentation in your indexes, you reduce the I/O hit you’ll take whenever you query against those indexes and have one less thing to worry about when troubleshooting performance problems.

One more post to go! We’ll wrap up with me talking (well, writing, really) a little bit about transaction log backups and then your overall general maintenance plan.

Database Oil Changes

Everyone knows you need to change the oil in your car. Also, we know that we need to change the filter in our furnace. Maintenance of our homes and many of the things in our lives is a matter of course. But do you think of it with your databases? More and more often, I’ve run into people who think that databases “just take care of themselves” and don’t understand why their performance degrades over time. Yes, Virginia, databases need to be maintained too.

Your regular checkup

Over the next few posts, I’ll dig into the two maintenance tasks I perform regularly on my databases, as well as cover a third that I do sometimes depending on the database. What are these tasks, you ask? Glad you asked!

  • Update Statistics
  • Defragment/rebuild indexes
  • Perform transaction log backups (sometimes)

While your databases will continue to function if you neglect any one of these, it’s a lot like skipping cleanings with your dentist. Slowly your performance will degrade and queries will take longer and longer to complete. It’s important that you set up a regular schedule for these, though what that schedule will be is up to your specific environment.

So let’s start talking about each piece of maintenance.

I love it when a plan comes together

Statistics are probably the easiest and most overlooked item of maintenance. They contain information about the distribution of data values in each index and are used by the query optimizer whenever a query plan is generated. The problem is people either completely neglect statistics or assume that since they have autoupdate and autocreate statistics turned on, the server just takes care of itself. If you have a frequently changing dataset or you let enough time go by, you can see statistics get stale even with these options enabled.

You can always check to see if your statistics are current by using the STATS_DATE function. The tricky part to remember is that even though a date may be old, your statistics may be fine. If your table hasn’t changed much, then everything might be just fine. This is where you check the next piece of data, the sys.sysindexes view, where you want to look at rowcnt (the count of rows for the index) versus rowmodctr (the count of modified rows since the last time statistics were updated). Compare the two to see if the change is enough to warrant an update. The autoupdate routine uses 500 rows plus 20% of the row count as a baseline, but I prefer to use 15% of the total table size when running a manual update stats. Evaluate your environment to see what works for you.
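Here’s a rough query along those lines, using the same hypothetical customer table as the examples below. It lines up the last statistics update date with rowcnt and rowmodctr so you can eyeball whether an update is warranted.

--Compare the last stats update date against how much the data has changed
SELECT
      OBJECT_NAME(si.id) AS tableName
    , si.name AS indexName
    , STATS_DATE(si.id, si.indid) AS lastStatsUpdate
    , si.rowcnt
    , si.rowmodctr
FROM sys.sysindexes si
WHERE si.id = OBJECT_ID('dbo.customer')
  AND si.indid BETWEEN 1 AND 254; --Skip the heap and LOB rows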

Updating your statistics is easy enough: use the UPDATE STATISTICS command:

--UPDATE STATS on all the indexes for the customer table
UPDATE STATISTICS customer

--UPDATE STATS on the primary key (PK_CID) of the customer table
UPDATE STATISTICS customer PK_CID

Either of these commands will perform a basic statistics update. SQL Server will recompute your statistics using a calculated sample value, determined by an internal algorithm. You also have the option of declaring your own sample size, as either a row or a percentage, or you can simply declare FULLSCAN and have the engine scan the entire table to build the statistics. The FULLSCAN option will provide the most accurate statistics, but will take the most time as the engine will perform a full table scan. Which option you use comes down to how much time you have to perform your statistics update and how big your tables are.
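For reference, here’s what those options look like against the same customer table; the 15 percent sample matches the threshold I mentioned above, but it’s only an example value.

--Let SQL Server pick the sample size (default behavior)
UPDATE STATISTICS customer

--Declare your own sample, by percentage or by rows
UPDATE STATISTICS customer WITH SAMPLE 15 PERCENT
UPDATE STATISTICS customer WITH SAMPLE 500000 ROWS

--Scan the entire table for the most accurate statistics
UPDATE STATISTICS customer WITH FULLSCAN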

Now, how often you manage your statistics is really dependent on your situation. It isn’t a huge performance hit, but it can take a fair amount of time. Personally, I try to get away with it as much as I can, aiming for once a day on frequently changing data. Also, you want to try and set it up to run after you’ve had major changes to your data. So, for example, if I were setting this up on a data warehouse with a daily ETL load, I would build this in to run after the ETL load finished. If my ETL load was once a week, I would set it up to run once a week, after the ETL load completes. If it’s like most data warehouses, there isn’t much point to running it more than that as the data isn’t changing that often.

Want to know more?

Obviously I’ve only scratched the surface here. To really understand statistics and what’s behind them, here’s some links to follow up with:

Donabel Santos (t|b) – Understanding SQL Server Statistics (http://sqlserverperformance.idera.com/tsql-optimization/understanding-sql-server-statistics/)

Elisabeth Redei (b) – Lies, Damned Lies And Statistics (part 1 of 3) (http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/lies-damned-lies-and-statistics-part-i.aspx)

Both of these blog articles are chock full of the ins and outs of stats. What I want to stress to you with THIS article is that you want to make this an automated, regular part of your server maintenance. Your users may not thank you (if all goes well, the database will just truck along without them complaining), but you should sleep better having one less thing to worry about.

Hang tight for a couple days and I’ll be back to talk about even MORE maintenance you should be doing to your servers!

Standing on the Shoulders of Giants

When I went to SQL Rally last week, I fully expected to be learning from many of the presenters there. The excitement about being immersed in all that awesomesauce is infectious. What I didn’t expect is that I would take my turn at teaching, handing out information like candy to twelve year olds. It was one of those times where I was talking shop with some other folks while we were all taking a session off. The group started chatting about performance tuning(something I geek out about) and I shared out some of the fundamental links, blogs, and tools I use when I get the inevitable “My query is slow!” call.

Now, just to be clear, none of the information I shared was “mine”. The beauty of the SQL Server community is that we have so much great talent and everyone is willing to help educate. What follows here is a list of my core tools and information links, resources that I use on almost a daily basis. I also see them as the start of a journey, because none of this is the last word on a particular subject. If you’re like me you’ll continue to browse the Google ‘verse with what you pick up here.

Brent Ozar:

Full disclosure, I’m a Brent Ozar fanboy. I started reading his blog about a year or so ago and he’s made me laugh and taught me a whole heck of a lot. It’s because of him that I started blogging and getting involved in the community.

Glenn Berry:

Glenn Berry is a local Denver MVP and I see him from time to time at the Denver SQL User’s Group meetings. Not only has he provided these great DMV queries, but he really understands hardware and has some great posts on the nitty-gritty of your machines.

General Community Resources

These two sites are invaluable resources for articles and tutorials on getting stuff done. When I was figuring out how to implement partitioning, I did a lot of reading on both of these sites.

Documents

Both of these docs live on my laptop and are constantly getting referenced. My only complaint about the waits and queues white paper is that it’s a little stale and I hope Microsoft does an updated one for Denali when it’s released.

If you don’t have a cheat sheet with your go to resources, hopefully this list gives you a good start. Or you can add this to your currently existing list. I have found each one of these links invaluable to my day to day work, not only enriching my general knowledge but giving me the tools to troubleshoot many of the performance problems that come my way.

SQL Rally Orlando 2011

Or, how I got my brain to full tank.

I’m still pretty new to the Professional Association for SQL Server (PASS) and the SQL Server community. It’s been great, attending my local user groups along with a SQL Saturday, but I kept hearing about how great Summit was. When I heard about SQL Rally in Orlando, advertised as a “mini-Summit” that is only 3 days long (with the pre-convention talks) and at a much lower cost, I figured I had a great opportunity to get in on the next level. What followed was one of the best 4 days of my career, where I was able to learn and connect with my SQL Server peers across the country.

What it is!

So SQL Rally was created to fill the gap of training and networking you have between SQL Saturday events and the 5 day PASS Summit. With SQL Saturday, you’d get a dozen or so sessions and a chance to connect with your local peers, something of a shotglass taste of the SQL community. The Summit, on the other hand, is where you have over a hundred sessions to choose from and thousands of people to meet from around the world. It’s like getting the firehose turned on and you just hope to get what you can.

This SQL Rally struck a great balance in between the two events. There were close to 50 sessions over two days, along with 4 pre-convention sessions that gave you 7 hour deep dives into a couple different topics. I think the final attendee number came in around 450, which meant that there were plenty of new people to meet, but you knew you’d see them again throughout the week. In fact, by the end of Friday, I would be checking in with one of my new friends every couple of minutes in the halls. Lunch was particularly dangerous, as I would get wrapped up in conversation on my way to getting food or a drink, or talking shop with someone while my sandwich sat untouched in front of me.

In general, it was awesome! :D

It’s time for your session, Mr. Fal

The main function of SQL Rally was to provide training with a myriad of sessions across the Business Intelligence, Database Development, Database Administration, and Professional Development tracks. I won’t get too deep into it, but here were the high points for me:
Gold medal winners:

  • Grant Fritchey’s(b|t) session on Query Tuning was solid gold. I learned an incredible amount about reading and fixing query plans. I also learned that he hates query hints as much as I do! But beyond that, I loved Grant’s enthusiasm (seriously, the man was jumping around about query plans) and his technical knowledge runs deep. To boot, I learned at least 4 things that I can take back to work on Monday and apply.
  • Wes Brown(b|t) is a gem of a man and his session on storage was fan-bloody-tastic. I don’t think I’ve ever learned and laughed so much within the space of an hour and a half. He really dug in to the internals of storage hardware and, while I’ve always had cursory knowledge of some of the info, he really filled in the gaps. Best of all, though, he did it in a way where you weren’t overwhelmed or intimidated. I can’t stress enough that if you get the chance to hear Wes speak, grab hold with both hands and don’t let go.

Silver medal recipients:

  • John Sterrett(b|t) had a great session on Policy Based Management and Central Management Server. While it was fairly technical and had a lot of good information, it did jump around a bit. Everything was worthwhile, but you did have to keep your head on a swivel as John drilled through all the different aspects of PBM/CMS with live demos. Which, by the way, makes him a far braver man than I, since many of the demos were ad hoc responses to questions instead of fully prepared examples. To his credit, they didn’t fail.
  • Jen McCown(b|t) had a great presentation on T-SQL Sins. Most of the material is stuff we all know, like documenting your code, enforcing constraints, and the like. However, half the session was a wonderfully cathartic period where members of the audience shared coding horror stories. We all have the same struggles, but as a community we can work to share the burden of promoting good practices while commiserating against the hurdles or ignorance we face.

Connection=SQL Community; Provider=SQLRALLY

For me, the biggest benefit of this event was the networking. I mean, I know local guys, but I’m a firm believer that you’re only as strong as your network. We can’t know everything in this business, so if I have friends I can call on for help then I’m that much stronger as a professional.

Plus, ya’ gotta have friends to drink with at SQL Karaoke!

The social events were all pretty good, but essentially boiled down to a SQL happy hour each night. Now, me being a weird style of social butterfly, I would always make sure I wandered around the room to chat with people I had never met before. The result was amazing. I love how the people of the SQL Server community are so open and friendly. I’ve seen it online, with the number of people willing to share scripts and information to help each other out, but in person it’s even more evident as people are just willing to have a good time. It goes back to the commiseration element, where we all know that we share the same struggles and problems day to day.

Some shout outs to some of my new community friends:

  • Stacia Misner(b|t) tried her best to convert me to a BI Developer during the Rally. Sorry to say to her, but I just don’t get passionate about cubes and MDX queries. Still, Stacia was open and friendly when I walked up at one of the events on Tuesday night and introduced myself. This led to a lot of fun over the next few days, even though I missed her session (I beg forgiveness!!!).
  • Eric Wisdahl(b|t) ended up being my room mate in one of those “Well, I’ve never met you before, but I don’t think you’ll stab me to death in my sleep” situations. He’s whip-smart and very relaxed (a hoopy frood, you might say), and thanks to him I now know the glory of Chick-fil-A.
  • Jen McCown gifted me with awesome conversation about Wil Wheaton, Battlestar Galactica, and a new hat. I regret that I’ve never caught her DBAs at Midnight podcast, but I will definitely add that to my weekly To-Do list.
  • Garreth Swanepoel(b|t) was just plain awesome. He was volunteering as well as attending, and was always around with a smile and a laugh. I mean, you can’t help but enjoy yourself when Garreth’s in the room. Too bad he’s moving into BI development. (I keed! I keed!)

There were so many more folks I met, so this is by no means an exhaustive list. Everyone I talked with at SQL Rally made me a better professional, in one way or another. The best part of the socialization was that I never felt like there was a “clique” or any sort of elitism. Everyone, from the big names to the folks who were at their first national event (like me!) was open, friendly, and helpful.

That’s a wrap!

SQL Rally was awesome. There’s really no way to put it. I had a great time learning and meeting folks, realizing that there are so many more people like me in this business. The SQL Community is amazingly giving and nothing showed it to me more than this event. If you haven’t gone to anything before, start making your plans for this. The next SQL Rally is in Sweden in November, so that might be a little tough, but I know that PASS is working on hammering out the next US Rally location as we speak. Keep your eyes peeled.

To wrap up, I want to thank Kendal van Dyke(b|t), Andy Warren(b|t), Jack Corbett(b|t), and all the other SQL Rally Organizers. Thanks to the blood, sweat, and tears of these fine people, SQL Rally was a smashing success and got at least one person fired up to do more with and for the SQL Community. Thanks guys and see you all soon!

Meme Monday – No, it’s not really the disk

Tom LaRock asked for 9 of your biggest non-disk problems for Meme Monday, so here we go:

1) Index Fragmentation
2) Memory over-consumption (stop running your apps on the SQL Server!!!!)
3) Runaway queries
4) Network communication
5) ODBC/OLEDB drivers (Oracle, I’m looking at you)
6) Over-indexing (yeah, that table with 18 columns and 12 indexes….)
7) Lack of partitioning
8) Stale statistics
9) Locking (did you really have to use that exclusive table lock hint?????)

5 of these I’ve had to resolve in the last week. All of them within the last month. The good news is that most of these are pretty easy to identify and solve, but the bad news is most of these issues are the result of bad coding/querying practices and can be prevented with a little forethought. Which means I have more work to do with my developers…

Thanks for joining me on Meme Monday!

————————————

Edit: After a brief discussion with Jeff Smith(b|t), I thought I should clarify the “Lack of Partitioning”.  I know most people use this to spread the I/O load against a data file, but I found it as a HUGE boost for performance because it allowed me to segment out my table into manageable parts for maintaining indexes and improving select performance.  Take a look at my partitioning post for more info.

Monday Scripts – Getting Partition Information

I’m going to follow up my post on partitioning with a query I wrote while implementing the partitions and cleaning up the empty ones. One of the issues I ran in to was clearly seeing each partition’s range and the number of rows within it. These two questions are answered in different places in the catalog views, sys.partition_range_values and sys.partitions. To build this out, I started with a nice little query from Derek Dieter, adding in a couple joins to get me to sys.partitions. Here’s the result!

select 
	t.name as TableName
	, ps.name as PartitionScheme
	, pf.name as PartitionFunction
	, p.partition_number
	, p.rows
	, case 
		when pf.boundary_value_on_right=1 then 'RIGHT' 
		else 'LEFT' 
	  end [range_type]
	, prv.value [boundary]
from sys.tables t
    join sys.indexes i on t.object_id = i.object_id
    join sys.partition_schemes ps on i.data_space_id = ps.data_space_id
    join sys.partition_functions pf on ps.function_id = pf.function_id
    join sys.partitions p on i.object_id = p.object_id and i.index_id = p.index_id
    join sys.partition_range_values prv on pf.function_id = prv.function_id and p.partition_number = prv.boundary_id
where i.index_id < 2  --So we're only looking at a clustered index or heap, which the table is partitioned on
order by p.partition_number

My Partitioning Odyssey

Recently I had my first real experience with SQL Server partitioning. I had known about it, sure, but most of it was general understanding of the concepts and no real practical experience. Well, all of that changed last month when I successfully partitioned a table with several billion (with a “B”) rows to help get a grip on an application that was slipping out of control. I’d like to walk you through the process to try and help you with any partitioning you intend to do.

That’s no moon!

First off, let’s look at the problem. Like I said, we had a big table, but what does that mean? Well, here are the hard stats from sp_spaceused:

19,600,710,213 rows
~759 GB reserved space
~432 GB data space
~326 GB index space
~120 MB free space

Yeah, pretty big right? I mean, at 759 GB, this one table was bigger than most databases that I’ve worked with. Structurally, it wasn’t all that complex, though, as it had 3 fields, a composite clustered primary key, and a non-clustered index. However, even with it being very narrow, the size of it prevented us from being able to do any maintenance. And it was almost impossible to properly identify and roll off any data, so our problem was only going to get worse.

How do you eat an elephant?

Based on some of my reading, I was pretty sure partitioning was our way out, but I needed more information. A couple weeks of research later, I came up with the following links that really helped me “get it”:

Partitioned Tables and Indexes in SQL Server 2005
Simple Talk-Partitioned Tables in SQL Server 2005
Kimberly Tripp Web Cast
SQLServerpedia Partitioning article

The first thing I got out of all these articles was why you would want to do partitioning on a table this size:

  • Ability to rebuild/manage indexes, as each partition can be rebuilt separately.
  • Increased performance for reads based on better data targeting by the query engine, as it should only retrieve needed partitions.
  • Increased performance on all queries, as we would be able to maintain the indexes on the table.
  • Possible performance increase on inserts.
  • Ability to better archive off data as the table grows, thus being able to overall manage space on the server.

So what partitioning gives you is a way to manage the table in small chunks instead of as one large object. Or a better way to think of it is the old cliché: How do you eat an elephant?

One bite at a time.

Now, there were also some key ground rules I learned about partitioning:

  • Table partitioning defines how the table will be physically stored
    • If partitioning a heap, then the partition field can be any field in the table since no field is involved in defining how the table is physically stored on disk.
    • If partitioning a table with a clustered index (like a primary key), then the partitioning field must be one of the fields included in the index, as the clustered index defines how the table is physically stored on disk.
  • You are always partitioning by RANGE. If you wish to partition on specific values, then your range declaration will be of that value.
  • Each partition is a physically separate entity, meaning that you can allocate partitions to different file groups within SQL Server. This would be the method of explicitly assigning parts of a table to specific disk locations.
  • SQL 2005 and 2008 are limited to 1000 partitions.

The first point was pivotal for me, as I had seen a bunch of examples where some guys had partitioned on a table field, and some on a clustered index field. Understanding that table partitioning revolves around how the table is physically stored on disk really made it all clear for me.
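As a side note on that “physically separate entity” ground rule, here’s a quick sketch of a function and scheme that place each partition on its own filegroup (FG1, FG2, and FG3 are hypothetical filegroups that would already have to exist). The script later in this post keeps things simple and maps everything to [PRIMARY].

--Three boundary values produce four partitions, each mapped to its own filegroup
CREATE PARTITION FUNCTION pf_example (int)
AS RANGE LEFT FOR VALUES (5000000, 10000000, 15000000);

CREATE PARTITION SCHEME ps_example
AS PARTITION pf_example
TO (FG1, FG2, FG3, [PRIMARY]);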

Planning time!

Armed with this knowledge, it was time to build out my partition structure. The first step was to figure out what I was going to partition on. The table was designed to store a value based on a combination of an individual id, representing a person, and a field id, detailing what type of data was being stored for that individual. Reviewing the ground rules, I knew you could only partition on one field and it must be part of that clustered index, so I was limited to selecting one of those fields. Your partitioning doesn’t have to be the entire clustered index, so using just one of these fields was fine. As the individual id values were pretty granular (about 1.6 billion distinct values), I went with that as my partitioning field.

Next up, I needed to decide on my ranges. The individual id was a numeric, auto-incrementing value, so this made it easy to establish a numeric range. The real question was how large to make each partition, since I wanted them small for manageability, but not so small that I would go over the 1000 partition limit. I ended up deciding on making each partition 5 million distinct individual ID values, which would mean roughly 50 million rows per partition (the individual id to assigned fields ratio was about 1 to 10).

With these two decisions made, I was able to write the following SQL code to create my function, scheme, and table (names have been changed to protect the troublesome):

create partition function pf_bigtable (int)
as range left for values();

declare @boundary int
declare @increment int

set @boundary = 5000000
set @increment = 5000000

while @boundary < 2000000000
begin
	alter partition function pf_bigtable() split range (@boundary)
	set @boundary = @boundary+@increment
end

create partition scheme ps_bigtable
as partition pf_bigtable
all to ([PRIMARY]);

--Create new table
CREATE TABLE [dbo].[bigtable_p](
	[individualid] [int] NOT NULL,
	[fieldid] [int] NOT NULL,
	[valueid] [int] NOT NULL,
 CONSTRAINT [pk_bigtable_p] PRIMARY KEY CLUSTERED
(
	[individualid] ASC,
	[fieldid] ASC
)
) ON ps_bigtable(individualid)
--NC index (created later)
CREATE NONCLUSTERED INDEX [ix] ON [dbo].[bigtable_p]
(
	[fieldid] ASC,
	[valueid] ASC
)

Notice that I used a looping statement to add in all my ranges to save myself the effort of explicitly declaring several hundred partitions.

Getting it in place

The biggest hurdle was partitioning out the table itself. This table was part of an OLTP system that needed to be up almost 24/7. We had a weekly maintenance window of 3 hours one night in the week, but that wasn’t enough time to apply this operation to the live table. Fortunately, we did have plenty of disk space.

What I did was pretty straightforward. I went ahead and created the table with the fully created partition scheme as described above, but empty of data. I also held off creating the non-clustered index. At this point I had an empty copy of my live table. With the help of one of the other guys on my team, we built a batch insert process to step through the live table and insert small batches of records (~6 million at a time) into the partitioned table, so we could build up about 99% of the data from the live table to our partitioned copy. Once we had gotten the bulk of the data inserted, I then created the non-clustered index. When this was completed, we then took the application offline to stop writes to the table, ran a final insert process to get all the new records we had missed since we created the non-clustered index, and then changed the table and index names, swapping the unpartitioned table with the partitioned one.
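The batch insert process itself was specific to our system, but a minimal sketch of the idea looks something like this, stepping through the live table in ranges of the partitioning key. The names match the scripts in this post (dbo.bigtable as the live table, dbo.bigtable_p as the partitioned copy), and the step size and WAITFOR pause are illustrative values, not what we actually ran; a range of 600,000 individual IDs works out to roughly 6 million rows at the 1-to-10 ratio mentioned earlier.

declare @minId int, @maxId int, @step int

set @minId = 0
set @step = 600000	--~600k individual IDs per batch
select @maxId = MAX(individualid) from dbo.bigtable

while @minId <= @maxId
begin
	insert into dbo.bigtable_p (individualid, fieldid, valueid)
	select individualid, fieldid, valueid
	from dbo.bigtable
	where individualid > @minId
		and individualid <= @minId + @step

	set @minId = @minId + @step
	waitfor delay '00:00:05'	--brief pause so the OLTP workload can breathe
end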

The initial copy actually took several days to complete, but we were able to have uninterrupted application service during the whole time. When the non-clustered index build was finished, we ended up having only 30 minutes of application downtime to catch everything up and swap the objects.

The finish line

So now that we got all this into place, it was time for us to see what the partitioning bought us. The first item on my list was seeing if I could rebuild the index on the table. By making use of the dm_db_index_physical_stats dynamic management function, I was able to write a script to rebuild each individual partition as necessary:

declare recscan cursor for
SELECT DISTINCT partition_number --DISTINCT so each partition is only rebuilt once
FROM sys.dm_db_index_physical_stats (DB_ID(),OBJECT_ID(N'dbo.bigtable'), NULL , NULL, NULL)
where avg_fragmentation_in_percent >= 30;

declare @sql varchar(max)
declare @partnum varchar(10)

open recscan
fetch next from recscan into @partnum
while @@fetch_status = 0
begin
set @sql = 'alter index all on bigtable rebuild partition='+@partnum
exec(@sql)
fetch next from recscan into @partnum
end

close recscan
deallocate recscan

The initial run of this still took 8+ hours to complete, so not any real time savings there. However, since most of the table wasn’t getting affected (it was mostly inserts, very few updates), subsequent maintenance only took 5-15 minutes each run because only a handful of partitions actually needed to be addressed.

We also saw dramatic performance gains in the application. SELECTs of the data were much faster, and the load process that did most of the inserting into the table saw upwards of a 150% speed increase.

Overall, this was a huge win for me and a fantastic learning experience. You can never really put something in your toolbag until you’ve actually been able to do it, and it’s not every day a challenge like this comes your way. However, this experience should also stress the necessity of planning, because we had to jump through a lot of hoops to get the partitions on this table after it had already grown to an unmanageable size. A lot of that effort could have been saved if someone had given some more thought to the amount of data flowing into this system.

Anyway, there’s my partitioning odyssey. I hope this was beneficial to you and if there’s anything that wasn’t clear or may have been missed, don’t hesitate to drop me a note and let me know!

Monday Scripts – xp_logininfo

How many times have you wished you could use SQL Server to query Active Directory and see who is in that group? This week’s script will allow you to do that, so if you have AD groups set up for logging into your server you can see exactly which logins have access.

xp_logininfo is an extended stored procedure that does just that. By passing a couple parameters, you can easily query any AD group that is a server principal. Wrap it up in a dynamic SQL loop, and you can quickly and easily audit your server’s security.
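For a single group, the call is as simple as this (the domain and group name are placeholders); the looped version below just automates it across every Windows group on the server.

--List the members of one Windows group that has a SQL Server login
exec xp_logininfo 'MYDOMAIN\SQLAdmins', 'members'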

declare @winlogins table
(acct_name sysname,
acct_type varchar(10),
act_priv varchar(10),
login_name sysname,
perm_path sysname)

declare @group sysname

--Gather every Windows group (type 'G') with a server login, skipping the built-in NT groups
declare recscan cursor for
select name from sys.server_principals
where type = 'G' and name not like 'NT%'

open recscan
fetch next from recscan into @group

while @@FETCH_STATUS = 0
begin
insert into @winlogins
exec xp_logininfo @group,'members'
fetch next from recscan into @group
end
close recscan
deallocate recscan

--Join server roles to their members, attaching the AD group membership collected above
select
r.name,
u.name,
u.type_desc,
wl.login_name,
wl.acct_type
from (select * from sys.server_principals where type = 'R') r
join sys.server_role_members rm on (r.principal_id = rm.role_principal_id)
join (select * from sys.server_principals where type != 'R') u on rm.member_principal_id = u.principal_id
left join @winlogins wl on u.name = wl.perm_path
order by login_name,r.principal_id,u.type_desc,u.name

Looking at the script, I first do a looped scan through sys.server_principals for all Windows groups (type ‘G’) and collect their AD group info. I store that in a table variable, then join it back to a select from server_principals again for a full listing of all my server permissions. Bam! Instant security report.