Art of the DBA

April 21st, 2011

My Partitioning Odyssey

Recently I had my first real experience with SQL Server partitioning. I had known about it, sure, but most of it was general understanding of the concepts and no real practical experience. Well, all of that changed last month when I successfully partitioned a table with several billion (with a “B”) rows to help get a grip on an application that was slipping out of control. I’d like to walk you through the process to try and help you with any partitioning you intend to do.

That’s no moon!

First off, let’s look at the problem. Like I said, we had a big table, but what does that mean? Well, here’s the hard stats from sp_spaceused:

19,600,710,213 rows
~759 GB reserved space
~432 GB data space
~326 GB index space
~120 MB free space

Yeah, pretty big, right? I mean, at 759 GB, this one table was bigger than most databases I’ve worked with. Structurally, it wasn’t all that complex, though: it had 3 fields, a composite clustered primary key, and a non-clustered index. However, even though it was very narrow, its sheer size prevented us from doing any maintenance on it. And it was almost impossible to properly identify and roll off any data, so our problem was only going to get worse.

How do you eat an elephant?

Based on some of my reading, I was pretty sure partitioning was our way out, but I needed more information. A couple weeks of research later, I came up with the following links that really helped me “get it”:

Partitioned Tables and Indexes in SQL Server 2005
Simple Talk-Partitioned Tables in SQL Server 2005
Kimberly Tripp Web Cast
SQLServerpedia Partitioning article

The first thing I got out of all these articles was why you would want to do partitioning on a table this size:

  • Ability to rebuild/manage indexes, as each partition can be rebuilt separately.
  • Increased read performance from better data targeting by the query engine, since it should only retrieve the partitions it needs.
  • Increased performance on all queries, since we would finally be able to maintain the indexes on the table.
  • Possible performance increase on inserts.
  • Ability to archive off data as the table grows, making it easier to manage overall space on the server.

So what partitioning gives you is a way to manage the table in small chunks instead of as one large object. Or, a better way to think of it is the old cliché: how do you eat an elephant?

One bite at a time.

Now, there were also some key ground rules I learned about partitioning:

  • Table partitioning defines how the table will be physically stored
    • If partitioning a heap, then the partition field can be any field in the table, since without a clustered index no field determines how the table is physically stored on disk.
    • If partitioning a table with a clustered index (like a primary key), then the partitioning field must be one of the fields included in the index, as the clustered index defines how the table is physically stored on disk.
  • You are always partitioning by RANGE. If you want to partition on specific values, your range boundaries are simply those values (see the short sketch below for how boundaries behave).
  • Each partition is a physically separate entity, meaning that you can allocate partitions to different file groups within SQL Server. This would be the method of explicitly assigning parts of a table to specific disk locations.
  • SQL 2005 and 2008 are limited to 1000 partitions.

The first point was pivotal for me, as I had seen a bunch of examples where some guys had partitioned on a table field, and some on a clustered index field. Understanding that table partitioning revolves around how the table is physically stored on disk really made it all clear for me.
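To make the RANGE rule concrete, here’s a tiny illustration (the function name pf_example is just for demonstration and isn’t part of the actual build below). With RANGE LEFT, each boundary value belongs to the partition on its left:

--Illustration only: two boundaries produce three partitions
create partition function pf_example (int)
as range left for values (100, 200);
--Partition 1: values <= 100
--Partition 2: values 101 through 200
--Partition 3: values > 200

--$PARTITION tells you which partition a given value maps to
select $PARTITION.pf_example(150);	--returns 2

RANGE RIGHT is simply the mirror image: the boundary value falls into the partition on its right.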

Planning time!

Armed with this knowledge, it was time to build out my partition structure. The first step was to figure out what I was going to partition on. The table was designed to store a value based on a combination of an individual id, representing a person, and a field id, detailing what type of data was being stored for that individual. Reviewing the ground rules, I knew I could only partition on one field and it had to be part of the clustered index, so I was limited to one of those two fields. Your partitioning column doesn’t have to be the entire clustered index, so using just one of them was fine. As the individual id was pretty granular (about 1.6 billion distinct values), I went with that as my partitioning field.

Next up, I needed to decide on my ranges. The individual id was a numeric, auto-incrementing value, so it was easy to establish a numeric range. The real question was how large to make each partition: I wanted them small for manageability, but not so small that I would go over the 1000 partition limit. I ended up making each partition 5 million distinct individual id values wide, which meant roughly 50 million rows per partition (the ratio of individual ids to assigned fields was about 1 to 10).
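Before settling on a number like that, it’s worth checking the actual spread of the partitioning column. This is the kind of sizing query I mean (a sketch against the original table; on a table this size you’d want to run it off-hours):

--Rough sizing check: how many rows, how many distinct individuals, and how high do the ids go?
select
	count_big(*) as total_rows,
	count_big(distinct individualid) as distinct_individuals,
	max(individualid) as max_individualid
from dbo.bigtable;

Dividing the maximum id by the range size gives the partition count, which needs to stay comfortably under the 1000 partition limit (2 billion divided by 5 million works out to 400 partitions).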

With these two decisions made, I was able to write the following SQL code to create my partition function, partition scheme, and table (names have been changed to protect the troublesome):

--Create the partition function with no boundaries yet; the loop below adds them
create partition function pf_bigtable (int)
as range left for values();

declare @boundary int
declare @increment int

set @boundary = 5000000
set @increment = 5000000

--Split in a new boundary every 5 million ids, up to 2 billion (just under the int maximum)
while @boundary < 2000000000
begin
	alter partition function pf_bigtable() split range (@boundary)
	set @boundary = @boundary+@increment
end

--Create the partition scheme, keeping every partition on [PRIMARY]
create partition scheme ps_bigtable
as partition pf_bigtable
all to ([PRIMARY]);

--Create new table on the partition scheme
CREATE TABLE [dbo].[bigtable_p](
	[individualid] [int] NOT NULL,
	[fieldid] [int] NOT NULL,
	[valueid] [int] NOT NULL,
 CONSTRAINT [pk_bigtable_p] PRIMARY KEY CLUSTERED
(
	[individualid] ASC,
	[fieldid] ASC
)
) ON ps_bigtable(individualid);

--NC index (created later); with no filegroup specified it is built on the
--table's partition scheme, so it stays aligned with the partitions
CREATE NONCLUSTERED INDEX [ix] ON [dbo].[bigtable_p]
(
	[fieldid] ASC,
	[valueid] ASC
);

Notice that I used a looping statement to add in all my ranges to save myself the effort of explicitly declaring several hundred partitions.
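If you build your function this way, it’s worth a quick sanity check afterward that the loop produced the boundaries you expect. The catalog views below will show them (this wasn’t part of my original script, just a suggested check):

--List the boundary values the loop created
select pf.name, prv.boundary_id, prv.value
from sys.partition_functions pf
join sys.partition_range_values prv on prv.function_id = pf.function_id
where pf.name = 'pf_bigtable'
order by prv.boundary_id;

--fanout = number of partitions (boundaries + 1)
select name, fanout from sys.partition_functions where name = 'pf_bigtable';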

Getting it in place

The biggest hurdle was partitioning the table itself. This table was part of an OLTP system that needed to be up almost 24/7. We had a weekly maintenance window of 3 hours one night a week, but that wasn’t enough time to apply this change to the live table. Fortunately, we did have plenty of disk space.

What I did was pretty straightforward. I went ahead and created the table on the fully built partition scheme as described above, but empty of data. I also held off on creating the non-clustered index. At this point I had an empty copy of my live table. With the help of one of the other guys on my team, we built a batch insert process to step through the live table and insert small batches of records (~6 million at a time) into the partitioned table, letting us copy about 99% of the data from the live table into the partitioned copy. Once the bulk of the data was in, I created the non-clustered index. When that was complete, we took the application offline to stop writes to the table, ran a final insert process to catch the new records we had missed since creating the non-clustered index, and then changed the table and index names, swapping the unpartitioned table for the partitioned one.
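For the curious, the batch process was nothing exotic. Here’s a simplified sketch of the pattern (the real job also logged its progress and paused between batches; the batch size here is illustrative):

declare @lastid int
declare @maxid int
declare @batchsize int

set @lastid = 0
set @batchsize = 500000	--individual ids per pass, roughly 5-6 million rows at our ratio
select @maxid = max(individualid) from dbo.bigtable

while @lastid < @maxid
begin
	insert into dbo.bigtable_p (individualid, fieldid, valueid)
	select individualid, fieldid, valueid
	from dbo.bigtable
	where individualid > @lastid
	  and individualid <= @lastid + @batchsize

	set @lastid = @lastid + @batchsize
end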

The initial copy actually took several days to complete, but we were able to have uninterrupted application service during the whole time. When the non-clustered index build was finished, we ended up having only 30 minutes of application downtime to catch everything up and swap the objects.
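The swap itself boils down to a couple of renames, done while the application is offline so nothing can write between the two steps (a sketch only; renaming the old constraints and indexes is left out here):

--Move the old table out of the way, then give the partitioned copy its name
exec sp_rename 'dbo.bigtable', 'bigtable_old';
exec sp_rename 'dbo.bigtable_p', 'bigtable';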

The finish line

So now that we had all of this in place, it was time to see what the partitioning bought us. The first item on my list was seeing whether I could rebuild the indexes on the table. By making use of the sys.dm_db_index_physical_stats dynamic management function, I was able to write a script that rebuilds each individual partition only as necessary:

--Find only the partitions that are actually fragmented
declare recscan cursor for
SELECT DISTINCT partition_number
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(N'dbo.bigtable'), NULL, NULL, NULL)
where avg_fragmentation_in_percent >= 30;

declare @sql varchar(max)
declare @partnum varchar(10)

open recscan
fetch next from recscan into @partnum
while @@fetch_status = 0
begin
	--Rebuild every aligned index on just this partition
	set @sql = 'alter index all on bigtable rebuild partition=' + @partnum
	exec(@sql)
	fetch next from recscan into @partnum
end

close recscan
deallocate recscan

The initial run of this still took 8+ hours to complete, so no real time savings there. However, since most of the table wasn’t getting affected (it was mostly inserts, very few updates), subsequent maintenance only took 5-15 minutes each run because only a handful of partitions actually needed to be addressed.
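The reason so few partitions need attention is that the ids auto-increment, so new rows pile into the last few partitions while the older ones sit untouched. A quick way to eyeball that distribution (again, a suggested check rather than part of the original script):

--Row counts per partition for the clustered index
select partition_number, rows
from sys.partitions
where object_id = OBJECT_ID('dbo.bigtable')
  and index_id = 1
order by partition_number;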

We also saw dramatic performance improvements in the application. SELECTs against the data were much faster, and the load process that did most of the inserting into the table saw upwards of a 150% speed increase.

Overall, this was a huge win for me and a fantastic learning experience. You can never really put something in your toolbag until you’ve actually been able to do it, and it’s not every day a challenge like this comes your way. However, this experience should also stress the necessity of planning, because we had to jump through a lot of hoops to get the partitions on this table after it had already grown to an unmanageable size. A lot of that effort could have been saved if someone had given some more thought to the amount of data flowing into this system.

Anyway, there’s my partitioning odyssey. I hope this was beneficial to you and if there’s anything that wasn’t clear or may have been missed, don’t hesitate to drop me a note and let me know!