Art of the DBA Rotating Header Image

Process of Elimination

I never expected to talk as much about partitioning as I have so far this year. The tally is currently at six, with at least two more in the works. It’s a hot topic due to partitioning still being a mystery to a lot of folks and I’m glad I’ve been able to pull back the curtain a little bit.  However, there’s still some things not understood very well.

One of these is partition elimination. Heck, up until I really started studying partitioning, I didn’t understand it very well either. Just like many others out there, I thought that once you partitioned a table, all your queries would suddenly become faster because the engine would only look at the needed partitions. While SQL Server tries to do this, the reality of when partition elimination actually occurs is not quite as cool.

To figure out how partition elimination works, we only have to look at our query plans. Let’s crack open the demo I use in my partitioning presentation and take a look. As a quick recap, I use a basic orders table that is partitioned on an incrementing integer primary key:

Now let’s run a simple SELECT to query data out and capture the query plan that results:

select * from orders
where order_id between 11000 and 15000

Nothing complex here and nothing complex in the query plan. You’ll see the expected clustered index seek. What’s key is to look at the properties (F4) of your operator to see how your partitions are being used:

These two property values give us all the answers when it comes to partition elimination. First, Partition Count tells us that the engine only retrieved a single partition for our query. Actual Partitions gives us the partitions (in this case, partition 3) that were actually accessed. Great! In this case, by querying order_id, we were able to take advantage of our partitioning.

What if we use a different field, will SQL still try to take advantage of the partitioning? It depends on whether or not the partitioning key is used in any part of the query. We’ll go ahead and add an index on order_date (which will be partitioned aligned) and query to get the total order amount for date range:

select sum(order_amount)
from orders
where order_date between '1/1/2012' and '2/1/2012'

Now, this plan is a little more interesting, so let’s take a look at it, focusing on the two places where we hit the table:

You’ll note that the query has to use a key lookup to complete the query. This is because we have an index on order_date, but it doesn’t include the order amount, so the engine has to go back to the table using the clustered index to find our data. Let’s now look at the properties for each of these operators:

Where are the partition properties for the index seek? Since the query can’t make use of partitioning for this part of the query, the properties aren’t listed. However, the key lookup (which makes use of the clustered index and, therefore, our partitioning scheme) does show that we accessed only partition number 3.

What if we go a step further and alter our index on order_date to include order_amount, removing the need to do a key lookup in the query? Let’s take a look:

By detailing that the query accessed eight partitions (1..8), we see that the engine tried to use the partitioning.  Since the partitioning key wasn’t directly involved, it ended up seeking across the entire index anyway.

To wrap it all up, let’s look at the performance of our order_amount queries:

  • Query cost on index without include: 2.63403
  • Query cost on index with include: 0.0276018

So even without using our partitioning, including the order_amount is still much faster than having to do a key lookup.

Overall, partition elimination can be a helpful tool in the engine’s toolbox. As your data sets get larger and larger, it can improve your performance, but only if you understand how the engine uses partitions. Sometimes, however, it’s better to take a different approach.  Don’t go and rewrite all your queries simply to get your partitioning key involved, because it may not help. Always do a careful analysis of what’s going on under the covers before making any tuning decisions.

2 Comments

  1. Shaun says:

    It’s been my understanding that you’d want to partition based on some field that is meaningful in the real world – i.e., not order_ID. I’d think partitioning on order_date in this case would be the way to go. In fact, partitioning on a date field is typically what people do to take advantage of partition switching to move historical data from active tables to reporting tables quickly. However, I will admit my knowledge is purely theoretical. I have not played with partitioning yet.

  2. [...] Process of Elimination - A closer look at Partitioning from Mike Fal (Blog|Twitter). [...]

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>