So last week we had to puzzle out a little weirdness at work. One of our development teams is working to implement partitioning and they came to us about their scheme. They had applied a partitioning scheme, but it wasn’t being used by their queries. Their scheme was pretty simple, but the oddity was that they had partitioned on a nvarchar(10) field. Best practice is to use an integer, though I’ve seen date fields used as well. Though I knew that you could partition on a character field, I usually avoided it.
Now, using an nvarchar field shouldn’t have altered the query process that much, right? This is where it got odd, because what we were seeing was that a query on that partition scheme was still querying across all partitions. Here’s the query and what we were seeing in the query plan:
One of the advantages of partitioning is to improve query performance with range scans, so that the engine will only query across partitions. However, even though our query was specifically using the partition key, the engine was still querying across all partitions for its result. And we were specifically using a nvarchar value for the query criteria, so it shouldn’t be doing any sort of conversion, right? Well, that’s where you (and we, as well) would be wrong. We tried the query again, this time with an explicit conversion to nvarchar(10):
Bang! By converting it to the same datatype as the partition key, the query engine can now properly take advantage of our partitioning. The results are further reinforced by looking at the query costs of our two queries:
- Query 1: 0.0131291
- Query 2: 0.0032831
What just happened?
What’s going on behind the scenes is that in query 1 our predicate datatype does not match the datatype of our partition key. The SQL Server engine sees the query predicate of CUSTOMERID=N’2’ as an nvarchar(4000), not as an nvarchar(10). Just take away the “N” in front of our string value and have it declared as a character string and force an implicit conversion. Then, if you look at the query operator, you’ll see SQL Server do a CONVERT_IMPLICIT to nvarchar(4000), not a nvarchar(10).
I also went through this exercise using varchar as my datatype and got the same results. If I simply declared the predicate as CUSTOMERID=’2’, my query would seek across all partitions. If I used CUSTOMERID=convert(varchar(10),’2’), then I would only have the one active partition.
I’m not sure if this is a bug or working as intended, but there’s definitely a couple of lessons we can take from this:
- To best take advantage of querying across partitions, you need to make sure your query criteria is explicitly the same datatype as your partition key. Otherwise, the engine can’t make use of your partitioning.
- Obviously character strings have some weirdness about them for partitioning, so it’s best to avoid them for partition keys.
- Mike is weird for geeking out about this stuff. Seriously. My co-workers gave me some odd looks as we were puzzling this out.