Art of the DBA

query plans

Powershell Shredding

I’ve been playing around a bit recently with Powershell and XML. It’s one of those experiments where I’m not sure what the immediate benefit is, but it certainly is interesting seeing what kind of functionality we have available to us as data folks. I’m going to see what more I can coax out of it, but I wanted to share with you what I’ve learned so far.

First off, understand that I’m not that strong when it comes to XML. I get what it is, I understand the basic structure, but wrangling it isn’t something I’ve had to do a lot of. As a result, I’m still very much a newbie with XPath and XQuery. I understand nodes and properties, but then it starts to get muddy. Just a disclaimer before we get too far into this.

.NET, and by extension Powershell, has an XML data type.  This is useful because query plans are XML documents, whether we save them off to a file or they are stored in the plan cache. So it’s a fairly simple matter to suck a query plan into an XML variable:

[xml]$plan=(gc SomeSQLQuery.sqlplan)

From here, we can start browsing through our plan using the dot notation to parse the plan. The query plan itself is going to be found under the ShowPlanXML node. Under that, there’s a fairly complex layout that you can really dig into by looking at the full schema documentation. Suffice to say, if we want to see the SQL text from the query, we’d need to look at:

$plan.ShowPlanXML.BatchSequence.batch.Statements.StmtSimple.StatementText
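To make the drilldown concrete, here is a minimal sketch that runs without a saved plan. The XML is a tiny hand-written stand-in, not real SSMS output, and the variable names $toyPlan and $stmt are just for illustration:

```powershell
# A hand-written toy fragment standing in for a real .sqlplan file (an assumption).
[xml]$toyPlan = @'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementText="SELECT s_id FROM snafu" StatementType="SELECT" />
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>
'@

# Dot notation walks element by element; attributes show up as properties on the last node.
$stmt = $toyPlan.ShowPlanXML.BatchSequence.Batch.Statements.StmtSimple
$stmt.StatementText
$stmt.StatementType
```

Dot notation works here even with the namespace still in place; it is only the XPath calls that care about it.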

That’s a lot of drilldown! What’s worse, if we wanted to start finding specific operators, we would quickly get lost in a recursive arrangement of RelOp nodes and actual operators. So if we want to extract something useful out of the XML, we need to leverage XPath using the .SelectNodes() method of the XML data. The only problem here is that the plan declares a default namespace, and XPath only matches unprefixed names against elements that have no namespace, so plain .SelectNodes() calls come back empty. To get around this, I basically ripped out the namespace so that the defaults can be used:

[xml]$plan=(gc SomeSQLQuery.sqlplan) -replace 'xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"'
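If ripping out the namespace feels too blunt, another option is to leave it in and register a prefix for it with an XmlNamespaceManager. This is a rough sketch against a hand-written toy fragment (not real SSMS output); the prefix sp and the variable names are arbitrary choices of mine:

```powershell
# Toy plan fragment with the showplan namespace left in place (hand-written, an assumption).
[xml]$nsPlan = @'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence><Batch><Statements>
    <StmtSimple StatementText="SELECT s_id FROM snafu">
      <QueryPlan>
        <RelOp PhysicalOp="Table Scan">
          <TableScan><Object Table="[snafu]" /></TableScan>
        </RelOp>
      </QueryPlan>
    </StmtSimple>
  </Statements></Batch></BatchSequence>
</ShowPlanXML>
'@

# Unprefixed names only match elements in no namespace, so this finds nothing.
$unqualified = $nsPlan.SelectNodes('//TableScan/Object')

# Map a prefix to the showplan namespace and use it in the XPath.
$ns = New-Object System.Xml.XmlNamespaceManager($nsPlan.NameTable)
$ns.AddNamespace('sp', 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
$qualified = $nsPlan.SelectNodes('//sp:TableScan/sp:Object', $ns)

$unqualified.Count
$qualified.Count
```

The trade-off is that every element in the XPath needs the prefix, which gets noisy for longer expressions. Stripping the namespace keeps the queries shorter.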

At this point, I can now start using XPath to analyze my query plan. So if I wanted to pull up all my table scans:

$plan.SelectNodes('//TableScan/Object') | ft

Or, if I wanted to get all my table or index scans:

$plan.SelectNodes('//*[contains(name(),"Scan")]/Object') | ft
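One thing to watch for: as far as I can tell from the showplan schema, index seeks are also written as IndexScan elements, so matching on the element name alone may pull in seeks as well. A hedged sketch of filtering on the RelOp’s PhysicalOp attribute instead, using a hand-written toy fragment (the flat layout, table names, and index names are made up for illustration):

```powershell
# Toy fragment, not real SSMS output: three operators, one of which is a seek.
$raw = @'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp PhysicalOp="Table Scan">
    <TableScan><Object Schema="[dbo]" Table="[snafu]" /></TableScan>
  </RelOp>
  <RelOp PhysicalOp="Clustered Index Scan">
    <IndexScan><Object Schema="[dbo]" Table="[bar]" Index="[PK_bar]" /></IndexScan>
  </RelOp>
  <RelOp PhysicalOp="Clustered Index Seek">
    <IndexScan><Object Schema="[dbo]" Table="[foo]" Index="[PK_foo]" /></IndexScan>
  </RelOp>
</ShowPlanXML>
'@

# Strip the namespace, same trick as above, so unprefixed XPath works.
[xml]$toy = $raw -replace 'xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"'

# Keep only operators whose PhysicalOp mentions "Scan", then shape the output.
$scans = $toy.SelectNodes('//RelOp[contains(@PhysicalOp,"Scan")]') | ForEach-Object {
    $obj = $_.SelectSingleNode('*/Object')
    [pscustomobject]@{
        Operator = $_.PhysicalOp
        Table    = $obj.Table
        Index    = $obj.Index   # empty for a heap table scan
    }
}

$scans | Format-Table
```

Building objects this way also makes it easy to pipe the results on to Where-Object or Export-Csv.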

And so on and so forth.

Now, what does this get me? At this point, I’m not sure. I started down this road after seeing Jason Strate’s (@StrateSQL) presentation on shredding the plan cache with T-SQL. My thought process was that this might be an easier way to dissect the plan cache for useful information. In a way, I was right, because it was a little easier to grasp, but it also seems like it’s the long way around the horn to get at that information. I’ll continue to poke at it and see what I can coax out of it.

WHERE to JOIN?

I’ve been really enjoying the DBA StackExchange site recently.  Not only can you see what challenges and hurdles people have, the site construction gives people a great way to contribute to an ever-expanding library of database solutions.  Questions range from the very simple to the highly esoteric, but in all cases the community comes together to groom both questions and answers in such a way that a comprehensive knowledge base is built for future use.

One of these questions that recently came up was:  Which performs better, creating your joins in the FROM clause or the WHERE clause?  Most people have been using the ANSI-92 syntax, so this question may seem a little odd, but I still see a lot of SQL code out there that uses the prior syntax where the joins are declared in the WHERE portion of your query.  If you want to read more, Mike Walsh (@Mike_Walsh) has a great post on how the syntax has evolved and how it’s changed in SQL Server 2012.

Back to the question, though.  Does it really make any difference?  Well, I could tell you straight out, but what sort of blog post would that make?  Instead, let’s test it out ourselves.  Using a basic schema, I’ve put together two very basic queries:

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  ,bar b
  ,foo f
WHERE
  s.b_id = b.b_id
  AND b.f_id = f.f_id;

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  INNER JOIN bar b ON (s.b_id = b.b_id)
  INNER JOIN foo f ON (b.f_id = f.f_id);

As you can see, the only real difference here is that in the first query we have our joins in the WHERE clause. The second follows ANSI-92 syntax and places the joins in the FROM clause. Now how do we tell if they perform differently? Query plans, of course!

Query 1 (WHERE clause)

[Image: WHERE_JOIN query plan]

Query 2 (FROM clause)

[Image: FROM_JOIN query plan]

Notice how both queries have exactly the same plan.  This is because our friend, the Optimizer, understands the two approaches and will build the plan accordingly.  Want to play with it yourself?  You can check out the full example over at SQL Fiddle.
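If you would rather not eyeball two plans side by side, the operator sequences can be compared in Powershell. This is only a rough sketch: the two plans are hand-written toy fragments rather than the real plans for these queries, Get-PlanOperators is a hypothetical helper name, and matching operators says nothing about costs or predicates:

```powershell
# Hypothetical helper: returns the PhysicalOp of every RelOp in document order.
function Get-PlanOperators {
    param([string]$PlanXml)
    # Strip the showplan namespace so unprefixed XPath works.
    $doc = [xml]($PlanXml -replace 'xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"')
    $doc.SelectNodes('//RelOp') | ForEach-Object { $_.PhysicalOp }
}

# Toy stand-ins for two saved plans; in practice these would come from .sqlplan files.
$wherePlan = @'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <StmtSimple StatementText="SELECT ... FROM snafu s, bar b WHERE s.b_id = b.b_id">
    <RelOp PhysicalOp="Hash Match">
      <RelOp PhysicalOp="Table Scan" />
      <RelOp PhysicalOp="Table Scan" />
    </RelOp>
  </StmtSimple>
</ShowPlanXML>
'@

$fromPlan = @'
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <StmtSimple StatementText="SELECT ... FROM snafu s INNER JOIN bar b ON s.b_id = b.b_id">
    <RelOp PhysicalOp="Hash Match">
      <RelOp PhysicalOp="Table Scan" />
      <RelOp PhysicalOp="Table Scan" />
    </RelOp>
  </StmtSimple>
</ShowPlanXML>
'@

$whereOps = (Get-PlanOperators $wherePlan) -join ' > '
$fromOps  = (Get-PlanOperators $fromPlan) -join ' > '

# True when both plans use the same operators in the same order.
$samePlanShape = $whereOps -eq $fromOps
$samePlanShape
```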

There are three things I’d like you to take with you after this brief exercise:

  • Functionally, it doesn’t matter if you declare your inner JOINs in your FROM or your WHERE clause; the optimizer will treat both as the same.  However, if you read Mike Walsh’s blog post, you really should be using the ANSI-92 standard.  The “old” outer join operators (*= and =*) only work if you have your database in SQL 2000 compatibility mode (which means they don’t work at all in SQL 2012).
  • Query plans will answer most of your performance questions regarding SQL syntax.  If you haven’t been looking at them, I strongly suggest you pick up Grant’s book and start checking those plans out.
  • I’ve only recently discovered SQL Fiddle, but this is a great tool for mocking up and testing concepts for databases.  I haven’t built anything larger than 2-3 tables, but for basic test cases and examples to demonstrate something, it’s really cool (it even lets you look at query plans!).  Check it out.