Art of the DBA Rotating Header Image

February, 2013:

Powershell Shredding

I’ve been playing around a bit recently with Powershell and XML. It’s one of those expirements where I’m not sure what the immediate benefit is, but it certainly is interesting seeing what kind of functionality we have available to us as data folks. I’m going to see what more I can coax out of it, but I wanted to share with you what I’ve learned so far.

First off, understand that I’m not that strong when it comes to XML. I get what it is, I understand the basic structure, but wrangling it isn’t something I’ve had to do a lot of. As a result, I’m still very much a newbie with XPath and XQuery. I understand nodes and properties, but then it starts to get muddy. Just a disclaimer before we get to far into this.

.NET, and by extension Powershell, has an XML data type.  This is useful because query plans are XML documents, whether we save them off or stored in the plan cache. So it’s a fairly simply matter to suck a query plan into an XML variable:

[xml]$plan=(gc SomeSQLQuery.sqlplan)

From here, we can start browsing through our plan using the dot notation to parse the plan. The query plan itself is going to be found under the ShowPlanXML node. Under that, there’s a fairly complex layout that you can really dig into by looking at the full schema documentation. Suffice to say, if we want to see the SQL text from the query, we’d need to look at:

$plan.ShowPlanXML.BatchSequence.batch.Statements.StmtSimple.StatementText

That’s a lot of drilldown! What’s worse is, if we wanted to started finding specific operators, we would quickly get lost in a recursive arrangement of RelOp nodes and actual operators. So if we want to extract something useful out of the XML, we need to leverage XPath using the .SelectNodes() method of the XML data. The only problem here is that, for some reason, the namespace that is contained within the plan confuses any calls using .SelectNodes(). To get around this, I basically ripped out the namespace so that the defaults can be used:

[xml]$plan=(gc SomeSQLQuery.sqlplan) -replace 'xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"'

At this point, I can now start using XPath to analyze my query plan. So if I wanted to pull up all my table scans:

$plan.SelectNodes('//TableScan/Object') | ft

Or, if I wanted to get all my table or index scans:

$plan.SelectNodes('//*[contains(name(),"Scan")]/Object') | ft

And so and so forth.

Now, what does this get me? At this point, I’m not sure. I started down this road after seeing Jason Strate’s(@StrateSQL) presentation on shredding the plan cache with T-SQL. My thought process was that this might be an easier way to dissect the plan cache for useful information. In a way, I was right, because it was a little easier to grasp, but it also seems like it’s the long way around the horn to get at that information. I’ll continue to poke at it and see what I can coax out of it.

T-SQL Tuesday #39 – A PoSH Post #tsql2sday

This month’s T-SQL Tuesday is graciously hosted by Wayne Sheffield(@DBAWayne) and is a topic near to my heart: Powershell.  I’ve blogged about Powershell before, occasionally sharing some code.  I’m excited to share some more and fit it into the blog party.

earlgreyI hate clicking buttons in Windows.  It’s inefficient for me, especially considering I grew up typing commands at a prompt.  Also, I’m one of those lazy DBAs, so when I get a chance to kick off a script to do work while I’m drinking some Earl Grey, I’ll jump at it.  This is not only why I’m a fan of Powershell, but I’m also a huge proponent of unattended SQL installs.

In my environment, I have a six node failover cluster which hosts all of the company’s SQL instances.  This means that whenever I have to install a new instance to the cluster, I need to perform a cluster node install 5 times.  Suffice to say, this is tedious.  So I said to myself:

“Self, isn’t there a better way to do this?”

“Quite right!  Unattended installs are easy, we just have to run the node install once, take that .ini to the other machines, and we can do 4 command line installs!”

“I don’t know, self.  It’s not like much is changing between instance installs, so running setup twice (once for the full install and once for the first cluster node) still seems like to much work.  To the PoSH mobile!”

See, a lot of my scripting experience comes from the *nix world.  Search and replace logic is pretty easy using sed and awk.  The beauty of Powershell is that, since everything is an object, I don’t even have to mess with that funky syntax.  I can just leverage the .replace() method of a regular ol’ string variable.  This lead me to the following solution:

  • Create a template .ini file for adding a node.  This template contains all my basic node options and stubs for the replaceable values.
  • Write a Powershell script that takes that template and some inputs from me, does some basic search and replace, then spits out a custom .ini for my cluster node install.
  • Use that .ini for the /CONFIGURATIONFILE parameter in an install and *bam*.

I won’t bore you with the whole template file, you can download it and look at it here.  Where the real “magic” (such as it is) is the following(downloadable):


<#
.SYNOPSIS
   Creates an .ini for adding a cluster node based off of the add_node_template.ini
.DESCRIPTION
   Takes the add_node_template.ini and performs a text replace to insert passed values
	 for Server, Instance, and service account information.
.PARAMETER <paramName>
   -server - Server name of the virtual network name
	 -instance - Named instance name
	 -sqlsvc - SQL Server Service account
	 -sqlpwd - SQL Server Service password
	 -agtsvc - SQL Server Agent account
	 -agtpwd - SQL Server Agent password
.EXAMPLE
   .\Create-NodeIni.ps1 -server "FOO" -instance "BAR01" -sqlsvc "EMINENT\sqlsvc" -sqlpwd "password" -agtsvc "EMINENT\sqlasvc" -agtpwd "password"
#>

param([parameter(Mandatory=$true)][string] $server,
      [parameter(Mandatory=$true)][string] $instance,
			[parameter(Mandatory=$true)][string] $sqlsvc,
			[parameter(Mandatory=$true)][string] $sqlpwd,
			[parameter(Mandatory=$true)][string] $agtsvc,
			[parameter(Mandatory=$true)][string] $agtpwd)

$output="$server`_$instance`_addnode.ini"

(Get-Content "add_node_template.ini") | Foreach-Object{
	$_ -replace "<SRV>",$server `
		-replace "<INST>",$instance `
		-replace "<AGTACCT>",$agtsvc `
		-replace "<AGTPWD>",$agtpwd `
		-replace "<SQLACCT>",$sqlsvc `
		-replace "<SQLPWD>",$sqlpwd } | Set-Content $OUTPUT

While there’s a lot going on here, it’s the last seven lines where the work gets done. The Get-Content comandlet reads in the template file, then gets piped to the Foreach-Object commandlet to process each line of the file. This works because when you make a text file a string object in Powershell, it becomes a multi-value array where each line is an array member. Then the script executes several different replaces (wherever it finds matches for my stubs) on each member. The result is piped to the Set-Content commandlet, writing out the new post-replace array out to a new file.

It’s a simple trick, but with this I not only save myself a ton of time by generating files for SQL installs, I also make my installations consistent. Every node install is run with the same options and selections. Human error is reduced and increased tea time is provided. Future enhancements would be to update my templates so I can generate the initial SQL install .ini, but since I do so few instance installs, it’s not really worth the effort at this point. Plus, it looks like someone is already working on that.

Enjoy the rest of T-SQL Tuesday. Plenty of great people are contributing and the great thing about Powershell is there are so many ways it can be used. I hope that our scripts will not only get you excited about learning Powershell yourself, but give you some great ideas on how you can leverage it for your own success.

WHERE to JOIN?

I’ve been really enjoying the DBA StackExchange site recently.  Not only can you see what challenges and hurdles people have, the site construction gives people a great way to contribute to an ever expanding library of database solutions.  Questions range from the very simple to the highly esoteric, but in all cases the community comes together to groom both questions and answers in such a way that a comprehensive knowledge base is built for future use.

One of these questions that recently came up was:  Which performs better, creating your joins in the FROM clause or the WHERE clause?  Most people have been using the ANSI-92 syntax, so this question may seem a little odd, but I still see a lot of SQL code out there that uses the prior syntax where the joins are declared in the WHERE portion of your query.  If you want to read more, Mike Walsh(@Mike_Walsh) has a great post on how the syntax has evolved and how its changed in SQL Server 2012.

Back to the question, though.  Does it really make any difference?  Well, I could tell you straight out, but what sort of blog post would that make?  Instead, let’s test it out ourselves.  Using a basic schema, I’ve put together two very basic queries:

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  ,bar b
  ,foo f
WHERE
  s.b_id = b.b_id
  AND b.f_id = f.f_id;

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  INNER JOIN bar b ON (s.b_id = b.b_id)
  INNER JOIN foo f ON (b.f_id = f.f_id);

As you can see, the only real difference here is that in the first query we have our joins in the WHERE clause. The second follows ANSI-92 syntax and places the joins in the FROM clause. Now how do we tell if they perform differently? Query plans, of course!

Query 1 (WHERE clause)

WHERE_JOIN

Query 2 (FROM clause)

FROM_JOIN

Notice how both queries have exactly the same plan.  This is because our friend, the Optimizer, understands the two approaches and will build the plan accordingly.  Want to play with it yourself?  You can check out the full example over at SQL Fiddle.

There are three things I’d like you to take with you after this brief exercise:

  • Functionally, it doesn’t matter if you declare your JOINs in your FROM or your WHERE clause, the optimizer will treat both as the same.  However, if you read Mike Walsh’s blog post, you really should be using the ANSI-92 standard.  The “old” syntax only works if you have your database in SQL 2000 compatibility mode (which means it doesn’t work at all in SQL 2012).
  • Query plans will answer most of your performance questions regarding SQL syntax.  If you haven’t been looking at them, I strongly suggest you pick up Grant’s book and start checking those plans out.
  • I’ve only recently discovered SQL Fiddle, but this is a great tool for mocking up and testing concepts for databases.  I haven’t built anything larger than 2-3 tables, but for basic test cases and examples to demonstrate something, it’s really cool(it even lets you look at query plans!).  Check it out.