Art of the DBA

Server Inventories

Ok, I’ve been promising this to people for a while, so it’s time to get at it.  Brace yourself for a long one.

As enterprise DBAs, we usually have many servers to manage.  Whether it’s around ten, a hundred, or a thousand (or more!), tracking them manually is a bear.  Still, many shops maintain convoluted spreadsheets and other manual documentation to track their inventory.  Manual sucks.  As John Sansom (@SQLBrit) says, the best DBAs automate everything, including their server inventories.

There are a lot of approaches and third-party tools you can use for your own environment.  I’m totally cool with spending money on a third-party tool because of the depth it can provide in my tracking and monitoring.  A lot of times, though, we might not have access to those tools or we may need more information than they provide.  It’s at this point we sit down at our keyboard, crack our knuckles, and start banging out some code.  Of course, the code I started banging out was Powershell (shocking, I know!).

 I didn’t start from scratch, though.  When putting my own script together (which we’ll get to later), I found two great resources that got me about 80% of the way:

 Allen White(@SQLRunr) – Let PowerShell do an Inventory of your Servers
Colleen Morrow(@ClevelandDBA) – Building A SQL Server Inventory

These are both great resources that will get you started, but I needed a little more.  The biggest challenge I had was the need to poll cluster information and to collect SQL instance information separately from my machine information.  I also wanted to do some consolidation of my data sets.  So I borrowed heavily from both Allen and Colleen, then molded the result to my own purposes.

Process

Before we dig into the code, I first want to touch on the collection process and how I handle it.  It’s not complex, but code doesn’t make sense until you understand the “why” of the different components.

What I’m collecting

  • SQL Instance information – Whether it’s an instance living on a cluster node, a physical standalone, or a VM host, I want the SQL instance without really caring about the machine it lives on.  At least not meshed in directly with the machine, though I DO collect the physical host name so I can identify which machine information I need.

  • Physical Machine information – Since I’m not tying the machine info directly with the SQL Instance collection, I want the physical information for the servers my instances live on.  This is where I get everything that isn’t tied directly to the SQL Instance.

How I’m collecting it

  • I’ve got 4 tables, 2 live tables and 2 stage tables, that I use for the collection.  Nothing out of the ordinary, but there’s an instance table with a stage table and a machine table with a stage table.

  • I use a Powershell script to perform the collection itself.  It uses the instance table as its primary driver: I provide the initial population of instance names to track, and the inventory process fully populates the rest of the information.

  • When the Powershell script completes, it executes a stored procedure to load the data from the stage tables into the live tables.  The load is simple: the instance table is updated with data from the stage table and the machine information is deleted and replaced (a sketch of this kind of load follows this list).

  • The Powershell script is run by a SQL Agent job running under a specific monitoring Active Directory account created as a credential in SQL Server.  The gotcha here is that the monitoring account needs domain access to the clusters and machines it’s going to be querying.
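
The actual table definitions and load procedure are in the SQL objects download below; purely to illustrate that stage-to-live pattern, here’s a minimal sketch with hypothetical object names (the column names simply mirror the properties collected by the functions later in this post):

--Hypothetical sketch of the stage-to-live load pattern (not the downloadable objects)
create procedure dbo.Load_InventoryStage
as
begin
	set nocount on;

	--Instance data: update the live table in place from the stage table
	update i
	set i.SQLVersion = s.SQLVersion
		,i.SQLEdition = s.SQLEdition
		,i.Port = s.Port
		,i.PhysicalHost = s.PhysicalHost
	from dbo.Instances i
		join dbo.Instances_Stage s on (i.InstanceId = s.InstanceId);

	--Machine data: delete and replace wholesale
	delete from dbo.Machines;

	insert into dbo.Machines (MachineName, Model, Manufacturer, LogicalCPUs, MaxMemory, Cluster)
	select MachineName, Model, Manufacturer, LogicalCPUs, MaxMemory, Cluster
	from dbo.Machines_Stage;
end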

The Code

I won’t list the full scripts here, but you can download and review them here:

SQL Objects
Powershell script

 Let’s first talk about the dependencies:

  • The SMO, so you’re going to need either your script or your profile to load the 2008 snap-ins or import the 2012 sqlps module (a quick sketch of both approaches follows this list).

  • Chad Miller’s Out-DataTable and Write-DataTable (I combined them into one file, DataTables.ps1).

  • The FailoverClusters module, which is part of the Remote Server Administration Tools if you’re not running the code on a Windows Server OS, or part of the Failover Clustering features if you are.
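
If it helps, here’s a quick sketch of loading that support both ways (assuming default installations; use whichever branch matches the version on the machine running the collection):

# Load SQL Server Powershell support
if (Get-Module -ListAvailable -Name SQLPS) {
	# SQL 2012: import the sqlps module
	Import-Module SQLPS -DisableNameChecking
}
else {
	# SQL 2008/2008 R2: add the snap-ins
	Add-PSSnapin SqlServerCmdletSnapin100
	Add-PSSnapin SqlServerProviderSnapin100
}

# FailoverClusters module for polling cluster information
Import-Module FailoverClusters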

There are two key functions I use, Get-Instance to gather the SQL instance information and Get-Machine to gather the machine information:

#Collects and returns SQL Instance information
function Get-Instance([string]$instcoll,[int]$id,[string]$name)
{
try
{
	$smo = new-object ('Microsoft.SqlServer.Management.Smo.Server') $name
	$sname = $smo.NetName
	$iname = $smo.InstanceName
	if($iname.Length -eq 0 -or $iname -eq $null) { $iname = "MSSQLSERVER" }

	$managedcomp = new-object ('Microsoft.SqlServer.Management.Smo.WMI.ManagedComputer') $sname
	$output = New-Object System.Object

	$port = $managedcomp.ServerInstances[$iname].ServerProtocols["Tcp"].IPAddresses["IPAll"].IPAddressProperties["TcpPort"].Value
	$ip = (Test-Connection $sname -count 1).IPV4Address.ToString()

	$output | Add-Member -type NoteProperty -name InstanceId -value $id
	$output | Add-Member -type NoteProperty -name SQLVersion -value $smo.VersionString
	$output | Add-Member -type NoteProperty -name SQLVersionDesc -value $smo.ProductLevel
	$output | Add-Member -type NoteProperty -name SQLEdition -value $smo.Edition
	$output | Add-Member -type NoteProperty -name MemoryMinMB -value $smo.Configuration.MinServerMemory.RunValue
	$output | Add-Member -type NoteProperty -name MemoryMaxMB -value $smo.Configuration.MaxServerMemory.RunValue
	$output | Add-Member -type NoteProperty -name MAXDOPVal -value $smo.Configuration.MaxDegreeOfParallelism.RunValue
	$output | Add-Member -type NoteProperty -name IP -value $ip
	$output | Add-Member -type NoteProperty -name Port -value $port
	$output | Add-Member -type NoteProperty -name PhysicalHost -value $smo.ComputerNamePhysicalNetBIOS
	return $output
}
catch
{
	write-host "Error collecting $name"
	return $null
}
}#Get-Instance

#Get host machine information via WMI
function Get-Machine([string]$name,[string]$clst)
{
 try
{
	$comp = gwmi Win32_ComputerSystem -Computer $name | select Model,Manufacturer,TotalPhysicalMemory
	$proc = gwmi Win32_Processor -Computer $name | select NumberOfLogicalProcessors,MaxClockSpeed
	$os = gwmi Win32_OperatingSystem -Computer $name | select OSArchitecture,Name,Version,ServicePackMajorVersion,ServicePackMinorVersion

	$output = New-Object System.Object

	$output | Add-Member -type NoteProperty -name MachineName -value $name
	$output | Add-Member -type NoteProperty -name Model -value $comp.Model
	$output | Add-Member -type NoteProperty -name Manufacturer -value $comp.Manufacturer
	$output | Add-Member -type NoteProperty -name Architechture -value $os.OSArchitecture
	$output | Add-Member -type NoteProperty -name PhysicalCPUs -value $(if(!$proc.Length){"1"}else{$proc.Length})
	$output | Add-Member -type NoteProperty -name LogicalCPUs -value ($proc | Measure-Object NumberOfLogicalProcessors -sum).Sum
	$output | Add-Member -type NoteProperty -name CPUSpeed -value ($proc | Measure-Object MaxClockSpeed -max).Maximum
	$output | Add-Member -type NoteProperty -name MaxMemory -value ($comp.TotalPhysicalMemory/1MB)
	$output | Add-Member -type NoteProperty -name OSName -value $os.name.split("|")[0]
	$output | Add-Member -type NoteProperty -name OsVersion -value $os.Version
	$SPMaj = $os.ServicePackMajorVersion
	$SPMin = $os.ServicePackMinorVersion
	$output | Add-Member -type NoteProperty -name SPVersion -value "$SPMaj.$SPMin"
	$output | Add-Member -type NoteProperty -name Cluster -value $clst

	return $output
}
catch
{
	write-host "Error collecting $name"
	return $null
}
}#Get-Machine

Both functions leverage SMO and WMI to get the relevant info.  I’ll let you sort through the individual elements I’m pulling; it’s fairly standard configuration info.  Because I think in terms of tables, data gathering becomes a collection of objects that are essentially rows: I append each new object, with all the appropriate properties, to a larger collection, then use Write-DataTable to push that data directly into the related stage table.
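
For context, here’s a condensed sketch of what that driver portion looks like conceptually.  The server, database, and table names are placeholders, and the full script linked above does more (cluster polling, machine collection, calling the load procedure), but this is the basic shape:

# Dot-source Chad Miller's Out-DataTable/Write-DataTable
. .\DataTables.ps1

# The instance table drives the collection (server/database/table names are placeholders)
$instances = Invoke-Sqlcmd -ServerInstance "INVENTORYSRV" -Database "Inventory" `
	-Query "select InstanceId, InstanceName from dbo.Instances"

$instcoll = @()
foreach ($row in $instances) {
	$inst = Get-Instance -id $row.InstanceId -name $row.InstanceName
	if ($inst -ne $null) { $instcoll += $inst }
}

# Convert the collection of row-like objects to a DataTable and bulk load the stage table
$dt = $instcoll | Out-DataTable
Write-DataTable -ServerInstance "INVENTORYSRV" -Database "Inventory" -TableName "Instances_Stage" -Data $dt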

Look at the entire script for my full process.  I’ve been pretty happy with it, and it’s been chugging away in my environments for about 8-9 months now without a whole lot of interaction from me.  If we stand up a new server, I just pop that name in the instance table, run the job, and everything gets updated.  What made me most proud was that we had a recent meeting with a new VP and pulled up the SSRS report built off this information to give him an idea of what our environments look like.  His comment was “This is the most documentation I’ve seen since I got here.”  The beauty is that it was provided without a whole lot of manual maintenance on my part.

This is very much an evolving work on my part, but hopefully gives you some insight into how I manage my environments.  I’ve still got lots of ideas on improvements, including automatic population (or other use) of Central Management Server.  If you have any ideas on how this can be improved, I’d love to hear them.

Slipstreamin’ Away

It’s nothing new: I love anything that makes less work for me.  One thing that has been a great contributor to this has been slipstream installs.  While that’s a bit of a fancy name, basically it’s an install of SQL Server where the service packs and cumulative updates are rolled up into one install package.  This saves the tedious chore of constant install after install to get a new SQL Server installation up to a patch level consistent with the rest of your enterprise.

Now, the problem with slipstream installs is that they’re a pain in the butt to create, at least prior to SQL 2012.  Not difficult, but tedious.  You have to copy some files, merge some directories, and edit some config files.  It’s nothing that can’t be done, but it’s very repetitive.  To boot, if something isn’t copied or configured right, the whole package won’t work and you won’t know until you attempt the install (and it fails).

Enter Powershell.

Building a slipstream is no different than any other task that we should automate.  After all, we’re doing the same actions every time, the only thing that might change are the files we use to build our slipstream.  What I’ve done is put together a script that does all this for me.  As with any script, there are a few rules to how you use it:

  • The script was designed for building SQL 2008 and 2008 R2 slipstreams.  I tried it with 2005, but that process is different enough that this script won’t work.
  • RTMSource and output are the two mandatory parameters.  The SPSource and CUSource parameters will be skipped if left empty.  Basically, if you are making just a SQL 2008 R2 SP2 slipstream, you would only need to supply the service pack files.
  • The script expects the base install (the RTM media) to be un-extracted and the service packs or cumulative updates in their .exe form.  The script will extract the additional files from the .exe into the slipstream package.
  • I built the script to work along my organizational thought process, which means I put each set of files in their own subdirectory.  This shouldn’t be a big deal, but be aware that if you point the script at a location, it won’t just pull out the files it needs, it will try and use every .exe file in the subdirectory.

Other than that, here’s the script:

<#
.SYNOPSIS
   Builds a SQL Server slipstream install.
.DESCRIPTION
   This script will take the various components of a SQL Server install (RTM binaries, SPs, CUs) and assemble
	 them to create a slipstream install.  To use this script, you will need the unextracted RTM binaries (.iso
	 or DVD media) and the executables for the SP and CUs you want to use.
	 Mike Fal (http://www.mikefal.net) 2-28-2013
.PARAMETER <paramName>
   RTMSource - File path for the RTM source files
	 SPSource - File path for the Service Pack executables (if no value is passed, no SP will be added)
	 CUSource - File path for the Cumulative update executables (if no value is passed, no CU will be added)
	 output - filepath where slipstream will be written, directory will be created if it doesn't exist.
.EXAMPLE
   .\Build-Slipstream.ps1 -RTMSource "C:\Users\mfal\Downloads\en_sql_server_2008_r2_developer_x86_x64_ia64_dvd_522665"
	 												-SPSource "C:\Users\mfal\Downloads\2008R2SP1"
													-CUSource "C:\Users\mfal\Downloads\2008R2SP1CU2"
													-output "C:\SQL2008SP1CU2_Slipstream"
#>

param([parameter(Mandatory=$true)][string] $RTMSource,
			[parameter(Mandatory=$false)][string] $SPSource,
      [parameter(Mandatory=$false)][string] $CUSource,
      [parameter(Mandatory=$true)][string] $output)

#function to extract and copy SPs and CUs
function Modify-Source{
	param($TYPE="PCU",$WORKDIR,$PATCHDIR)
	$PATCHFILES=get-childitem "$PATCHDIR\*" -include *.exe
	if (!(test-path("$WORKDIR\$TYPE"))) {mkdir $WORKDIR\$TYPE }
	if (!(test-path("$WORKDIR\$TYPE"))) {Throw "Error: Patch file destination not available"}

	set-location $PATCHDIR

	foreach($FILE in $PATCHFILES)
		{

			iex "./$($FILE.basename)$($FILE.extension) /x:""$WORKDIR\$TYPE"" /q"
			"Waiting for $FILE to extract..."
			while (@(Get-Process $FILE.Name.Replace(".exe","") -ErrorAction SilentlyContinue).Count -ne 0)
			{
			Start-Sleep 1
			}
		}
  "Copying $TYPE files..."
	robocopy "$WORKDIR\$TYPE" "$WORKDIR" setup.exe /ndl /nfl /njh /njs
	robocopy "$WORKDIR\$TYPE" "$WORKDIR" setup.rll /ndl /nfl /njh /njs

if (test-path("$WORKDIR\$TYPE\x86")) {robocopy "$WORKDIR\$TYPE\x86" "$WORKDIR\x86" /XF Microsoft.SQL.Chainer.PackageData.dll /ndl /nfl}
if (test-path("$WORKDIR\$TYPE\x64")) {robocopy "$WORKDIR\$TYPE\x64" "$WORKDIR\x64" /XF Microsoft.SQL.Chainer.PackageData.dll /ndl /nfl}
if (test-path("$WORKDIR\$TYPE\ia64")) {robocopy "$WORKDIR\$TYPE\ia64" "$WORKDIR\ia64" /XF Microsoft.SQL.Chainer.PackageData.dll /ndl /nfl}

set-location $WORKDIR
}

#Test path locations for validity

if (!(test-path($RTMSource + "\setup.exe"))) {Throw "SQL RTM Source does not exist!"}
#Unpassed [string] parameters arrive as empty strings, so check for that before testing the path
if (!([string]::IsNullOrEmpty($SPSource)) -and !(test-path($SPSource))) {Throw "Invalid Service Pack file location!"}
if (!([string]::IsNullOrEmpty($CUSource)) -and !(test-path($CUSource))) {Throw "Invalid Cumulative Update file location!"}
if (!(test-path($output))) {mkdir $output}
if (!(test-path($output))) {Throw "Unable to create build directory!"}

#begin merge process, RTM
"Copying RTM..."
robocopy "$RTMSource" "$output" /s /ndl /nfl
$startloc=pwd
set-location $output

Copy-Item ".\x86\DefaultSetup.ini" ".\"

#merge Service Pack
if(![string]::IsNullOrEmpty($SPSource))
{
	Modify-Source "PCU" $output $SPSource
	"`n`nPCUSOURCE=`".\PCU`" `n" |Out-File "DefaultSetup.ini" -Append
}

#merge Cumulative Update
if(![string]::IsNullOrEmpty($CUSource))
{
	Modify-Source "CU" $output $CUSource
	"`n`nCUSOURCE=`".\CU`" `n" |Out-File "DefaultSetup.ini" -Append
}

#cleanup
copy-item "defaultsetup.ini" .\ia64 -force
copy-item "defaultsetup.ini" .\x64 -force
copy-item "defaultsetup.ini" .\x86 -force

Set-Location $startloc
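
Once the build finishes, the output directory is just a normal install source: run setup.exe from there and the service pack and cumulative update get applied as part of the install, thanks to the PCUSOURCE/CUSOURCE lines the script appends to DefaultSetup.ini.  As a hedged example, kicking off an install from the package built above looks like any other install (you would still supply your usual feature, instance, and account parameters):

Set-Location "C:\SQL2008SP1CU2_Slipstream"
.\setup.exe /ACTION=Install /IACCEPTSQLSERVERLICENSETERMS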

T-SQL Tuesday #39 – A PoSH Post #tsql2sday

This month’s T-SQL Tuesday is graciously hosted by Wayne Sheffield(@DBAWayne) and is a topic near to my heart: Powershell.  I’ve blogged about Powershell before, occasionally sharing some code.  I’m excited to share some more and fit it into the blog party.

I hate clicking buttons in Windows.  It’s inefficient for me, especially considering I grew up typing commands at a prompt.  Also, I’m one of those lazy DBAs, so when I get a chance to kick off a script to do the work while I’m drinking some Earl Grey, I’ll jump at it.  This is why I’m not only a fan of Powershell, but also a huge proponent of unattended SQL installs.

In my environment, I have a six node failover cluster which hosts all of the company’s SQL instances.  This means that whenever I have to install a new instance on the cluster, I need to perform a cluster node install 5 times.  Suffice it to say, this is tedious.  So I said to myself:

“Self, isn’t there a better way to do this?”

“Quite right!  Unattended installs are easy, we just have to run the node install once, take that .ini to the other machines, and we can do 4 command line installs!”

“I don’t know, self.  It’s not like much is changing between instance installs, so running setup twice (once for the full install and once for the first cluster node) still seems like too much work.  To the PoSH mobile!”

See, a lot of my scripting experience comes from the *nix world.  Search and replace logic is pretty easy using sed and awk.  The beauty of Powershell is that, since everything is an object, I don’t even have to mess with that funky syntax.  I can just leverage the .replace() method of a regular ol’ string variable.  This led me to the following solution:

  • Create a template .ini file for adding a node.  This template contains all my basic node options and stubs for the replaceable values.
  • Write a Powershell script that takes that template and some inputs from me, does some basic search and replace, then spits out a custom .ini for my cluster node install.
  • Use that .ini for the /CONFIGURATIONFILE parameter in an install and *bam*.

I won’t bore you with the whole template file; you can download it and look at it here.  The real “magic” (such as it is) is in the following script (also downloadable):


<#
.SYNOPSIS
   Creates an .ini for adding a cluster node based off of the add_node_template.ini
.DESCRIPTION
   Takes the add_node_template.ini and performs a text replace to insert passed values
	 for Server, Instance, and service account information.
.PARAMETER <paramName>
   -server - Server name of the virtual network name
	 -instance - Named instance name
	 -sqlsvc - SQL Server Service account
	 -sqlpwd - SQL Server Service password
	 -agtsvc - SQL Server Agent account
	 -agtpwd - SQL Server Agent password
.EXAMPLE
   .\Create-NodeIni.ps1 -server "FOO" -instance "BAR01" -sqlsvc "EMINENT\sqlsvc" -sqlpwd "password" -agtsvc "EMINENT\sqlasvc" -agtpwd "password"
#>

param([parameter(Mandatory=$true)][string] $server,
      [parameter(Mandatory=$true)][string] $instance,
			[parameter(Mandatory=$true)][string] $sqlsvc,
			[parameter(Mandatory=$true)][string] $sqlpwd,
			[parameter(Mandatory=$true)][string] $agtsvc,
			[parameter(Mandatory=$true)][string] $agtpwd)

$output="$server`_$instance`_addnode.ini"

(Get-Content "add_node_template.ini") | Foreach-Object{
	$_ -replace "<SRV>",$server `
		-replace "<INST>",$instance `
		-replace "<AGTACCT>",$agtsvc `
		-replace "<AGTPWD>",$agtpwd `
		-replace "<SQLACCT>",$sqlsvc `
		-replace "<SQLPWD>",$sqlpwd } | Set-Content $OUTPUT

While there’s a lot going on here, it’s the last seven lines where the work gets done. The Get-Content cmdlet reads in the template file and pipes it to the Foreach-Object cmdlet to process each line. This works because Get-Content returns the text file as an array of strings, one per line. The script then executes several replaces (wherever it finds matches for my stubs) on each line. The result is piped to the Set-Content cmdlet, which writes the post-replace array out to a new file.
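
From there, adding the node is a single call to setup against the generated file. Here’s a hedged example: the file name follows the script’s naming convention above, the service passwords are already baked into the .ini by the template replace, and depending on what’s in your own template some of these switches may be redundant.

.\setup.exe /CONFIGURATIONFILE="FOO_BAR01_addnode.ini" /IACCEPTSQLSERVERLICENSETERMS /Q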

It’s a simple trick, but with this I not only save myself a ton of time by generating files for SQL installs, I also make my installations consistent. Every node install is run with the same options and selections. Human error is reduced and increased tea time is provided. Future enhancements would be to update my templates so I can generate the initial SQL install .ini, but since I do so few instance installs, it’s not really worth the effort at this point. Plus, it looks like someone is already working on that.

Enjoy the rest of T-SQL Tuesday. Plenty of great people are contributing and the great thing about Powershell is there are so many ways it can be used. I hope that our scripts will not only get you excited about learning Powershell yourself, but give you some great ideas on how you can leverage it for your own success.

WHERE to JOIN?

I’ve been really enjoying the DBA StackExchange site recently.  Not only can you see what challenges and hurdles people have, the site construction gives people a great way to contribute to an ever expanding library of database solutions.  Questions range from the very simple to the highly esoteric, but in all cases the community comes together to groom both questions and answers in such a way that a comprehensive knowledge base is built for future use.

One question that recently came up was: which performs better, declaring your joins in the FROM clause or in the WHERE clause?  Most people have been using the ANSI-92 syntax, so this question may seem a little odd, but I still see a lot of SQL code out there that uses the older syntax, where the joins are declared in the WHERE portion of the query.  If you want to read more, Mike Walsh (@Mike_Walsh) has a great post on how the syntax has evolved and how it’s changed in SQL Server 2012.

Back to the question, though.  Does it really make any difference?  Well, I could tell you straight out, but what sort of blog post would that make?  Instead, let’s test it out ourselves.  Using a basic schema, I’ve put together two very basic queries:

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  ,bar b
  ,foo f
WHERE
  s.b_id = b.b_id
  AND b.f_id = f.f_id;

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  INNER JOIN bar b ON (s.b_id = b.b_id)
  INNER JOIN foo f ON (b.f_id = f.f_id);

As you can see, the only real difference here is that in the first query we have our joins in the WHERE clause. The second follows ANSI-92 syntax and places the joins in the FROM clause. Now how do we tell if they perform differently? Query plans, of course!

Query 1 (WHERE clause): execution plan screenshot

Query 2 (FROM clause): execution plan screenshot

Notice how both queries have exactly the same plan.  This is because our friend, the Optimizer, understands the two approaches and will build the plan accordingly.  Want to play with it yourself?  You can check out the full example over at SQL Fiddle.
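
If you’d rather build it locally than on SQL Fiddle, here’s a minimal sketch of the kind of schema the queries assume.  The table and column names come from the queries above; the data types and sample rows are placeholders rather than the original Fiddle script:

create table foo
(f_id int primary key
,f_desc varchar(100));

create table bar
(b_id int primary key
,f_id int references foo(f_id)
,b_desc varchar(100));

create table snafu
(s_id int primary key
,b_id int references bar(b_id)
,s_desc varchar(100));

insert into foo values (1,'foo one');
insert into bar values (1,1,'bar one');
insert into snafu values (1,1,'snafu one');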

There are three things I’d like you to take with you after this brief exercise:

  • Functionally, it doesn’t matter whether you declare your JOINs in your FROM or your WHERE clause; the optimizer will treat both the same.  However, if you read Mike Walsh’s blog post, you really should be using the ANSI-92 standard.  The “old” outer join syntax (*= and =*) only works if you have your database in SQL 2000 compatibility mode (which means it doesn’t work at all in SQL 2012).
  • Query plans will answer most of your performance questions regarding SQL syntax.  If you haven’t been looking at them, I strongly suggest you pick up Grant’s book and start checking those plans out.
  • I’ve only recently discovered SQL Fiddle, but it’s a great tool for mocking up and testing concepts for databases.  I haven’t built anything larger than 2-3 tables, but for basic test cases and examples to demonstrate something, it’s really cool (it even lets you look at query plans!).  Check it out.

Finding Orphan Logins

Cleaning up old security info is always something that’s hard to stay on top of.  Employees who leave or change responsibilities can make it difficult to track which logins need to remain in our instances.  The best way to manage this is by using Active Directory groups and managing access at the group level instead of by explicit login.  However, because of the communication and organization required to set up these groups in the first place, it’s not uncommon to find ourselves managing explicit logins in our servers.

The unfortunate side effect is this typically results in orphaned logins within our SQL Servers.  A question on how to manage this came up recently on dba.stackexchange.com and, while I couldn’t find a way to answer the specific question (because it was focused around finding orphaned logins in SQL 2000), I was able to come up with a handy utility script for 2005+ that makes use of xp_logininfo.  It’s not a complex script and provides a list of the actual drop login statements you can execute as part of your cleanup.

declare @user sysname
declare @domain varchar(100)

--store total output
declare @return table
(login_name sysname
,sql_statement varchar(max)
,error_msg varchar(100))

--Store output of xp_logininfo
declare @junk table
(a_name sysname NULL
,a_type char(8) NULL
,priv char(9) NULL
,m_login sysname NULL
,p_path sysname NULL)

--Declare domain you are checking
set @domain = 'FOO'

declare recscan cursor for
select name from sys.server_principals
where type = 'U' and name like @domain+'%'

open recscan
fetch next from recscan into @user

while @@fetch_status = 0
begin
	begin try
		insert into @junk
		exec xp_logininfo @user
	end try
	begin catch
		--Error on sproc because login does not exist
		insert into @return
		values(@user,'drop login '+quotename(@user),ERROR_MESSAGE())
	end catch

	fetch next from recscan into @user
end

close recscan
deallocate recscan

select * from @return

A couple notes on this script:

  • It currently doesn’t handle removing the logins from individual databases.  I have another script I use to do that and will be working on combining the two (a rough sketch of that side of the cleanup follows this list).
  • I don’t automatically execute the drop login statements because I’m paranoid.  :)  By providing a list, I can review the logins, along with the error message output, and verify that I want to remove them.
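
For that database-level side of the cleanup, here’s a heavily simplified sketch of the idea (this is not my other script, just an illustration): in each database, look for users whose names match the domain login you’re dropping (Windows users are usually created with the same name as the login) and generate the matching drop user statements for review.

--Hypothetical sketch: generate drop user statements for a given login in the current database
declare @login sysname
set @login = 'FOO\SomeUser'

select
  db_name() [database_name]
  ,dp.name [user_name]
  ,'use '+quotename(db_name())+'; drop user '+quotename(dp.name) [sql_statement]
from sys.database_principals dp
where dp.type in ('U','G')
  and dp.name = @login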

Hopefully this little utility script can help you with login management in your environments.

Some Backup Info

So I’ve been pretty bad about blogging lately.  I don’t put much stock in excuses, so I have none to offer.  Things just got away from me (something I’ll talk about shortly when I review my 2012 goals).  Anyway, let’s talk about something a little more useful to you, the reader.

A couple weeks ago, I gave a short presentation on MSDB’s BackupSet.  It was a fun little presentation and gave me a chance to share with the Boulder SQL user group one of my favorite tables.  Why one of my favorites?  Simply because there’s so much useful information packed in there, particularly when it comes to the critical job of monitoring your backups.  If you haven’t looked at it, I highly recommend that you check it out.

While developing the presentation, I put together a handy query that I wanted to share.  It’s very useful for not only showing you when your latest backups were, but also the size and location of those backups:

select
  bs.database_name
  ,bs.backup_finish_date
  ,bs.backup_size/1024.0/1024.0 [backup_size_mb]
  ,bs.compressed_backup_size/1024.0/1024.0 [compressed_size_mb]
  ,datediff(ss,backup_start_date,backup_finish_date) [backup_time]
  ,((bs.backup_size/1024.0/1024.0)/(datediff(ss,backup_start_date,backup_finish_date)+1)) [mb_per_second]
  ,1.0-(bs.compressed_backup_size*1.0/bs.backup_size) [compression_ratio]
  ,bm.physical_device_name
from
  msdb.dbo.backupset bs
  join msdb.dbo.backupmediafamily bm on (bs.media_set_id = bm.media_set_id)
  join (select database_name
          ,max(backup_finish_date) last_backup
        from msdb.dbo.backupset
        where type = 'D'
        group by database_name) lb on (bs.database_name = lb.database_name and bs.backup_finish_date = lb.last_backup)
where type = 'D'
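
Since the real value of backupset is monitoring, one hedged variation on the same idea is to flag databases whose most recent full backup is missing or too old, something like:

--Flag databases with no full backup in the last 24 hours (variation on the query above)
select
  d.name [database_name]
  ,lb.last_backup
from
  sys.databases d
  left join (select database_name
              ,max(backup_finish_date) last_backup
             from msdb.dbo.backupset
             where type = 'D'
             group by database_name) lb on (d.name = lb.database_name)
where
  d.name <> 'tempdb'
  and (lb.last_backup is null or lb.last_backup < dateadd(hh,-24,getdate()))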

T-SQL Tuesday #36 (#tsql2sday) – Coolness factor

“Come in, sit down, I want to show you something cool.”

This (paraphrased) is what Kevin Kline(@kekline) shared as part of the Summit 2012 Women In Technology panel.  While the session was intended to talk about how we can bring more women in to technology careers, it really struck me as a fantastic descriptor of the SQL Server community in general.  Many fields and groups invite people in, but there’s some “thing” about the SQL Server community that makes it the most open and supportive technology community I’ve ever had the pleasure to participate in, and I think it’s this desire to share that sits at the core of our success.

The Summit is built on this sharing mindset.  From the speakers to the board members to the participants, the people who put this on are volunteers.  Hundreds of professionals hope to be selected to speak at this event, even if just for a lightning talk, because they have something cool to share.  People hang out in the halls just to talk about what can help the next guy out.  What’s especially noticeable is how the community reaches out to first timers.  I know some folks were a little averse to wearing those ribbons, but if you had a first timer ribbon, alumni of the convention would reach out and make sure you were welcome.

It seems that typically, people hoard knowledge as a means of job security and a way to emphasize their superiority over others.  With SQL Server, it’s more that we want others to have the fun we’re having.  We work in tech because it’s fun and cool and share our knowledge so others can see the cool stuff we do.  Because when all is said and done, if we’re all having fun, we’re all going further.

So come in, sit down, and let me show you something cool.

(This month’s T-SQL Tuesday is brought to you by Chris Yates (@YatesSQL).)

A Heap of Trouble

A couple weeks ago I came across an interesting flaw in SQL Server partitioning that was causing one of my tables to balloon in size.  Essentially, we found that this large table in one of our databases had a lot of empty pages in it, pages that would usually get cleaned up by the SQL Server engine.  Unfortunately these pages were not getting removed, and this 2 billion row table was about triple the size it should have been.

I won’t focus on the table too much, but there were two elements that created this problem: the table was a heap and the partition key was volatile (it got updated frequently).  I’m not yet sure if it’s a case of forward pointers or some other mechanic within SQL Server, but the result is that when we move data in a heap across partitions by changing the partition key, the pages the data occupied are left behind, empty, in the old partition.

To test it, I built a basic heap table partitioned on an integer, then stuffed it with 1000 rows.  I then queried some of the DMVs to see what the layout of the data was:
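
A minimal sketch of the kind of test involved follows; the partition boundaries, object names, and row sizes here are representative placeholders rather than the exact scripts.

--Representative sketch: a partitioned heap with a volatile partition key
create partition function pf_heaptest (int) as range right for values (1,2,3);
create partition scheme ps_heaptest as partition pf_heaptest all to ([PRIMARY]);

create table heaptest
(part_key int not null
,filler char(1000) not null default replicate('x',1000)) on ps_heaptest(part_key);

--1000 rows, all landing in partition 2 (part_key = 1)
insert into heaptest (part_key)
select top 1000 1 from sys.objects a, sys.objects b;

--Check row counts and page usage per partition
select partition_number, record_count, page_count, avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(db_id(), object_id('heaptest'), 0, null, 'DETAILED');

--Move everything to partition 3 by changing the partition key, then rerun the DMV query
update heaptest set part_key = 2;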

Partition 2 has 1000 rows in it, occupying 150 pages, exactly as we would expect.  Average page use is ~83.74%, so the pages are fairly full.  All in all, nothing surprising.  What happens, though, if we change the partition key value on those 1000 rows and move them to partition 3?

The 150 pages that the data used in partition 2 are still there!  And we’re now also using 143 pages in partition 3!  The result is that even though we’re not duplicating data, we’re duplicating space.  Normally, SQL Server will detect these empty pages and remove them, but in this case they are not identified as empty.

What happens if, instead of updating our data, we simply delete the rows?

Even though we removed all our data from partition 2, those pages are still out there occupying space (note the avg_page_space_used_in_percent field).  Obviously, this can quickly spiral out of control and become a major issue.

I did go through all these tests with a clustered index on the table and the results were much more positive.  When I moved data from one partition to the next or deleted rows, the pages the data occupied were removed as expected, keeping the total size of the table down.

There are a couple lessons learned from this:

  1. Heaps are trouble!  Based on the evidence we have here, it’s fairly evident that using a heap table causes all sorts of problems with partitioning.  There are reasons the best practice of using clustered indexes in SQL Server exists, this being one of them.
  2. Do not use a volatile partition key!  Keep in mind that when we partition, we’re managing the physical structure under our table.  In general, we want to avoid moving data around on our disk.  If we choose a partition key that changes a lot, we’re creating a situation where our data moves around unnecessarily, which can cause us other problems even with a clustered index in place.
  3. Design is vital!  The problems we had were created many years ago.  At the time, things looked fine and there weren’t any noticeable issues.  It wasn’t until much later, when the table had grown to several billion rows, that we had problems.  You need to take care in designing your structures, because fixing very large tables takes weeks (if not months) of coordinated work and can require significant downtime to implement.

So there it is.  I’m not sure if this is intended behavior or just an oversight.  Most of the folks I talked to about this had not encountered it before, which is not surprising; the original table design violates several best practices.  However, I do want to make people aware of this, and I plan on filing a Connect item once I’ve had a chance to bounce it off a few more SQL brains.

For those interested, the scripts to reproduce my testing are available for download.

T-SQL Tuesday #33 Wrap-up (#tsql2sday)

Last week I had the opportunity to host this month’s SQL blog party, T-SQL Tuesday.  I hope folks enjoyed the topic (it seemed like they did), because experimenting with something is very much a core part of the learning process.  SQL Server is deep and has no shortage of nooks and crannies.  To become proficient at it (let alone an “expert”), we have to keep pushing our boundaries and asking “What happens when….?”

I want to again thank everyone who participated.  Here’s a quick round up of the posts:

Rob Farley(b|t) peels back the cover on recursive Common Table Expressions (CTEs).  If you haven’t worked with them yet, CTEs are an incredibly useful tool in the SQL arsenal.  Using them recursively is just one example of how powerful they can be to a developer.

Jeffrey Verheul (b|t) chimes in with a big picture discussion on using cursors versus set based operations.  While we database folks prefer set based operations, sometimes it’s to our advantage to use a cursor.

Jim McLeod (b|t) jumps in with a great little way to use SQL Profiler to gain some insight into the workings of SQL Server by tracing Management Studio.  Not a lot of people are aware that almost everything that happens in Management Studio is a query that can be scripted out, and profiler lets us see that.

Oliver Asmus (b|t) shares some of his knowledge gained from putting together preventative maintenance tasks (a vital yet often overlooked part of being a DBA).

Aaron Bertrand (b|t) provides a cool trick to swap tables around using schemas.  A lot of SQL DBAs overlook the power of schemas in databases, but Aaron’s trick really gives us an efficient way to handle offline updates and other resource-intensive tasks.

Nigel Sammy (b|t) pokes at the Transaction log with DBCC SQLPERF.  It’s important for DBAs to really understand their transaction log usage, and Nigel’s trick can give a DBA some real insight.

Tracy McKibben (b|t) gives us a trick using one of my favorite system views, sys.partitions.  There are times I lament not being able to see how long a query or process will take (I’d LOVE to have something like Oracle’s V$SESSION_LONGOPS view), but Tracy shows us how to use sys.partitions for insight into index rebuilds.

Chris Shaw (b|t) gives us some information on cluster failovers. While we want our operations to appear seamless to our users, as DBAs it’s important that we’re always aware of where our instances are living within our environment.

Rob Volk (b|t) really gets down the rabbit hole, using one of the Sysinternals tools to crack open SQL Server.  I’ve seen some folks do some really cool and powerful stuff with Sysinternals, so Rob’s post is a great way to introduce people to this awesome tool suite.  (Plus, how many guys even get to use the word “hekaton”?!?)

Jennifer Salvo (b|t) adds a little business intelligence to our topic, giving us a cool SSAS trick to manage currency conversions.  Businesses will ALWAYS care about their money, so having a good way to manage multiple currencies is a must.

Finally, first timer Jose Rivera (b|t) brings us full circle with his own post on using recursive CTEs to solve a tricky problem for drug costs.

(Oh, and look at this post if you want to see my contribution.  :D  )

I really appreciate everyone’s participation.  This is such a great part of the SQL Family and I hope that if you’re reading this you’ll consider participating and, perhaps, hosting a T-SQL Tuesday in the future.

T-SQL #33 – What happens if….? (#tsql2sday)

Sometimes a trick shot is the result of asking “What happens?” What happens if I draw off this ball, can I sink this other ball? What happens if I hit these two balls that are touching each other? The premise is that you look at a situation and explore the result.

For my SQL “trick”, I asked “what happens if you update a value to the same value?” It’s a question that gets asked by a lot of folks, but I wanted to explore the tools in SQL Server to answer that question for me. We can read about someone else figuring this out, but I think my best learning is done when I puzzle out a problem on my own.

I created two simple scenarios: an update of a column to itself and an update of a column to itself plus one. Fairly simple stuff:

create table test
(cola int not null identity (1,1),
colb int not null default 0,
colc varchar(100),
constraint pk_test primary key clustered (cola))

insert into test(colc)
select top 1000 a.name
from sys.objects a,sys.objects b

--Reset, write all pages out from buffers
checkpoint
dbcc dropcleanbuffers

--These tests need to be run separately!

--Test 1
update test
set colb = colb

--Test 2
update test
set colb = colb+1

(Note: the checkpoint and dbcc dropcleanbuffers are there to make sure I’ve flushed all pending activity out of the buffers and log before I run either test.)

My first stop was DBCC PAGE, to see if looking at the raw page would show whether the pages had been modified. Unfortunately, this information wasn’t there, though I did get a lot of exposure to reading the output of the command. Not sure where to go next, I reached out to the Twitterverse on #sqlhelp. I was directed towards two items:

  • sys.fn_dblog – Undocumented function that shows all the individual actions in your transaction log.
  • sys.dm_os_buffer_descriptors – DMV that lists information on all the pages in your buffer pool

Querying both of these tools gave me the answer to “What happens?”, but in two different ways. When I looked at sys.fn_dblog, I could see no actual transactions when I updated the column to itself, but a transaction for each update when the value was actually modified:

--check for log activity
select [Current LSN],Operation,Context,[Transaction ID]
from sys.fn_dblog(null,null)

Sys.dm_os_buffer_descriptors has a field labeled ‘is_modified’ which is set to 1 if the page has been changed. When I queried for pages with is_modified equal to 1, I saw further evidence of what fn_dblog showed me:

--List pages in the pool that are modified
select * from sys.dm_os_buffer_descriptors
where is_modified = 1 and database_id = db_id('junk')

In both cases, I was able to see clearly what was going on under the covers and that SQL Server is actually intelligent enough to save some work when no work is actually being done.

Of course, the funny thing about trick shots is someone usually can do it better. Shortly after I figured this all out on my own, I found this fantastic blog post by Paul White (b|t) showing all this with a little more detail. However, what’s important is what you can learn by doing things yourself. It may take you time to reinvent the wheel (something you might not be able to afford), but when you do it yourself you learn a lot about how a wheel works.

Thanks for joining me this month, where I not only get to contribute but also host T-SQL Tuesday. I’m excited to see what other things our peers can teach us about the things they’ve learned by playing around with SQL Server, trying to see “What happens?”