Art of the DBA


Why I Work With SQL Server

Hot on the heels of my NoSQL posts, I wanted to add a counterpoint to the discussion.  After all, even though I see the value of non-relational technologies, I think it’s important not to lose sight of the value relational databases offer.  In the tech world, it’s too easy to chase those squirrels of new tech (though it’s also easy to get stuck in our old patterns as well).  It always helps to take a step back and see the forest for the trees so we can choose the right path for our enterprise.

It is an understood fact that the tech world gets pretty dogmatic:  Oracle vs. SQL Server, Windows vs. Linux, Java vs. C#, etc.  People will dig their heels in about their choices and why those choices are “right” when, at the end of the day, each platform is simply a different approach to the same underlying concepts.  I tend to view most of these debates as Ford vs. Chevrolet; the only real question to answer is which tool is best for the job.

And believe me when I say that I know that this isn’t a groundbreaking opinion, but it is mine.  :)

That being said, we all have good reasons for selecting the platforms we work with.  For relational databases, it’s fairly evident that my choice is SQL Server.  Before I get into that, let’s first talk about why I lean towards relational over non-relational.  Don’t get me wrong, non-relational is an effective tool, but it’s still a very young technology.  The platforms for it are still growing and maturing, and they’re still missing a lot of the reliability we’ve come to expect from our relational platforms.

Couple that with the nature of relational databases:  Joins, keys, and constraints do more for us than simply organize data; they provide the functionality to implement and control business logic.  Data integrity is extremely important for many applications, and a proper database design will provide you with all the rules to keep your data clean and ordered.  Just as with choosing non-relational stores, it’s a matter of choosing the appropriate tool for the job.  Sometimes that job requires tight control over your data, something that you just can’t get in a NoSQL database.

As for SQL Server as my relational platform of choice, there are a lot of reasons why I favor it over other platforms.  It isn’t just because it’s what I’ve worked with (for the record, I’ve put some serious time into Oracle as well).  There are really three main reasons why I promote SQL Server as the database I think people should work with.

Maturity

Let’s face it, SQL Server has been around for a while and Microsoft has had a lot of time to refine it.  Over the past 15 years I’ve worked with it, I’ve seen the addition of lots of neat features that enhance the core RDBMS offering.  At the same time, SQL Server is still a solid relational database, giving users a reliable platform for storing their data.  It’s not perfect and I’ll be the last person to tell you it is, but it certainly is on par with Oracle and PostgreSQL.

Adaptability

Microsoft has seen the writing on the wall.  Some of it is from their own hand, some of it is how the data world is evolving.  However, “the cloud”, in-memory structures, and big data are ubiquitous in today’s tech landscape.  Looking at the recent versions of SQL Server, it’s apparent that Microsoft is trying to mold the product to live in this new realm.  Consider Hekaton, the ability to span databases between Azure and on-premises, and improvements to columnstore (along with other updates).  Microsoft is making investments to keep pace with the changes we’re seeing in the larger technology world and I appreciate the vision they have for the product.

Accessibility

This is the big one for me.  The other two basically tell me that, in going with SQL Server, I’m going to have an established RDBMS platform I can rely on along with Microsoft continuing to improve things to keep pace with other products.  What sets SQL Server apart is the fact that it’s so much easier to work with, for both new folks and seasoned professionals.

First, let’s look at the fact that it’s Windows.  Now, while we all lament SQL Server’s default settings, the fact is that almost anyone with minimal experience can get their own SQL Server instance up and running in short order.  This means the door is open a little wider for people wanting to get into the database world, even if they don’t have supporting skill sets for Linux or hacking the registry.  SQL Server ships with wizards and graphical tools to get folks going.  Just make sure you talk to a professional before getting too far.  :)

And that’s the second thing to talk about.  Maybe I’m biased because I’ve been involved in the SQL Server community for so long, but I’m continually amazed by the amount of free material for training and best practices provided by this community, from blogs to Twitter to a user group likely nearby where you can ask questions of people using SQL Server in your area.  It’s so easy to get started with SQL Server.

Yeah, I know I sound like a fanboy at this point (squee!).  Just so we’re on the level, I am well aware of SQL Server’s flaws.  There’s a lot of things that aren’t perfect or were added a couple versions ago but were never finished up (*cough* Management Data Warehouse).  And let’s not get into what’s in Standard Edition versus Enterprise.  Trust me, I get it.  Even with that, though, I feel that SQL Server is the preferred offering at this point for companies that are looking for a solid relational platform. 

Lines In The Sand

When working with databases, the question of performance always comes up.  Whether it’s a new build or an existing setup, we need some sort of radar gun to measure our instances to make sure they’re up to snuff.  The idea is simple:  We need some sort of benchmark that tells us, at least in a relative sense, how well our systems are performing.

Of course, the tools and approaches available to us seem as numerous as stars in the sky.  Want to measure storage performance?  Take a look at SQLIO or IOmeter.  Need a full end-to-end testing suite?  Paid tools like LoadRunner and Benchmark Factory can help you.  Anything else?  A quick Google search will provide plenty of options.  So where do you start?

For my part, I have become a big fan of HammerDB.  I discovered it via Kendra Little(@Kendra_Little); it’s a handy tool that allows you to run a TPC-C benchmark against your instances (and not just SQL Server!).  You can specify several different factors to customize your testing, yet the tool has an easy-to-use interface.  It’s also very well documented, with some good instruction on how to set up your own load tests.

While the HammerDB documentation and Kendra’s post are fairly comprehensive, I did want to share a little on how I setup and run my own HammerDB tests.  This is mostly because there’s a couple gotchas that aren’t covered (after all, the guy who wrote it is an Oracle dude, so I’ll cut him some slack).  First off, let’s talk about the initial schema build.  We can let HammerDB create everything from scratch, but because it uses all of SQL Server’s defaults for that, we get a poorly configured database.  When I am setting up a test (approximately 100 warehouses), I will specifically create an empty database with my files right-sized and in SIMPLE mode, and then use HammerDB to create the schema within that database:

create database tpcc
on primary (name=tpcc_data01, filename='c:\dbdata\tpcc_data01.mdf',size=10gb,maxsize=unlimited,filegrowth=1gb)
log on (name=tpcc_log, filename='c:\dbdata\tpcc_log.ldf',size=4gb,maxsize=unlimited,filegrowth=1gb);

alter database tpcc set recovery simple;

I use SIMPLE mode primarily so I don’t have to bother with log backups on the benchmark database.  Keep in mind this doesn’t actually affect how transactions are processed, only how they’re removed from the log file.  By properly sizing the log file, we avoid messing around with too many VLFs, which could adversely affect our numbers.
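If you want to sanity check the VLF count after sizing the log this way, DBCC LOGINFO returns one row per virtual log file (just a quick check on my part, not something HammerDB requires):

use tpcc;
dbcc loginfo;  -- one row per VLF; a right-sized log should keep this to a modest number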

At this point, I’ll kick off the schema build, usually with 10 concurrent users.  It still could take a while to build, so patience is key.  I have had issues where the tool sometimes flubs on finalizing the schema (when creating the indexes and stored procedures), so if you run into that you can find them in a SQL script here.

Once the schema is in place, it’s a simple matter to run the test.  We can run an ad hoc test easily enough, but for a solid benchmark we need to use HammerDB’s autopilot functionality.  The autopilot will run sequential timed tests using different concurrent user threads to iteratively increase the load.

HammerDB_DriverScript

First thing to do is make sure, under the Driver Script options, we’ve selected “Timed Test Driver Script”.  Then I tweak a couple things here, mostly around the length of each test.  HammerDB defaults to a ramp up time of 2 minutes, followed by a 5 minute test.  Keep in mind that having a ramp up time is very important, because it allows the system to “warm the cache” and give you more consistent numbers.  That being said, I adjust these values, going with a ramp up of 1 minute and a 5 minute test.  Our schema is not tremendously big (10 GB), so this is usually enough.  It doesn’t hurt to have larger values if you want, but 1 and 5 suit my purposes.  Once this is all set, I load the script and I’m ready to move on to the autopilot settings.

HammerDB_AutopilotOptions

Next, I need to enable the autopilot.  I’ll leave the concurrent user batches at their defaults, as they give you a nice battery of threads to work with.  However, this is where the first gotcha comes in: the time you set for each test has to exceed the total of the ramp up time and the test time, plus some time for setting up and tearing down each test.  Since I go with settings that give me a total of 6 minutes for a single test run, I’ll set the test time in the autopilot settings to 8 minutes to give the application time to do all the other setup work it needs.  Having this buffer time doesn’t affect anything with the test itself, but if you cut it too short, the application will essentially discard your results and move on to the next test.  As a nicety, I like to check “Show Virtual User Output” so I can sanity check that everything is running OK during the test.  The output isn’t very informative (like watching the Matrix code), but you’ll at least see errors in the run.

For the tests themselves, I’ll actually run the full autopilot test batch three different times.  I should get results that are relatively the same, but I do want to measure for consistency.  Then I chart out the results, looking at three numbers:

  • Concurrent users (note this is the number in HammerDB minus 1, because HammerDB has one user acting as a controller)
  • Transactions Per Minute (TPM)
  • Transactions Per Minute Per User (TPM/Concurrent Users)

Naturally, we should see the TPM increase as the number of connections rise.  What I’m particularly looking for is to see the behavior curve for TPM/user, which should remain steady.

Lines_Sand_TPM_Chart

If we take a look at this sample from a run on my laptop, you can see what I mean.  It is, of course, not a “good” thing that the average TPM/user drops as connections increase, but this is expected as my poor little laptop isn’t made for enterprise OLTP.  We can at least see that my laptop could handle two concurrent users before falling over on its face.

Some final caveats to keep in mind.  The first is to remember that these numbers are relative.  There’s no “good” value for this; it’s all relative to how your systems are performing.  The second caveat is that this is an unofficial TPC-C result.  Vendors and the TPC will only certify something as official under very controlled circumstances, mostly because it’s their reputation on the line.  Finally, these values shouldn’t stand in a vacuum; measure them alongside other metrics (e.g. CPU, Page Life Expectancy, etc.).
That being said, I find HammerDB an effective tool for giving me a general idea of system performance, using a standard that is application agnostic, thorough, and has a proven track record.

WHERE to JOIN?

I’ve been really enjoying the DBA StackExchange site recently.  Not only can you see what challenges and hurdles people have, the site construction gives people a great way to contribute to an ever expanding library of database solutions.  Questions range from the very simple to the highly esoteric, but in all cases the community comes together to groom both questions and answers in such a way that a comprehensive knowledge base is built for future use.

One of these questions that recently came up was:  Which performs better, creating your joins in the FROM clause or the WHERE clause?  Most people have been using the ANSI-92 syntax, so this question may seem a little odd, but I still see a lot of SQL code out there that uses the older syntax, where the joins are declared in the WHERE portion of your query.  If you want to read more, Mike Walsh(@Mike_Walsh) has a great post on how the syntax has evolved and how it’s changed in SQL Server 2012.

Back to the question, though.  Does it really make any difference?  Well, I could tell you straight out, but what sort of blog post would that make?  Instead, let’s test it out ourselves.  Using a basic schema, I’ve put together two very basic queries:

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  ,bar b
  ,foo f
WHERE
  s.b_id = b.b_id
  AND b.f_id = f.f_id;

SELECT
  s_id
  ,s_desc
  ,b_desc
  ,f_desc
FROM
  snafu s
  INNER JOIN bar b ON (s.b_id = b.b_id)
  INNER JOIN foo f ON (b.f_id = f.f_id);

As you can see, the only real difference here is that in the first query we have our joins in the WHERE clause. The second follows ANSI-92 syntax and places the joins in the FROM clause. Now how do we tell if they perform differently? Query plans, of course!

Query 1 (WHERE clause)

WHERE_JOIN

Query 2 (FROM clause)

FROM_JOIN

Notice how both queries have exactly the same plan.  This is because our friend, the Optimizer, understands the two approaches and will build the plan accordingly.  Want to play with it yourself?  You can check out the full example over at SQL Fiddle.
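If you’d rather recreate it locally instead of using SQL Fiddle, a minimal version of the schema might look like this (hypothetical definitions inferred from the column names in the queries above):

CREATE TABLE foo (
  f_id INT PRIMARY KEY,
  f_desc VARCHAR(50)
);

CREATE TABLE bar (
  b_id INT PRIMARY KEY,
  f_id INT REFERENCES foo (f_id),
  b_desc VARCHAR(50)
);

CREATE TABLE snafu (
  s_id INT PRIMARY KEY,
  b_id INT REFERENCES bar (b_id),
  s_desc VARCHAR(50)
);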

There are three things I’d like you to take with you after this brief exercise:

  • Functionally, it doesn’t matter if you declare your JOINs in your FROM or your WHERE clause; the optimizer will treat both the same.  However, if you read Mike Walsh’s blog post, you really should be using the ANSI-92 standard.  The “old” outer join syntax (*= and =*) only works if you have your database in SQL 2000 (80) compatibility mode, which means it doesn’t work at all in SQL 2012 (see the example after this list).
  • Query plans will answer most of your performance questions regarding SQL syntax.  If you haven’t been looking at them, I strongly suggest you pick up Grant’s book and start checking those plans out.
  • I’ve only recently discovered SQL Fiddle, but this is a great tool for mocking up and testing concepts for databases.  I haven’t built anything larger than 2-3 tables, but for basic test cases and examples to demonstrate something, it’s really cool(it even lets you look at query plans!).  Check it out.
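For the curious, the deprecated outer join form mentioned in the first takeaway looks something like this (shown for illustration only; it only parses under the 80 compatibility level, so it fails outright on SQL 2012):

-- Old-style (non-ANSI) outer join using *=
SELECT
  s.s_id
  ,b.b_desc
FROM
  snafu s
  ,bar b
WHERE
  s.b_id *= b.b_id;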

A Heap of Trouble

A couple weeks ago I came across an interesting flaw in SQL Server partitioning which was causing one of my tables to balloon in size.  Essentially, we found that this large table in one of our databases had a lot of empty pages in it, pages that usually got cleaned up by the SQL Server engine.  Unfortunately these pages were not getting removed and this 2 billion row table was about triple the size it should have been.

I won’t focus on the table too much, but there were two elements that created this problem: the table was a heap and the partition key was volatile (it got updated frequently).  I’m not yet sure if it’s a case of forward pointers or some other mechanism within SQL Server, but the result is that when we move data in a heap across partitions by changing the partition key, the pages the data occupied are left behind, empty, in the old partition.

To test it, I built a basic heap table partitioned on an integer, then stuffed it with 1000 rows.  I then queried some of the DMVs to see what the layout of the data was:
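The check itself is a quick query against sys.dm_db_index_physical_stats, something along these lines (a sketch, with dbo.PartitionTest standing in for the demo heap):

select
  ps.partition_number,
  ps.record_count,
  ps.page_count,
  ps.avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.PartitionTest'), null, null, 'DETAILED') ps
where ps.index_id = 0  -- index_id 0 is the heap itself
order by ps.partition_number;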

Partition 2 has 1000 rows in it, occupying 150 pages, exactly as we would expect.  Average page use is ~83.74%, so fairly full.  All in all, not surprising.  What happens, though, if we change the partition key value on those 1000 rows and move it to partition 3?

The 150 pages that the data used in partition 2 are still there!  And we’re now also using 143 pages in partition 3!  The result of all this is that even though we’re not duplicating data, we’re duplicating space.  Normally, SQL Server will detect these empty pages and remove them, but in this case they are not identified as empty.

What happens if, instead of updating our data, we simply delete the rows?

Even though we removed all our data from partition 2, those pages are still out there occupying space(note the avg_page_space_used_in_percent field).  Obviously, this can quickly spiral out of control and become a major issue.

I did go through all these tests with a clustered index on the table and the results were much more positive.  When I moved data from one partition to the next or deleted rows, the pages the data occupied were removed as expected, keeping the total size of the table down.

There are a couple lessons learned from this:

  1. Heaps are trouble!  Based on what we saw here, it’s fairly evident that using a heap table causes all sorts of problems with partitioning.  There are reasons the best practice of using clustered indexes in SQL Server exists, this being one of them.
  2. Do not use a volatile partition key!  Keep in mind that when we partition, we’re managing the physical structure under our table.  In general, we want to avoid moving data around on our disk.  If we choose a partition key that changes a lot, we’re creating a situation where our data moves around unnecessarily, which can cause us other problems even with a clustered index in place.
  3. Design is vital!  The problems we had were created initially many years ago.  At the time, things looked fine and there weren’t any noticeable issues.  It wasn’t until much later, when the table had grown to several billion rows, that we had problems.  You need to take care in designing your structures, because fixing very large tables is weeks (if not months) of coordinated work and can require significant downtime to implement.

So there it is.  I’m not sure if this is intended behavior or just an oversight.  Most of the folks I talked to regarding this had not encountered it before, which is not surprising, given that the original table design violates several best practices.  However, I do want to make people aware of this and I plan on filing a Connect item once I’ve had a chance to bounce it off a few more SQL brains.

For those interested, the scripts to reproduce my testing boil down to the pattern below.
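This is a rough sketch with hypothetical object names rather than the exact scripts: build a partitioned heap, load partition 2, then either update the partition key or delete the rows, re-running the sys.dm_db_index_physical_stats query from earlier after each step.

-- Partition function and scheme on an int key
create partition function pf_PartitionTest (int)
as range right for values (100, 200, 300, 400);

create partition scheme ps_PartitionTest
as partition pf_PartitionTest all to ([PRIMARY]);

-- The heap: no clustered index, partitioned on the (volatile) key
create table dbo.PartitionTest
(
  pkey int not null,                       -- partition key, updated during the test
  filler char(1000) not null default 'x'   -- padding so the rows fill pages quickly
) on ps_PartitionTest (pkey);

-- Load 1000 rows into partition 2 (values 100-199)
insert dbo.PartitionTest (pkey)
select top (1000) 100 + (row_number() over (order by (select null)) % 100)
from sys.all_objects a cross join sys.all_objects b;

-- Move the rows to partition 3 by updating the partition key...
update dbo.PartitionTest set pkey = pkey + 100;

-- ...or, starting from a fresh load, delete the rows instead and re-check the page counts
-- delete dbo.PartitionTest;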

A short ride on a PoSH machine

I want to talk about one line of code.  Ok, well, two lines of code, but the first line isn’t that important, much like the opening act of a rock concert.  Let’s just pretend that the first line, while necessary, just gets us warmed up for the main event.

My problem was this:  I was collecting a SQL Server inventory but couldn’t retrieve all the information I was looking for.  I had started with some of the elements of Allen White’s(b|t) script as well as what Colleen Morrow(b|t) assembled, but I was still missing the TCP/IP address and SQL port.  In my shop we use static addresses and I wanted to have that information handy.  So, to the Google-mobile.

My searching led me to a Sean McCown(b|t) video that showed me how to grab the TCP/IP address, so I was halfway home.  However, I just couldn’t find a clear way to grab the port information.  I think I was stretching my fingers to start pulling my hair out when I bumbled into the ManagedComputer class in the SMO.  Down the rabbit hole I went, using Get-Member, and eventually came up with this:

# $sname is the host name and $iname is the instance name ('MSSQLSERVER' for a default instance)
$managedcomp = new-object ('Microsoft.SqlServer.Management.Smo.WMI.ManagedComputer') $sname
$port = $managedcomp.ServerInstances[$iname].ServerProtocols["Tcp"].IPAddresses["IPAll"].IPAddressProperties["TcpPort"].Value

Hopefully that doesn’t look too intimidating, but let’s break it down a bit.  The ManagedComputer class (in this case $managedcomp) represents the physical host for the machine.  Using this, we can then access its various properties.  In succession, we have:

  • ServerInstances – An array of all the named instances installed on the machine.  I’m passing a variable to this, which happens to be the name of the instance I want the port for.
  • ServerProtocols – An array of the different protocols, so in this case we want the TCP/IP protocols.
  • IPAddresses – These are the different address configurations SQL is listening on, matching what appears in the SQL Server configuration tool.  The array elements are named IP1, IP2, and so on (IPn), or IPAll if the setting applies to everything.  Since I’m looking for the port that is assigned for all my addresses, I’ll look at the IPAll object.
  • IPAddressProperties – Simple enough, the properties of the IPAddress object, and we want the TcpPort property specifically and its value.

How did I get here?  Making extensive use of Get-Member.  Much like Theseus in the Labyrinth, I used that to figure out where to go next each time I went to a deeper property level.

And this is what I hope people take away from this.  There’s so much information buried in the .Net objects that we can take advantage of, especially by using Powershell.  I’ll be the first to admit that there might be a better way to do this(please tell me if there is).  This shows, though, that by using the tools in front of me, I was able to expose the secrets of this particular set of objects and bend them to my will.

 

P.S.  For those interested, I’ll post the full server inventory script here in the near future.

Process of Elimination

I never expected to talk as much about partitioning as I have so far this year. The tally is currently at six, with at least two more in the works. It’s a hot topic due to partitioning still being a mystery to a lot of folks and I’m glad I’ve been able to pull back the curtain a little bit.  However, there’s still some things not understood very well.

One of these is partition elimination. Heck, up until I really started studying partitioning, I didn’t understand it very well either. Just like many others out there, I thought that once you partitioned a table, all your queries would suddenly become faster because the engine would only look at the needed partitions. While SQL Server tries to do this, the reality of when partition elimination actually occurs is not quite as cool.

To figure out how partition elimination works, we only have to look at our query plans. Let’s crack open the demo I use in my partitioning presentation and take a look. As a quick recap, I use a basic orders table that is partitioned on an incrementing integer primary key:
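A minimal sketch of that kind of table (hypothetical names and boundary values, not the exact demo script) looks like this:

create partition function pf_orders (int)
as range left for values (5000, 10000, 15000, 20000, 25000, 30000, 35000);

create partition scheme ps_orders
as partition pf_orders all to ([PRIMARY]);

create table orders
(
  order_id int identity(1,1) not null,
  order_date datetime not null,
  order_amount money not null,
  constraint pk_orders primary key clustered (order_id)
) on ps_orders (order_id);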

Now let’s run a simple SELECT to query data out and capture the query plan that results:

select * from orders
where order_id between 11000 and 15000

Nothing complex here and nothing complex in the query plan. You’ll see the expected clustered index seek. What’s key is to look at the properties (F4) of your operator to see how your partitions are being used:

These two property values give us all the answers when it comes to partition elimination. First, Partition Count tells us that the engine only retrieved a single partition for our query. Actual Partitions gives us the partitions (in this case, partition 3) that were actually accessed. Great! In this case, by querying order_id, we were able to take advantage of our partitioning.

What if we use a different field?  Will SQL still try to take advantage of the partitioning?  It depends on whether or not the partitioning key is used in any part of the query.  We’ll go ahead and add an index on order_date (which will be partition aligned) and query to get the total order amount for a date range:

select sum(order_amount)
from orders
where order_date between '1/1/2012' and '2/1/2012'

Now, this plan is a little more interesting, so let’s take a look at it, focusing on the two places where we hit the table:

You’ll note that the query has to use a key lookup to complete the query. This is because we have an index on order_date, but it doesn’t include the order amount, so the engine has to go back to the table using the clustered index to find our data. Let’s now look at the properties for each of these operators:

Where are the partition properties for the index seek? Since the query can’t make use of partitioning for this part of the query, the properties aren’t listed. However, the key lookup (which makes use of the clustered index and, therefore, our partitioning scheme) does show that we accessed only partition number 3.

What if we go a step further and alter our index on order_date to include order_amount, removing the need to do a key lookup in the query? Let’s take a look:
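The index change itself is just a rebuild with an INCLUDE column, something along these lines (a sketch; the demo’s actual index name isn’t shown here):

create nonclustered index ix_orders_order_date
on orders (order_date)
include (order_amount)
with (drop_existing = on)  -- rebuilds the earlier order_date index in place, assuming it used the same name
on ps_orders (order_id);   -- keeps the index partition aligned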

By detailing that the query accessed eight partitions (1..8), we see that the engine tried to use the partitioning.  Since the partitioning key wasn’t directly involved, it ended up seeking across the entire index anyway.

To wrap it all up, let’s look at the performance of our order_amount queries:

  • Query cost on index without include: 2.63403
  • Query cost on index with include: 0.0276018

So even without using our partitioning, including the order_amount is still much faster than having to do a key lookup.

Overall, partition elimination can be a helpful tool in the engine’s toolbox. As your data sets get larger and larger, it can improve your performance, but only if you understand how the engine uses partitions. Sometimes, however, it’s better to take a different approach.  Don’t go and rewrite all your queries simply to get your partitioning key involved, because it may not help. Always do a careful analysis of what’s going on under the covers before making any tuning decisions.

Power(shell)ing your CMS

If you’re looking at Powershell scripts for SQL Server management, you’ll find a lot of them consume a list of servers in order to perform an action. For example, I put a script out a couple of weeks ago that can monitor the service state for your SQL Servers, emailing out a notice if services are down. This is handy, but maintaining a server list in a file is hardly elegant. For something more dynamic, many shops(including my own) will maintain a list of their servers in their own tracking database, giving you a resource you can query. However, there’s an easier way to handle this if you’re using Central Management Server.

The nice thing about Central Management Server is that, just like SQL Server instances, it can be accessed through the SMO with Powershell. This gives us a handy way to browse our server listings as a folder structure. Just open up your Powershell window and (assuming you’ve loaded your SQLPS module/snapin), change your directory to:

CD 'SQLSERVER:\SQLRegistration\Central Management Server Group'

Once you’re there, get a listing of the contents with your favorite command (dir/ls/gci) and you’ll see your registered Central Management Server repositories. You can browse from there into the appropriate repository and see a listing of all your registered instances. Now, keep in mind, you’ll still have to browse the directory structure if you’ve created folders to organize your servers into, but now we have access to them as if they were files in a folder.

Let’s take it a step further now and build out a server listing using some pretty basic commands within Powershell. We only have to keep in mind two gotchas:

  • The SMO recognizes both the folders and the instances as “containers”, so we can’t filter by PSIsContainer being false (the object property indicating container-ship).
  • The instance object has both Name and ServerName properties. The Name is the descriptive name, while ServerName is the specific name of the instance we can connect to.

Keeping this in mind, we can build a list by running the following command in the repository:

ls -recurse | where {$_.ServerName -ne $NULL}

Breaking this down, it’s a standard looking command for Powershell. By using “-recurse”, we tell Powershell to recursively browse through all the folders and subfolders in our directory for the lookup. Filtering on “$_.ServerName -ne $NULL” means we exclude the directories (since they don’t have a ServerName). Now we have an array of all of our servers that we can use for processing loops:

$srvs = ls -recurse | where {$_.ServerName -ne $NULL}

ForEach ($srv in $srvs)
{
  $inst=$srv.ServerName
  #do some stuff
}

Another handy element in that object is the ConnectionString property. This provides you with a ready made connection string to use if you need to connect to SQL Server using OLEDB or ODBC calls. While I haven’t had a need to use this yet, I can definitely see the benefits. There are also other properties that can be explored and used, depending on your scenario.

All of this makes for a great extension of Central Management Server, allowing DBAs to leverage the repository for more than just centralizing lookups in Management Studio. This can be a handy tool if you need something with a little less administrative overhead than a server inventory database or some sort of file listing. This means that the Central Management Server, which most of us see as something we only really use within Management Studio, can be leveraged to be so much more.

I need more power!

Man, sometimes blogging is hard. I meant to get this out earlier in the week, but better late than never, right?

When I was out at SQL Rally a couple weeks ago, I went to Argenis Fernandez(b|t) and Robert Smith’s(b|t) pre-con for SQL Server best practices. It was a good session, with lots of interesting discussion on ways to configure your environments for SQL. One gotcha we talked about was Windows Server 2008’s annoying power plans and how they get set to Balanced by default. Now, I get what Microsoft is trying to do with these plans, but defaulting to Balanced on servers? Really? Let’s use some common sense.

Anyway, before I get on a rant (don’t worry, I’ve got a couple of those in store for later blog posts), the question that popped into my mind when we started talking about this was how to do it on Server Core. Changing the plans on your standard Windows installs is easy: you just browse into Power Plans (click Start and type “power plans” in the search box). What about when we have no GUI? Do we panic? Of course not, because we have the command line!

Some quick Googling provided me with two items. One is powercfg, a command line utility that Microsoft has provided to allow us to edit these plans. It’s handy, but a little confusing, especially because the power plans use GUIDs for management and that adds a whole new level of trickiness. So, the second Google nugget to appear was this handy script from Aaron Saikovski, where he wraps powercfg in some Powershell to give us more graceful management. Here it is with some editing/tweaking from me for use:

function Set-PowerPlan(
    [ValidateSet("High performance", "Balanced", "Power Saver")] [string]$PreferredPlan)
{
    # powercfg -getactivescheme ends with the plan name in parentheses, e.g. "... (Balanced)"
    $currplan = powercfg -getactivescheme
    Write-Host "Current plan is $($currplan.Substring($currplan.IndexOf('('))) [$(Get-Date)]"

    Write-Host "Setting Powerplan to $PreferredPlan"
    $guid = (Get-WmiObject -Class win32_powerplan -Namespace root\cimv2\power -Filter "ElementName='$PreferredPlan'").InstanceID.ToString()
    $regex = [regex]"{(.*?)}$"
    $newpowerVal = $regex.Match($guid).groups[1].value

    #Set the powerplan
    powercfg -S $newpowerVal

    $currplan = powercfg -getactivescheme
    Write-Host "Current plan is $($currplan.Substring($currplan.IndexOf('('))) [$(Get-Date)]"
}

#set preferred powerplan from the command line argument
Set-PowerPlan $args[0]

Save the script and call it with whatever plan you want as the argument, or put it in your profile and call it as a function. Technically, you can use this to set the power plan to any of the default options, but my main use of it is to set the plan to High Performance for my Core servers.

UPDATE: Kendall Van Dyke (b|t) provides a handy way to update the plan using the WMI. I’ve not used it much, but I’ll definitely play around with it. In general, I think you could go either way for managing your powersettings.

Get-WmiObject -Class win32_powerplan -Namespace root\cimv2\power | ? { $_.ElementName -ieq 'high performance' } | % { $_.Activate() }

FILESTREAM on Server Core

A couple weeks ago, I did a post for T-SQL Tuesday on the Filetable, a cool new SQL 2012 feature. One thing I glossed over a little bit was enabling FILESTREAM for your SQL Server, where I just linked the MSDN article walking you through configuring the service using the Configuration Manager.  This is great as long as you have the GUI to use. However, I’ve said previously that I’m trying to do all my work (demos and blog posts) using SQL 2012 on Server core, which means no GUI. This makes it a leeeeeeeeeeeeeetle trickier and requires some more advanced techniques.

We need to flex our Powershell muscles for this. For me, this still means hitting the Google search to find the command syntax I need, though I understand it a little better. I found this excerpt from Wrox’s Microsoft SQL Server 2008 Administration with Windows PowerShell on Google Books.  The example shows, quite clearly, how to use Powershell to hit the WMI and the appropriate methods to enable FILESTREAM with the right perms and names.

I took the example there and reworked it slightly, mostly to change the namespace so that it will use the correct SQL 2012 interface.  It’s not much, when you get right down to it.  The core purpose is to create the WMI object and then use the EnableFilestream method (which may be hard to do if you don’t know the appropriate namespace to use):

$instance="MSSQLSERVER"
$wmi=Get-WmiObject -namespace "root\Microsoft\SqlServer\ComputerManagement11" -class FILESTREAMSettings | where {$_.InstanceName -eq $instance}
$wmi.EnableFILESTREAM(3,$instance)

There are two parameters to the EnableFILESTREAM method. The first is the access level, with four possible values that relate to how much access is granted to the filesystem. The easiest way to understand it is to relate the values to the checkboxes in the GUI:

  • 0 = FILESTREAM disabled (no boxes checked)
  • 1 = Transact-SQL access only
  • 2 = level 1, plus file I/O streaming access for local clients
  • 3 = level 2, plus streaming access for remote clients

The second parameter is simply the share name, so in this example we would create \\<SERVER NAME>\MSSQLSERVER for our FILESTREAM objects.  Now, note that you still have to run the SQL to configure the SQL Server side of things (see my previous post).  This step just replaces activating the service properties from the Configuration Manager.

The other alternative to this method is to use the remote management tools to connect to the instance and alter the service settings through the GUI. I like this way better, though, because I have more control over it and I don’t need any additional connections or tools.

Edit:  I forgot to mention that this will require you to restart the SQL Service to apply (thanks Hayden Hancock for pointing this out in the comments).  Since we’re Powershelling it:

$instance = "MSSQLSERVER"   # the service name; a named instance would be MSSQL$<instance name>
Get-Service $instance | Restart-Service

T-SQL Tuesday #29 2012 Party: Dancing on the FileTable (#tsql2sday)

For this month’s edition of T-SQL Tuesday, Nigel Sammy(b|t) asks us to talk a bit about a feature of SQL 2012 that we’re excited about. Now, while my most favoritist SQL 2012 feature is the ability to install on Server Core, I’ve already written about that and I want to talk about something that hasn’t gotten quite so much publicity. Enter the File Table, new functionality that exposes more of the Windows file system to SQL Server. With it, we get some additional tools for file handling that before we had to do with other applications and workarounds like xp_cmdshell.

A File Table is a special type of table in SQL 2012. It’s got a fixed set of columns and points at a directory managed by SQL’s FILESTREAM technology. If you do a SELECT from the File Table, you’ll get a listing of all the files within that directory. What’s cool is that “inserting” rows into this table is as simple as copying files into the directory. Once a file is in there, we can use SQL to do other operations, like copying files between two different File Tables, or cleaning up old files based on the file dates or archive flags.

Setting up a File Table doesn’t take much, but since it uses FILESTREAM it can be a little tricky. The first thing you need to do is actually enable FILESTREAM for your SQL Server instance. To do so, just follow the steps from MSDN. For the purposes of this, I’m going to set up my share name as my instance name, SQL2012. This means that to get to my FILESTREAM objects and my File Tables, I’m going to be using the UNC path of \\SHION\SQL2012 for my laptop.

Now that we’ve got FILESTREAM going, we can go ahead and create our File Table. The first step is to set our database options to allow non-transactional access and a default virtual directory for the FILESTREAM.

--Set database specific options for FILESTREAM
ALTER DATABASE demoFileStream
SET FILESTREAM (non_transacted_access=FULL,directory_name=N'FSDemo')

Next, we need to add a FILESTREAM filegroup to the database, which defines the physical directory where our file tables will exist. Note, this directory will be a physical directory, but SQL Server must create it so it cannot already exist.

--Filegroup for the filestream
ALTER DATABASE demoFileStream
ADD FILEGROUP fgFileStream CONTAINS FILESTREAM

--Directory for the filestream filegroup
ALTER DATABASE demoFileStream
ADD FILE (NAME=flFileStream,FILENAME=N'C:\FileStream')
TO FILEGROUP fgFileStream

Finally, we can now create our file table with a simple CREATE TABLE statement. Note, we won’t declare any columns as those are pre-defined for the file table structure.

CREATE TABLE FT_Test
AS FILETABLE

And we’re done! We can now browse to the following UNC filepath: \\SHION\sql2012\FSDemo\FT_Test. This directory is the physical location of our file table. For now, I’ll put a couple test files in there.

At this point, we’ll just run a simple select and, bam! There’s a listing of our files.

SELECT
  name,
  file_type,
  creation_time,
  last_write_time
FROM FT_TEST


Voila! A happy little file table, ready for our use. I’ll be honest, there’s some limited use here and having file tables doesn’t solve a lot of problems. It’s very useful, though, for certain use cases and can be a very powerful tool in your tool kit. If you’re trying to think of situations where you need it, consider times where you need to work with files on the file system:

  • Managing backup files: If you’re like me, you write your backups to a unique filename for each backup. This means I usually need some other tool to clean up my old files. With File Tables, I can manage that clean up completely within SQL (see the sketch after this list).
  • Restores: If you’ve got a lot of transaction logs to restore and you want an easy way to parse through files to find the ones you need, you can now do that using File Tables.
  • Basic ETL: If you need a simple ETL solution, File Tables can give you some additional leverage so you can keep your entire solution within SQL.
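As an example of that first point, cleaning up old backup files becomes a plain DELETE against the File Table, something like this (a sketch; the retention window and file extension are just placeholders):

--Deleting rows from a File Table deletes the underlying files on disk
DELETE FROM FT_Test
WHERE file_type = 'bak'
  AND creation_time < DATEADD(DAY, -7, SYSDATETIMEOFFSET());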

Thanks again to Nigel Sammy for hosting this month’s T-SQL Tuesday. There’s so much new with SQL 2012, it’s good for Nigel to give the community a chance to share some of the new features that are now out. Enjoy the rest of T-SQL Tuesday #29!