
T-SQL Tuesday

T-SQL Tuesday #84: Getting Ready for your Presentation

I’ve been super busy lately, but I wanted to at least post something for this month’s T-SQL Tuesday. The topic is about encouraging new speakers, something I’m very passionate about. I think that speaking is one of the best things you can do to boost your career. If you are reading this and are considering speaking, I encourage you to reach out to your local user group and see if you can get started with a 15 or 30 minute session. Take a shot; I’ll bet you’ll be surprised.

What I want to share are some tips for the day you give your first presentation. A lot of folks are going to talk to you about building and preparing your presentation, but that is only half the battle. What you should do when you actually GIVE the presentation is often glossed over, even though this is the most high-pressure moment of the whole cycle.

How do we reduce or relieve some of this pressure? Well, let’s start by thinking about what could possibly go wrong when you present:

  • You’re presenting and you get an on-call page.
  • Your demo blows up spectacularly.
  • While giving your presentation, your computer attempts to apply updates.
  • You start 10 minutes late because you have issues with your video or sound.
  • During your presentation, someone sends you a picture on your favorite IM client:

(Image: “oh hai”)

Any of these will easily throw an experienced presenter off their game. For a new speaker, it can spell disaster. I’ve got a routine that I go through on the day of my presentation, which is designed to reduce that risk of disaster. And who doesn’t like reduced risk?

Getting Ready

Step 1: At the beginning of the day, well before my presentation, I make sure my presentation machine has applied all of its Windows updates and any other corporate software updates. This is SUPER important if it’s a Tuesday (when Microsoft releases updates). Doing this avoids any update surprises while I’m presenting or right before I go on stage.

Step 2: A couple of hours before my presentation, I will walk through the whole thing. I open up the PowerPoint slide deck and step through it. When I get to demos, I will walk through my demo scripts. I test EVERYTHING, and do it in order. If I encounter an error, I fix it and then start over. This helps me ensure that the flow works and that I understand the step dependencies in my demo.

Step 3: About an hour before my presentation, I will turn off everything on my presentation machine that is unnecessary for the presentation: programs like Skype and Google, unneeded local SQL Server instances, virtual machines, and so on. I want only what I need running, both to make sure I have enough resources for my demos and to keep possible distractions shut down.

Step 4: At least 15 minutes before I’m due to present, I go to my room and hook up my presentation machine. I test the video and make sure my adapter works. This way I can address any tech issues that could hamper the presentation. I will display PowerPoint and also my scripts and demos to make sure everything looks ok.

I also usually duplicate my screen to the projector rather than extend it. This is important because if I extend, typically the only way I can see what is being projected is to look back at it, which is distracting for the audience. If I duplicate, I only have to glance down at my own screen, which maintains my connection with the audience.

Step 5: Right before I present, I turn my phone OFF. Then I put it in my bag. I get it away from me. I don’t want to get calls, I don’t want to have to worry about silencing it, and I don’t want it buzzing in my pocket if I’ve got a lot of notifications. The phone is off and away.

It’s GO time

At this point, I’m free and clear to do my presentation. Does that mean that nothing will go wrong? Of course not. However, performing these steps puts me in the best position to give my presentation without disruption. It is a foundation for success. Just like we build out our database solutions to minimize the chance of failure, we need to approach our presentations with the same sort of care to help guarantee our own success.

I want to thank Andy Yun (@SQLBek) for hosting this month’s T-SQL Tuesday. It’s a great topic and I hope folks get a lot out of it. If you’re considering stepping into the speaking world (and you should!), this month’s blog posts should give you all the tools to succeed. Good luck!

#TSQL2SDAY: SQL 2016 Direct Seeding

It’s been a while since I’ve written a T-SQL Tuesday post, but thanks go to Michael Swart (@MJSwart) for hosting this month’s edition. The request is simple enough: blog about some aspect of SQL Server 2016. This is actually a good opportunity for me to write about a feature I’ve been meaning to get to for some time now and just haven’t had the chance.

As I tell many people, I’ve been working with SQL Server since 1999 (version 7.0) and I can’t remember being more excited about a SQL Server release. 2016 has SO many cool features, but more than that, they are features that SQL Server users have been asking for for a while now. It is hard to blog about just one feature since there are so many. However, there is an improvement to Availability Groups that has not gotten a lot of coverage, and I think it is a game changer, especially for those of us who have to manage AGs on a day-to-day basis: direct seeding.

Setting Up Replicas

In SQL Server 2012 and 2014, creating an Availability Group could take a significant amount of work. One of the more tedious tasks is setting up your replica databases, because you need to restore your database to your replica node in a state close enough to the primary to allow synchronization to happen. It can take several log backup restores to hit that magic window where you can join the database on the secondary node. Then you get to do it again on the next replica!

Enter direct seeding in 2016. With this feature you no longer have to do any of the restores. You simply create your replicas with direct seeding enabled, then when you add a database to the AG, SQL Server will directly seed the database to your replica nodes. It’s surprisingly simple.

How do we enable this magic? It’s part of the AG replica creation. We still need the prerequisites, so we’ll start with the following:

  • Create a cluster
  • Enable AlwaysOn for the SQL Service
  • Create endpoints and grant permissions on those endpoints (see the sketch below)
  • The database we’re adding must be in FULL recovery mode

I, unsurprisingly, have a Powershell script that will set this all up for me. Then we can create the Availability Group with the direct seeding setting. Currently, this is only supported using T-SQL (no wizard or Powershell support), but the syntax is documented in MSDN. Let’s start by creating our Availability Group:

CREATE AVAILABILITY GROUP [ENTERPRISE]
FOR
REPLICA ON 'PICARD' WITH (
 ENDPOINT_URL = 'TCP://PICARD:5022', 
 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
 FAILOVER_MODE = AUTOMATIC,
 SEEDING_MODE = AUTOMATIC
),
'RIKER' WITH (
 ENDPOINT_URL = 'TCP://RIKER:5022', 
 AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
 FAILOVER_MODE = AUTOMATIC,
 SEEDING_MODE = AUTOMATIC
)
LISTENER 'ENTERPRISE' (WITH IP (('10.10.10.101','255.255.255.0')))

--Run this on the primary node
ALTER AVAILABILITY GROUP [ENTERPRISE] GRANT CREATE ANY DATABASE

--Run these on the secondary node
ALTER AVAILABILITY GROUP [ENTERPRISE] JOIN
ALTER AVAILABILITY GROUP [ENTERPRISE] GRANT CREATE ANY DATABASE

 

There are two important lines to call out here:

  • SEEDING_MODE = AUTOMATIC
  • ALTER AVAILABILITY GROUP [ENTERPRISE] GRANT CREATE ANY DATABASE

The SEEDING_MODE setting is what enables direct seeding, but we also need to grant the AG permission to create databases on each node. At this point, though, we have an Availability Group with no databases in it. It’s time to add one and watch the magic happen.

Hands Off

Once we’ve created the AG, we just add the database to it:

ALTER AVAILABILITY GROUP [ENTERPRISE]
ADD DATABASE [WideWorldImporters]

Once we’ve done this, SQL Server takes over and pushes the database out to the secondary replicas:

(Screenshot: the WideWorldImporters database automatically seeding to the secondary replica)

That’s it! SQL Server handles all the replication for you. All is not sunshine and rainbows, however. There are still some gotchas to keep in mind:

  • You need to create the Availability Group with all its replicas at the start. You cannot add replicas at a later point (I thought I had done this, but more recent testing was not working).
  • All of your replicas must also be configured before you add any databases. You can add a database as part of the AG creation or afterwards, but direct seeding only works if all the nodes are already in place.
  • This is a data operation, so we “canna’ change tha’ laws a’ physics, Captain!” If you add a database to an AG with direct seeding, SQL Server still has to copy all the bits and bytes across the wire to the replicas, which means nodes in other geographical locations could be a problem.

Overall, direct seeding is pretty cool, but still feels a little raw to me. Personally I’d like to be able to seed to a replica I add to the cluster if I desire, but that does not work right now. The lack of GUI and Powershell support (both cmdlets and SMO) is also disappointing. I can still use it, but I’d rather have a simpler way to do it.

For some additional reading, Erik Darling wrote about this exact feature for T-SQL Tuesday as well, giving you another view of it. Otherwise, that’s it for me on this T-SQL Tuesday. Thanks to Michael Swart for hosting this month. Make sure you check out the rest of the T-SQL Tuesday posts!

#TSQL2SDAY: You’re Gonna Cluster that GUID

Welcome to the continuing blog party that is T-SQL Tuesday. This month’s host is Mickey Stuewe (@SQLMickey), with a challenge to blog about data modelling mistakes we’ve seen. Unfortunately, it’s an easy topic to blog about, because proper data modelling often falls by the wayside in the rush to rapidly deliver applications. I’m not saying it should, but the fact of the matter is many developers are so focused on delivering functionality that they do not think about the long-term implications of their design decisions.

One such mistake is the selection of Globally Unique Identifiers (GUIDs) as a data type. Developers love this data type for a lot of reasons, some of them very good. The fact that GUIDs are globally unique provides a way to keep data unique when migrating it, along with a nicely obfuscated surrogate key that can help protect user data. All in all, I cannot really blame people for wanting to use this data type, but the problem is that SQL Server does not manage it very well behind the scenes.

The problems with GUIDs in SQL Server are fairly well documented. As a quick review for context: because GUIDs are random values, they are hard to index efficiently and those indexes rapidly fragment. This means slower inserts and more disk space taken up by the index. I created the following two tables (one keyed with a GUID, one with an INT) and inserted 2000 rows into each:

create table GUIDTest(
    orgid uniqueidentifier default NEWID()
    ,orgname varchar(20)
    ,CONSTRAINT pk_GUIDTest PRIMARY KEY CLUSTERED (orgid));

--The INT table gets its key from a sequence, so create that first
create sequence NonGuidSeq as int start with 1 increment by 1;

create table NonGUIDTest(
    orgid int default NEXT VALUE FOR NonGuidSeq
    ,orgname varchar(20)
    ,CONSTRAINT pk_NonGUIDTest PRIMARY KEY CLUSTERED (orgid));

The results from sys.dm_db_index_physical_stats tell the story from a fragmentation and size perspective:

(Screenshot: sys.dm_db_index_physical_stats output, showing GUIDTest far more fragmented and larger than NonGUIDTest)

So GUIDs are bad. With as much as has been written on this topic, it feels a bit like shooting fish in a barrel. However, GUIDs will find their way into databases because of their usefulness to the application layer. What I want to talk about is a common misconception around “solving” the GUID problem: clustering on a sequence ID but keeping the GUID in the table.

Let’s start by creating a new table to have a GUID and cluster on a sequence:

--Again, create the sequence the default references
create sequence GuidSeq as int start with 1 increment by 1;

create table SeqGUIDTest(
    seqid int default NEXT VALUE FOR GuidSeq
    ,orgid uniqueidentifier default NEWID()
    ,orgname varchar(20)
    ,CONSTRAINT pk_SeqGUIDTest PRIMARY KEY CLUSTERED (seqid));

(Screenshot: fragmentation and size results for SeqGUIDTest)

As expected, we get less fragmentation and a smaller index. This is good, right? It can be, but here’s the problem: the sequence is completely meaningless to our data, and our queries will likely not use it (unless we build in additional surrogate abstraction to relate the sequence to our GUID). Let’s compare query plans for our GUIDTest and SeqGUIDTest tables where we query each for a specific orgid value:

(Screenshot: execution plans showing a clustered index seek against GUIDTest and a clustered index scan against SeqGUIDTest)

The query where the GUID is a clustered index is far more efficient than the one against the table where we cluster on a sequence. This is because it can leverage the index, meaning we will get a seek instead of a scan. While clustering on a sequence field saves us on space and fragmentation, it ends up hurting us when trying to do data retrieval.

If we were tuning the query against SeqGUIDTest, the next logical step would be to create a non-clustered index on orgid. This would improve the query, but in order to make it useful (and avoid key lookups), we would need to include all the columns of the table. With that, we have completely negated any benefit we got from clustering on a sequence column, because the situation is now:

  • A clustered index that is the table, thus using all that disk space.
  • A non-clustered index that also uses as much space as the table (it has all the columns in it as well).
  • The non-clustered index now has the same fragmentation problem we were trying to avoid on the clustered index.

So while trying to avoid a problem, we have made the problem worse.

There are two lessons here. The first is the classic “it depends” case. There are no hard and fast rules for how we implement our databases. You need to be flexible in creating your design and understand the complete impact of your choices. Clustering on a GUID is not great, but in the case above it is far better than the alternative.

The second lesson is to understand your data and how it is going to be used. When we build tables and indexes, we need to be conscious of how our data will be queried and design appropriately. If we use abstractions like surrogate keys, then we need to make sure the design is built around that abstraction. In the above examples, we could cluster on a sequence key, but only if that sequence key has some sort of meaning in our larger database design. Building a better database is about understanding our relationships and appropriately modeling around them.

Thanks to Mickey for the great T-SQL Tuesday topic! Please check her blog for other great posts throughout the day.

Desired SQL Configuration with #Powershell – Part 2

Last week, as part of T-SQL Tuesday 68, I introduced you to two Powershell functions for validating and setting your SQL Server configurations. While useful, they depend on some sort of configuration source, which raises the question: how do we get this source? This post covers the third function, which will help you create your configuration source using an existing server’s values, letting you use them directly or modify them as you see fit.

Get-SQLConfiguration

The concept of Get-SQLConfiguration is simple: get the sys.configurations values of a SQL Server instance and export them in a form the other functions can consume. The additional criterion to consider is that Test-SQLConfiguration and Set-SQLConfiguration both reference configurations by their SMO property names, so our configuration source must use those names as well. The result is a function that uses the SMO to perform its export:

function Get-SQLConfiguration{
param([string]$InstanceName='localhost'
  ,[string[]] $Filter
  )

#Connect to the instance through SMO
$smosrv = new-object ('Microsoft.SqlServer.Management.Smo.Server') $InstanceName
$output = @()

#If a filter is supplied, export only the named configuration properties;
#otherwise, export everything except the Properties collection itself
if($Filter){
  $configs = $smosrv.Configuration | Get-Member -MemberType Properties | Where-Object {$Filter.Contains($_.Name)}
}
else{
  $configs = $smosrv.Configuration | Get-Member -MemberType Properties | Where-Object {$_.Name -ne 'Properties'}
}

#Emit each configuration as a Name/DesiredValue pair, using the current run value
foreach($config in $configs){
  $output += New-Object PSObject -Property ([Ordered]@{'Name'=$config.Name;
  'DesiredValue'=$smosrv.Configuration.$($config.Name).RunValue})
}

return $output

}

The function itself is not complex, but I did want to add one other feature to it. A simple export might be overwhelming, considering the number of configuration properties within SQL Server. Most of these will actually be left at their defaults. To aid us, I also added a Filter parameter that accepts an array of SMO configuration property names and will only export those properties.

There are a couple of patterns for using this function. The simplest is to just run it for one server and use the output to check another:

$configurations=Get-SQLConfiguration -InstanceName PICARD
Test-SQLConfiguration -InstanceName RIKER -Configs $configurations

This call will essentially compare and find the differences between the two instances. If we wanted to get more specific, we could use our Filter parameter to only compare one or two configurations:

$configurations=Get-SQLConfiguration -InstanceName PICARD -Filter @('FillFactor')
Test-SQLConfiguration -InstanceName RIKER -Configs $configurations

It is also possible to store and manage these configurations using the Get-SQLConfiguration function. Because we’re working with simple objects, we have a variety of storage options. The easiest way to get started is to use the built-in Export-Csv cmdlet and save our configurations as a delimited file (I like pipe, but commas will do):

Get-SQLConfiguration -InstanceName PICARD | Export-Csv -Path .\PicardConfig.cfg -NoTypeInformation

With the text file that is generated, we can go in and edit our configs, removing what we want and editing values we need to update. The result can then be stored in source control for tracking and audit purposes. We could even go as far as to load the configurations into a database table and call them from a central administrative server. There are a lot of possibilities.

The Code

While I’ve published the functions in these two blog posts, there’s an easier way to get ahold of them. These are part of my SQLConfiguration module published on GitHub, which you can download and use (along with my other SQL configuration functions). Please keep in mind this module and the code contained within should still be considered a work in progress and the functions are provided as is. The usual disclaimers apply, so be careful using it and test it thoroughly before making regular use of it. Of course, if you do find any issues, I would love to know about them so I can review and update my code.

There are many additional items that can be added to this. The largest that comes to mind is dynamic configuration values, as these functions consider your configs to be static. Items like minimum memory, maximum memory, and maximum degree of parallelism depend on the hardware configuration and there are various ways to calculate these values. There are also configurations not stored in this part of SQL Server, such as default database directories and SQL Server agent settings. These functions have plenty of room for enhancement. They’re a good start, however, to giving you additional tools to manage your environment, helping to make your SQL Server deployments more consistent and reliable.

#TSQL2SDAY 68: Desired SQL Server Configuration with #Powershell