Art of the DBA

January, 2014:

Why I Work With SQL Server

Hot on the heels of my NoSQL posts, I wanted to add a counterpoint to the discussion.  After all, even though I see the value of non-relational technologies, I think it’s important not to lose sight of the value relational databases offer.  In the tech world, it’s too easy to chase the squirrels of new tech (though it’s just as easy to get stuck in our old patterns).  It always helps to take a step back and see the forest for the trees so we can choose the right path for our enterprise.

It’s no secret that the tech world gets pretty dogmatic:  Oracle vs. SQL Server, Windows vs. Linux, Java vs. C#, etc.  People will dig their heels in about their choices and why those choices are “right” when, at the end of the day, each platform is simply a different approach to the same higher-level concepts.  I tend to view most of these debates as Ford vs. Chevrolet, and the only real question to answer is what tool is best for the job.

And believe me when I say that I know that this isn’t a groundbreaking opinion, but it is mine.  :)

That being said, we all have good reasons for selecting the platforms we work with.  For relational databases, it’s fairly evident that my choice is SQL Server.  Before I get into that, let’s first talk about why I lean towards relational over non-relational.  Don’t get me wrong, non-relational is an effective tool, but it’s still a very young technology.  The platforms for it are still growing and maturing, and they’re still missing a lot of the reliability we’ve come to expect from our relational platforms.

Couple that with the nature of relational databases:  joins, keys, and constraints do more for us than simply organize data; they provide functionality to implement and control business logic.  Data integrity is extremely important for many applications, and a proper database design will give you all the rules to keep your data clean and ordered.  Just as with choosing non-relational stores, it’s a matter of choosing the appropriate tool for the job.  Sometimes that job requires tight control over your data, something you just can’t get in a NoSQL database.

As for SQL Server as my relational platform of choice, there are a lot of reasons why I favor it over other platforms.  It isn’t just because it’s what I’ve worked with (for the record, I’ve put some serious time into Oracle as well).  There are really three main reasons why I promote SQL Server as the database I think people should work with.

Maturity

Let’s face it, SQL Server has been around for a while and Microsoft has had a lot of time to refine it.  Over the 15 years I’ve worked with it, I’ve seen the addition of lots of neat features that enhance the core RDBMS offering.  At the same time, SQL Server remains a solid relational database that gives users a reliable platform for storing their data.  It’s not perfect, and I’ll be the last person to tell you it is, but it certainly is on par with Oracle and PostgreSQL.

Adaptability

Microsoft has seen the writing on the wall.  Some of it is from their own hand, some of it is how the data world is evolving.  Either way, “the cloud”, in-memory structures, and big data are ubiquitous in today’s tech landscape.  Looking at the recent versions of SQL Server, it’s apparent that Microsoft is trying to mold the product to live in this new realm.  Consider Hekaton, the ability to span databases between Azure and on-premises instances, and improvements to columnstore (along with other updates).  Microsoft is making investments to keep pace with the changes we’re seeing in the larger technology world, and I appreciate the vision they have for the product.

Accessibility

This is the big one for me.  The other two basically tell me that, in going with SQL Server, I’m getting an established RDBMS platform I can rely on, with Microsoft continuing to improve it to keep pace with other products.  What sets SQL Server apart is that it’s so much easier to work with, for both new folks and seasoned professionals.

First, let’s look at the fact that it’s Windows.  While we all lament SQL Server’s default settings, the fact is that almost anyone with minimal experience can get their own SQL Server instance up and running in short order.  This means the door into the database world is open a little wider, even for people who don’t have the supporting skill sets for Linux or hacking the registry.  SQL Server ships with wizards and graphical tools to get folks going.  Just make sure you talk to a professional before getting too far.  :)

And that’s the second thing to talk about.  Maybe I’m biased because I’ve been involved in the SQL Server community for so long, but I’m continually amazed by the amount of free training material and best practices this community provides, from blogs to Twitter to a user group likely nearby where you can ask questions of people using SQL Server.  It’s so easy to get started with SQL Server.

Yeah, I know I sound like a fanboy at this point (squee!).  Just so we’re on the level, I am well aware of SQL Server’s flaws.  There are a lot of things that aren’t perfect or were added a couple of versions ago and never finished up (*cough* Management Data Warehouse).  And let’s not get into what’s in Standard Edition versus Enterprise.  Trust me, I get it.  Even with all that, though, I feel that SQL Server is the preferred offering at this point for companies looking for a solid relational platform.

2013 in review, 2014 in anticipation

As is my custom, I want to blog about what I did last year and my goals for the upcoming year.  Since I have more “regular” stuff to blog about, I figured I’d sneak this in on a Friday just to get it up here.  :)  The primary reason I do this is that which gets measured, gets done (hat tip Buck Woody).  2013 was a pretty eventful year, but I didn’t quite get everything I had wanted to.  So let’s look back at…

2013

Speaking – Did pretty well here.  I met all my goals and even spoke at the 2013 Summit.  I love presenting, and now it’s a matter of building new presentations.

Blogging  – Not quite an epic fail, but this one was not a passing grade.  I basically blogged regularly through to about March, then fell back to my sporadic pace.  I didn’t even blog for about 3 months over the summer.  I’m very disappointed with myself and can do better here.

Certification – This was a success and was a lot easier than I had anticipated.  I sat down to take four of the five exams cold to evaluate where I stood and ended up passing them.  The one I had to study for (and I knew this going in) was the Data Warehousing exam (70-463).  Pretty happy here, but now it’s a matter of finding other options for certification.

I know a lot of folks have a dim view of certs.  I agree with the premise that the exams and certifications are (mostly) no substitute for real-world experience, and the fact that they can be crammed for removes some of their validity.  But at the same time, I think they can be a valuable barometer of someone’s knowledge.  I fall in Glenn Berry’s camp here:  certifications are like extra credit.  If I see a cert on a resume, I won’t assume knowledge of the subject matter, but it at least shows me the candidate was willing to make an extra effort to improve themselves.

Secret Project – Ok, this more or less flopped.  Basically, I wanted to either write a book or build a pre-con for database monitoring because I feel there’s a real gap out there.  These are still on my slate of things I want to do, but I did not get that done in 2013.  Boo me.

And that was 2013.  I’d give myself a B for the year, mostly because while I whiffed on some of my goals, I also managed to go above and beyond on a few things.  In addition, I’ve made some career choices and moved things in that direction.  Which now brings me to….

2014

Several of the goals will be maintenance or stay the same.  I want to maintain my presenting pace, so we’re looking at at least four SQL Saturdays and a Summit submission.  Blogging needs to be once a week.  This is a real stretch goal, because it’s harder than it looks, but I need to strive for it.  These two items are the foundation my career grows from, so I need to keep that foundation healthy.

New stuff for 2014 will be:

  • I’m trying to move my career more towards being a data architect.  So for that goal, I will write more posts on general data concepts and higher-level planning.  There will still be plenty of administrative stuff, but I need to broaden my blogging scope.

  • I also need to start reading more on cloud computing, modelling, and larger design concepts.  Right now I’m reading Domain Driven Design and my goal is to read 1 book a month on a related topic.

  • I will identify and start working towards a new certification path for design and architecture.  Not sure what this is, but I’ll have something identified by the end of March and start working on it the rest of the year.

2013 was a year of change in my life.  Not earth shattering change, not seismic shifts, but a definite redirection in my aims.  2014 will be solidifying my plan and starting down that path.  My biggest challenge will be sorting out the things that are new and uncomfortable from those that are the wrong direction.  The question I will continue to ask is “Does this move me in the direction I want to go?”.

 

Divide and Conquer

So far, I’ve talked about NoSQL’s use of key-value pairs for storage along with its partitioned nature.  These two concepts are useful, but you’ve probably noticed that they can make querying a lot more difficult, particularly for aggregating data.  That’s where the third and final concept I want to discuss comes into play:  MapReduce, the process non-relational data stores use to analyze all the data they have stored across their nodes.

At its core, MapReduce isn’t some wild concept.  The idea is to marshal all of our compute resources to analyze and aggregate data.  Simply put, there are two phases a non-relational system will use to aggregate data:

  • Map – On each individual node, collect and aggregate the data that matches our predicate.  Pass this to a higher node.

  • Reduce – On a “master” node, aggregate the values passed up from the individual nodes.  Once the work from all the individual nodes has been aggregated, we have our answer.

As with most of this NoSQL stuff, it is not rocket science.  It does, however, make more sense if we visualize it.  Let’s say, for example, we wanted a count of all our sales orders within a sales territory and we had a 3-node NoSQL cluster (platform is irrelevant).  A basic MapReduce tree would look something like this:

[Figure: a simple MapReduce tree for counting sales orders across a 3-node cluster]
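To make that flow concrete, here’s a minimal Python sketch that simulates the same count across a hypothetical 3-node cluster.  The node names and sample orders are made up for illustration; in a real cluster, the map step would run on each node in parallel rather than in a loop:

# Hypothetical cluster: each node holds its own slice of the orders.
nodes = {
    "node1": [{"OrderID": 1, "TerritoryID": 3}, {"OrderID": 2, "TerritoryID": 1}],
    "node2": [{"OrderID": 3, "TerritoryID": 3}, {"OrderID": 4, "TerritoryID": 3}],
    "node3": [{"OrderID": 5, "TerritoryID": 2}],
}

def map_phase(orders, territory_id):
    # Each node counts only the orders matching the predicate.
    return sum(1 for o in orders if o["TerritoryID"] == territory_id)

def reduce_phase(partial_counts):
    # The "master" node aggregates the per-node partial counts.
    return sum(partial_counts)

partials = [map_phase(orders, 3) for orders in nodes.values()]
print(reduce_phase(partials))  # 3 orders in territory 3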

This isn’t all that much different from how SQL Server (or another RDBMS, for that matter) would process it.  If we take the same concept and apply it to a basic query from AdventureWorks, we can see the SQL approach:

SELECT COUNT(1)
FROM Sales.SalesOrderHeader
WHERE TerritoryID = 3;

[Figure: the SQL Server query plan for the aggregate query above]

You can see from this basic query plan that the only real difference is that the mapping piece happens on a single node (your instance) rather than across multiple nodes.  Otherwise, the process of gathering all the data we need and then calculating our total aggregate is the same.  It’s the idea of scaling this out over multiple compute nodes that is the major difference.

And scaling is huge.  Three nodes is a piece of cake, but what if we had 10?  Or 30?  Or 100?  Again, the idea is that non-relational data stores can handle terabytes and petabytes of data because you can scale out horizontally to meet those needs.  When we do this, our MapReduce tree can scale with us, so we might get something like this:

[Figure: a larger MapReduce tree with multiple reduce phases feeding a final reduce]

Notice the multiple reduce steps.  Just as we can divide and conquer horizontally and collect data across that scale, we can also scale vertically, adding multiple reduce phases to further spread the load.  It’s all part of the massively parallel processing model that NoSQL promotes.
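Sketching that vertical layering in the same Python style as before, a tree-style reduce just applies the reduce step repeatedly to groups of partial results until a single value remains.  The fan-in of 10 here is an arbitrary assumption:

def tree_reduce(partial_counts, fan_in=10):
    # Keep reducing in groups of fan_in until one value remains.
    while len(partial_counts) > 1:
        partial_counts = [
            sum(partial_counts[i:i + fan_in])
            for i in range(0, len(partial_counts), fan_in)
        ]
    return partial_counts[0]

# 100 nodes each report a partial count of 1; two reduce levels
# collapse them into the final answer.
print(tree_reduce([1] * 100))  # 100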

There are disadvantages to this, but most of them boil down to the implementation.  Primarily, many NoSQL platforms require you to write your own MapReduce functions using APIs or embedded languages (like JavaScript or platform-specific functions).  Many platforms are working on SQL-like alternatives to ease this learning curve, but there’s still a lot of growth and maturity needed here, and I’m not sure how standardized any of these options really are.
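For a taste of what writing your own MapReduce functions looks like, here’s a rough sketch using mrjob, a Python API for Hadoop-style jobs (one of many such APIs; the comma-separated input format is an assumption for illustration):

from mrjob.job import MRJob

class MRTerritoryCount(MRJob):
    # Map: emit (territory_id, 1) for each order line.  Assumes each
    # input line looks like "order_id,territory_id", a made-up format.
    def mapper(self, _, line):
        order_id, territory_id = line.split(",")
        yield territory_id, 1

    # Reduce: sum the 1s emitted for each territory.
    def reducer(self, territory_id, counts):
        yield territory_id, sum(counts)

if __name__ == "__main__":
    MRTerritoryCount.run()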

Of course, these MapReduce queries aren’t necessarily going to smoke an RDBMS in terms of performance.  It’s all a matter of design and how you apply the tool.  MapReduce processes are designed primarily for batch aggregation of massive amounts of data, quantities that may not make sense to store in an RDBMS structure.  As we always say in the database admin world, it depends.  The goal here is to understand how this query process works so we can properly apply it to our own implementations.

Wrapping Up

Hopefully this three-parter has been helpful and educational for you.  I’ve certainly learned a lot about non-relational databases over the past 6 months and my eyes have been opened to new ways to manage data.  That’s really what we’re all here for, anyway.  As the world has more and more data to manage, we as data professionals need to find ways to store it, manage it, and make it useful to those who need it.  NoSQL is just another tool in our toolbox, a hammer to go with our RDBMS screwdriver.  Both valuable, if we use them appropriately.

Before I move on, I did want to share some resources I’ve found helpful in getting my head around non-relational databases:

  • Seven Databases in Seven Weeks – I’m actually not done with this, but it’s a great book for getting your feet wet with several different types of databases.  Be warned, this is not for the faint of heart and there’s a LOT you’re going to have to figure out on your own, but it will guide you through some basic concepts and use cases.

  • Martin Fowler’s Introduction to NoSQL – This is a great hour-long presentation that covers a lot of what I’ve discussed here from a different angle.  It’s got great information and explains things at a good level.

  • eBay’s Tech Blog on Cassandra Data Modelling – Getting my head around how you model data in a Big Table structure was really, really hard.  This two-parter will help you figure it out.

I also want to thank Buck Woody (@BuckWoody) for opening my eyes to the world of the cloud and big(ger) data.  It was partly from a conversation with him that I went down this path, and the learning that has come out of it has been tremendous.