Stopped SQL during rollback, waited on recovery, and watched rollback again.
This is how I learned a fair amount about SQL Server and how it handles rollback. The team I was on was working on a database with a huge table. We’re talking billions of rows here huge. Unfortunately, due to design and the growth of this table, it was all in a single partition. And we were kind of stuck with it.
The biggest problem with this table was maintenance. Sure, we could update statistics, but we couldn’t rebuild the index. There simply wasn’t enough space on disk for the rebuild to happen. How did we find this out? Because the disk filled up in the middle of the job, causing it to fail and go in to rollback.
There we were, looking at a 14-hour rollback. While the database was still functional, this table and all related tasks were blocked. The smart thing to do was to wait it out and let SQL sort through the transactions. Still, it’s easy to get a little antsy looking at that long of a rollback. We decided that we could go ahead and bounce the SQL Service, causing SQL to go into recovery of all the committed work (i.e., BEFORE the reindex job died), but not process any of the uncommitted transactions. Sure, we’d have to wait on recovery, but that couldn’t be as long as the rollback, right?
Well, not really.
The instance went into recovery, as expected, and didn’t take quite as long as the 14 hour rollback, but it still took a while. After all, we were in full recovery mode and the transaction logs were still pretty sizable. However, once it got done, we saw that the table was still locked. It confused the heck out of us, because we didn’t see any processes locking the object when we ran sp_who2. That is, we didn’t see any user processes.
You guessed it. When we looked at the system processes, there was the locking processes. SQL Server was still dealing with the rollback, it was just doing it under the covers as part of the recovery process. That’s the other thing, this was all part of fast recovery, which is SQL Server allowing access to the database while it’s still finishing up it’s recovery processes after a crash. Because even while some of those transactions may not have been committed, SQL Server still needs to sort through the chain and resolve the transactions in the log. Including the one it was rolling back.
So Moral of the Story? Let SQL Server do it’s thing when handling transactions. Yeah, things may suck, but many times there’s not much you can do but wait it out. There’s a reason these mechanisms are in place, so let the engine work.
Sorry there’s no script today, but thanks for joining me for Meme Monday. It’s the brainchild of Tom Larock, the SQL Rockstar(blog|twitter), but I’m not tagging him. Yet. :) However, I am going to tag three other bloggers I know:
Stop by their blogs and (hopefully) see their contributions. Or just read their blogs and enjoy the knowledge!