SQL 2008 database in recovery for 22+ hours!!!?

Philip Millwood-419646

SSC Eights!

Points: 873
October 19, 2009 at 7:26 am

#216369

I applied updates to a production SQL server yesterday at 10:30AM. It is now 9:17AM and one of my databases is still in recovery. It isn't the size, as other databases just as large exist on the server.

The only thing I can think may be a problem is that when I initially upgraded this DB to 2008, one of the 3 data files was not connected. Once we realized this, it was connected and everything seemed okay. I have rebooted or restarted 3-5 times since then, and recovery always has taken a couple hours-- never this long.

We are running a restore to an alternate database name, but due to the size-- it may take just as long to restore as the recovery.

I've got nearly 200 people who are unable to work-- any suggestions?

Gail Shaw SSC Guru Points: 1004494 · Answer 1

Have a look in the SQL error log, look for messages relating to that database.

If you have entries like this, then the DB is in restart-recovery and you'll have to wait. If you don't have messages like this, post what you do have and we'll take it from there.

Recovery of database 'SomeDatabase' (5) is 70% complete (approximately 1508 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
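
If you want to track these messages over time rather than eyeball them, a small script can pull the numbers out of each line. This is only a sketch: the regular expression is written against the sample message quoted above, and the function name `parse_recovery_line` is made up for illustration. It assumes you have already exported the error log lines to text by some other means.

```python
import re

# Pattern written against the sample progress message quoted in this post.
RECOVERY_RE = re.compile(
    r"Recovery of database '(?P<name>[^']+)' \((?P<dbid>\d+)\) is "
    r"(?P<pct>\d+)% complete \(approximately (?P<secs>\d+) seconds remain\)\. "
    r"Phase (?P<phase>\d+) of (?P<phases>\d+)\."
)


def parse_recovery_line(line):
    """Return the progress fields from one log line, or None if it does not match."""
    m = RECOVERY_RE.search(line)
    if m is None:
        return None
    return {
        "database": m.group("name"),
        "percent": int(m.group("pct")),
        "seconds_remaining": int(m.group("secs")),
        "phase": int(m.group("phase")),
        "phases": int(m.group("phases")),
    }


sample = (
    "Recovery of database 'SomeDatabase' (5) is 70% complete "
    "(approximately 1508 seconds remain). Phase 2 of 3. "
    "This is an informational message only. No user action is required."
)
info = parse_recovery_line(sample)
print(info)
```

Lines that are not recovery progress messages return `None`, so the function can be mapped over a whole log export and the misses filtered out.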

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Steve Jones - SSC Editor SSC Guru Points: 728159 · Answer 2

Or try another reboot. I've seen things get stuck in prior versions, no real reason. A reboot fixed it.

If you do want to reboot, I might run sqldiag and capture some dump information.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Gail Shaw SSC Guru Points: 1004494 · Answer 3

Check error log before you try a reboot. If there's something wrong with the database that a reboot won't fix, then rebooting isn't going to help. Check what's wrong first, then decide how to solve it.

Philip Millwood-419646 SSC Eights! Points: 873 · Answer 4

October 19, 2009 at 8:48 am

#1067683

Recovery is progressing-- albeit slowly. It is now at 50%

I have begun a restore of the backup from Saturday along with its log files-- which is already at 40%.

I may be dropping the production DB and renaming the restored DB so that I'm up and running again.

Gail Shaw SSC Guru Points: 1004494 · Answer 5

Philip Millwood-419646 (10/19/2009)

Recovery is progressing-- albeit slowly. It is now at 50%

Ok, so you are seeing the "recovery is 50% complete..." messages?

Last time I saw a recovery taking this long (13 hours on a 1.2 TB database), the root cause was limited bandwidth to the SAN. One of the fibre switches that was supposed to be dedicated wasn't. Check with the server admins/storage admins (whoever's responsible for the SAN) and get them to check for anything that could be hindering performance.

Philip Millwood-419646 SSC Eights! Points: 873 · Answer 6

October 19, 2009 at 9:14 am

#1067708

While this is on a SAN, I'm not seeing performance issues there. No significant disk queues, only one core appears to be doing much of anything (out of 12), ~50MB/sec IO, memory isn't into swap...

The restore is using more IO than the recovery is, and it is at over 40% now.

Gail Shaw SSC Guru Points: 1004494 · Answer 7

Avg disk sec/read?

Avg disk sec/write?

Queue length isn't a great measure with a SAN. Is 50 MB/sec a good throughput for your SAN?

Philip Millwood-419646 SSC Eights! Points: 873 · Answer 8

October 19, 2009 at 10:49 am

#1067759

~3.5 MB/min reads from the log file, no appreciable reads or writes to the data files.

Restore (to another attached drive) is running ~6GB/min, on another channel.

There doesn't seem to be an IO bottleneck.
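
To put the two rates in this post side by side, here is a rough back-of-the-envelope calculation. It uses only the figures quoted above (~3.5 MB/min for the recovery's log reads, ~6 GB/min for the restore) and assumes 1 GB = 1024 MB. The restore runs on another channel, so the comparison is indicative rather than a measurement of the same path.

```python
LOG_READ_MB_PER_MIN = 3.5   # recovery's log reads, from the post
RESTORE_GB_PER_MIN = 6.0    # parallel restore, from the post
MB_PER_GB = 1024            # assumption: binary units


def mb_read(rate_mb_per_min, hours):
    """Total MB read at a constant rate over the given number of hours."""
    return rate_mb_per_min * 60 * hours


log_mb_per_hour = mb_read(LOG_READ_MB_PER_MIN, 1)
restore_vs_recovery = RESTORE_GB_PER_MIN * MB_PER_GB / LOG_READ_MB_PER_MIN

print(f"recovery reads about {log_mb_per_hour:.0f} MB of log per hour")
print(f"the restore moves data roughly {restore_vs_recovery:.0f}x faster")
```

A gap of three orders of magnitude is consistent with the conclusion that raw IO capacity is not what is holding the recovery back.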

Philip Millwood-419646 SSC Eights! Points: 873 · Answer 9

October 19, 2009 at 12:50 pm

#1067805

I'm up to 52% now. It was at 47% at 7:00am. I'm expecting it will be quite a few more hours now.
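
A naive way to turn two progress readings into an estimate is linear extrapolation. The sketch below uses the 47% and 52% figures from this post. The elapsed time between the two readings is not stated here, so `ASSUMED_HOURS_BETWEEN_READINGS` is a placeholder and not a figure from the thread. Recovery phases do not necessarily progress at a constant rate, so treat the result as a rough guide only.

```python
def hours_remaining(pct_then, pct_now, hours_elapsed):
    """Linear extrapolation: hours until 100% at the observed rate."""
    gained = pct_now - pct_then
    if gained <= 0 or hours_elapsed <= 0:
        raise ValueError("need forward progress over a positive interval")
    rate = gained / hours_elapsed  # percentage points per hour
    return (100 - pct_now) / rate


# Placeholder value: the post does not say how long elapsed between readings.
ASSUMED_HOURS_BETWEEN_READINGS = 8.0

eta = hours_remaining(47, 52, ASSUMED_HOURS_BETWEEN_READINGS)
print(f"about {eta:.1f} hours left at this rate")
```

Substituting the real interval between the two readings changes the answer proportionally.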

Gail Shaw SSC Guru Points: 1004494 · Answer 10

How big is your log file? Do you perhaps know how much of it was active before the restart?

For that matter, how big's the data file?

Have you changed recovery interval from the default?

Philip Millwood-419646 SSC Eights! Points: 873 · Answer 11

October 19, 2009 at 4:15 pm

#1067859

Log file is 244,800,000 KB

A transaction log backup ran at 6AM before the reboot at 10:30. It was 17,520,339 KB.

Recovery interval is set to default.
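
For readers who think in GB rather than KB, the log file size quoted above converts as follows. This is a trivial sketch that assumes binary units (1 GB = 1024 * 1024 KB); `kb_to_gb` is an illustrative helper name.

```python
KB_PER_GB = 1024 * 1024  # assumption: binary units


def kb_to_gb(kb):
    """Convert kilobytes to gigabytes using binary units."""
    return kb / KB_PER_GB


log_file_kb = 244_800_000  # log file size from the post
log_file_gb = kb_to_gb(log_file_kb)
print(f"log file is about {log_file_gb:.1f} GB")
```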

Philip Millwood-419646 SSC Eights! Points: 873 · Answer 12

October 19, 2009 at 4:34 pm

#1067864

An older copy of the same DB restored to a test server shows over 1,200 VLFs. Autogrow is set to 5000MB.
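
For context on how an autogrow setting translates into VLFs, the sketch below encodes the growth rule commonly described for SQL Server versions of this era (before SQL Server 2014): a growth of up to 64 MB adds 4 VLFs, up to 1 GB adds 8, and anything larger adds 16. The thresholds are stated here as an assumption to verify against your own version, and `vlfs_added` is an illustrative name, not a SQL Server function.

```python
def vlfs_added(growth_mb):
    """VLFs created by one log growth, per the commonly described pre-2014 rule."""
    if growth_mb <= 0:
        raise ValueError("growth must be positive")
    if growth_mb <= 64:
        return 4
    if growth_mb <= 1024:
        return 8
    return 16


AUTOGROW_MB = 5000  # autogrow setting from the post

per_growth = vlfs_added(AUTOGROW_MB)
vlf_size_mb = AUTOGROW_MB / per_growth
print(f"each {AUTOGROW_MB} MB growth adds {per_growth} VLFs of {vlf_size_mb} MB")
```

Under that rule each 5000 MB growth contributes 16 VLFs of 312.5 MB each, so a count above 1,200 suggests that much of the log was grown in smaller increments at some point.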

Steve Jones - SSC Editor SSC Guru Points: 728159 · Answer 13

October 19, 2009 at 4:59 pm

#1067867

That seems like a lot of VLFs, but I'd like to know what Gail and others think.

Are you planning on letting that run to finish? (the restore)

Philip Millwood-419646 SSC Eights! Points: 873 · Answer 14

I will be letting it finish. I need to get an idea of what is wrong so that it can be prevented in the future.

A restore on another set of disks is nearly complete. Once it has finished, I'm detaching those disks and re-attaching them to another server (currently a QA/test box), and changing all app connection strings to point to that other server.

Once we have determined the database/server is safe, I'll need to move everything back.

Worse news-- this DB has replication running on it, so I need to reset replication twice more, and it has taken about 12 hours each time I've done it (even after filtering out a lot to reduce snapshot size and space).

By the same token, I'm guessing that replication is part of why the log file recovery is taking so long based on a couple things I've found.

Yay.