SQL 2008 database in recovery for 22+ hours!!!?

  • I applied updates to a production SQL server yesterday at 10:30AM, it is now 9:17AM and one of my databases is still in recovery. It isn't the size, as other databases just as large exist on the server.

    The only thing I can think may be a problem is that when I initially upgraded this DB to 2008, one of the 3 data files was not connected. Once we realized this, it was connected and everything seemed okay. I have rebooted or restarted 3-5 times since then, and recovery always has taken a couple hours-- never this long.

    We are running a restore to an alternate database name, but due to the size-- it may take just as long to restore as the recovery.

    I've got nearly 200 people who are unable to work-- any suggestions?

  • Have a look in the SQL error log, look for messages relating to that database.

    If you have entries like this, then the DB is in restart-recovery and you'll have to wait. If you don't have messages like this, post what you do have and we'll take it from there.

    Recovery of database 'SomeDatabase' (5) is 70% complete (approximately 1508 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Or try another reboot. I've seen things get stuck in prior versions, no real reason. A reboot fixed it.

    If you do want to reboot, I might run sqldiag and capture some dump information.

  • Check error log before you try a reboot. If there's something wrong with the database that a reboot won't fix, then rebooting isn't going to help. Check what's wrong first, then decide how to solve it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Recovery is progressing-- albeit slowly. It is now at 50%

    I have begun a restore of the backup from Saturday along with its log files-- which is already at 40%.

    I may be dropping the production DB and renaming the restored DB so that I'm up and running again.

  • Philip Millwood-419646 (10/19/2009)


    Recovery is progressing-- albeit slowly. It is now at 50%

    Ok, so you are seeing the "recovery is 50% complete..." messages?

    Last time I saw a recovery taking this long (13 hours on a 1.2 TB database), the root cause was limited bandwidth to the SAN. One of the fibre switches that was supposed to be dedicated wasn't. Check with the server admins/storage admins (whoever's responsible for the SAN) and get them to check for anything that could be hindering performance

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • While this is on a SAN, I'm not seeing performance issues there. No significant disk queues, only one core appears to be doing much of anything (out of 12), ~50MB/sec IO, memory isn't into swap...

    The restore is using more IO than the recovery is, and it is at over 40% now.

  • Avg disc sec/read?

    Avg disk sec/write?

    Queue length isn't a great measure with a SAN. Is 500 MB/sec a good throughput for your SAN?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ~3.5 MB/min reads from the log file, no appreciable reads or writes to the data files.

    Restore (to another attached drive) is running ~6GB/min, on another channel.

    There doesn't seem to be an IO bottleneck.

  • I'm up to 52% now. It was at 47% at 7:00am. I'm expecting it will be quite a few more hours now.

  • How big is your log file? Do you perhaps know how much of it was active before the restart?

    For that matter, how big's the data file?

    Have you changed recovery interval from the default?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Log file is 244,800,000 KB

    A transaction log backup ran at 6AM before the reboot at 10:30. It was 17,520,339GB.

    Recovery interval is set to default.

  • older copy of the same DB restored to a test server shows over 1200 vlf. Autogrow is set to 5000MB

  • That seems like a lot of VLFs, but I'd like to know what Gail and others think.

    Are you planning on letting that run to finish? (the restore)

  • I will be letting it finish. I need to get an idea of what is wrong so that it can be prevented in the future.

    A restore on another set of disks is nearly complete. Once it has finished, I'm detatching those disks and re-attaching them to another server (currently a QA/test box), and changing all app connection strings to point to that other server.

    Once we have determined the database/server is safe, I'll need to move everything back.

    Worse news-- this DB has replication running on it, so I need to reset replication twice more, at about 12 hours each time I've done it (with filtering out a bunch to reduce snapshot size and space even).

    By the same token, I'm guessing that replication is part of why the log file recovery is taking so long based on a couple things I've found.

    Yay.

Viewing 15 posts - 1 through 15 (of 36 total)

You must be logged in to reply to this topic. Login to reply