DBCC won't even run

  • hello all

    I'n getting errors trying to access a database:

    The operating system returned error 21(error not found) to SQL Server during a read at offset 0x00000000296000 in file 'i:\MSSQL\DATA\xxxxxxxx_Data.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    When I ran a DBCC CHECKDB with physical_only I get the same error again. I'm assuming I've got some bad sectors on this disk as we recently had a multiple disk failure on this array.

    Any chance of recovering this? I doubt it.

    thanks

    Alan Cranfield

    thanks

    SQL_EXPAT

  • Google tells me that Operating System error 21 is "The device is not ready." It's saying that the file can't be accessed at all, not that a couple sectors are damaged

    Check the server, make sure that all of the drives that the SQL files are on are accessible and present.

    Do not detach that database. Do not restart SQL or reboot the server.

    Do you have good backups of that database?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail

    The I: drive is there and accesible and I can write a file to it. I can even change the name of the data and log files so it looks like SQL hasn't attached to the files:

    xp_fixeddrives

    GO

    xp_cmdshell 'dir i:\mssql\data\'

    drive MB free

    ----- -----------

    C 7469

    F 80001

    G 1065908

    I 1146876

    output

    ------------------------------------

    Volume in drive I is I_Drive

    Volume Serial Number is 94F5-10F0

    NULL

    Directory of i:\mssql\data

    NULL

    15/01/2009 12:25 .

    15/01/2009 12:25 ..

    09/01/2009 23:30 1,572,864,000,000 EQMCLOGTower_Data.mdf

    15/11/2008 12:00 524,288,000,000 EQMCLOGTower_Log.ldf

    2 File(s) 2,097,152,000,000 bytes

    2 Dir(s) 1,202,587,013,120 bytes free

    NULL

    (12 row(s) affected)

    I was thinking of renaming the files, dropping the database and then atttempting to attach the files.

    There is the following warning message in the disk controller for this drive. Could be related. I'm thinking that maybe the SQL engine came up before the drive was ready. We had to cold boot this server as well as all the drive shelves after our disk failures last week. This database is the only one that is on the I: drive.

    cheers

    Alan

    thanks

    SQL_EXPAT

  • oops forgot the put the array controller warning:

    "The current array controller had valid data stored in its battery backed write cache the last time it was reset or was powered up. This indicates that the system may not have been shut down gracefully. The array controller has automatically written, or has attempted to write, this data to the drives. This message will continue to be displayed until the next reset or power-cycle of the array controller. "

    tks

    Alan

    thanks

    SQL_EXPAT

  • -- Cranfield (1/15/2009)


    I was thinking of renaming the files, dropping the database and then atttempting to attach the files.

    No! Absolutely not. If you do that, there's a good chance that the database will not come back. Do not rename the files, do not delete the database, do not detach the database.

    I'm thinking that maybe the SQL engine came up before the drive was ready. We had to cold boot this server as well as all the drive shelves after our disk failures last week. This database is the only one that is on the I: drive.

    It could be, but in that case I would expect the DB to be suspect. Is it?

    Can you check the error log, see if there are any more messages relating to that database, especially from when the system started up.

    Do you have a backup?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • as far as I can see from the SQL event log this database recovered successfully after the reboot after the failed disks:

    Message

    Recovery of database 'EQMCLOGRestore' (8) is 100% complete (approximately 0 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.

    Message

    2 transactions rolled back in database 'EQMCLOGRestore' (8). This is an informational message only. No user action is required.

    Message

    Recovery is writing a checkpoint in database 'EQMCLOGRestore' (8). This is an informational message only. No user action is required.

    Message

    Recovery is complete. This is an informational message only. No user action is required.

    then an hour later I see loads of these:

    Message

    Error: 8966, Severity: 16, State: 2.

    Message

    Unable to read and latch page (1:169729024) with latch type SH. 21(error not found) failed.

    A DBCC ran successfully after that:

    Message

    DBCC CHECKCATALOG (EQMCLOGRestore) WITH no_infomsgs executed by CONNECT\SQLservice found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 3 seconds.

    Then the database 'disappeared' and any attempted access results on this message in QA and the Error Log:

    Message

    The operating system returned error 21(error not found) to SQL Server during a read at offset 0x000002ece7a000 in file 'i:\MSSQL\DATA\EQMCLOGTower_Data.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    The weird thing is I can see the DB in SSMS and also see some of the tables (but not all???). If I right click on the database and select properties then I get the above error again in a message box. Same if I try to connect to that database.

    We dont have a backup of this database! But we can rebuild it and bulk insert all the data from flat files, which is a pain.

    cheers

    Alan

    thanks

    SQL_EXPAT

  • So the drive was accessible when SQL started, but something happened a little later. Can you check the windows event log around the same time as the latch errors started? See if there's anything to hint what happened to start the problems

    Right now, any restart of SQL, the server or the database will probably result in the database going suspect, which is significantly harder to fix.

    It shouldn't be the case, but can you check the permissions on the database files, make sure that the SQL service account has full rights on those files.

    Why no backup?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I can see these controller errors around the time we lost the database:

    SCSI bus fault occurred on Storage Box box 0, , Port 1 of

    Array Controller in slot 1.

    This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

    SCSI bus fault occurred on Storage Box box 0, , Port 0 of

    Array Controller in slot 1.

    This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

    On the question of why we dont have backups you could say we have made the executive decision to not backup certain databases. Reasons usually range from a) we dont have the space on disk/tape or tape b) we don't have a long enough maintenance window c) we run LIVE/LIVE e) we can rebuild from script f) we forgot (most likely :D). I can assure we do have backups for our critical databases. We have been caught out on occasion. With 300 SQL servers we tend to have the odd database not backed up - but I'm working on it. In this case the database being down is not critical as its a database thats used to restore archived data for historical queries. The other databases on this box are critical/live databases.

    Appreciate the replies

    cheers

    Alan

    thanks

    SQL_EXPAT

  • You've got corrupt pages in some of the critical system tables - this is why CHECKDB fails right away. These were most likely cause by the bad RAID controller you mentioned earlier. Without a backup there's nothing you can do except to create a new database and export as much data as you can into the new one.

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • -- Cranfield (1/15/2009)


    SCSI bus fault occurred on Storage Box box 0, , Port 1 of

    Array Controller in slot 1.

    This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

    SCSI bus fault occurred on Storage Box box 0, , Port 0 of

    Array Controller in slot 1.

    This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

    I don't know enough about drives to know what those mean. Any server people there that you can bounce this off. Especially around how to resolve it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks, Gail and Paul

    I will prepare for the long and arduous rebuild of the database..

    I think we've learned our lesson now with spanned volumes i.e. dont use them.

    cheers

    Alan Cranfield

    thanks

    SQL_EXPAT

  • Your system table is corrupt.

    Restore from backup or export as much data as possible 🙁

  • actually, rebooting the server and disk shelves brought the database back.. Ha!

    --cranfield

    thanks

    SQL_EXPAT

  • ok - but you've got something going wrong in the IO subsystem that you need to fix - so next time it happens the corruption isn't permanent and debilitating.

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Agreed. We've now migrated the DB from that old DL740/6400 to a new DL580 G5 with P800 controllers.

    cheers

    Alan

    thanks

    SQL_EXPAT

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply