DBCC won't even run

Question

Post reply

DBCC won't even run

SQL_EXPAT

SSCrazy

Points: 2708
More actions
January 15, 2009 at 3:58 am

#383737

hello all

I'n getting errors trying to access a database:

The operating system returned error 21(error not found) to SQL Server during a read at offset 0x00000000296000 in file 'i:\MSSQL\DATA\xxxxxxxx_Data.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

When I ran a DBCC CHECKDB with physical_only I get the same error again. I'm assuming I've got some bad sectors on this disk as we recently had a multiple disk failure on this array.

Any chance of recovering this? I doubt it.

thanks

Alan Cranfield

thanks

SQL_EXPAT

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 1

Google tells me that Operating System error 21 is "The device is not ready." It's saying that the file can't be accessed at all, not that a couple sectors are damaged

Check the server, make sure that all of the drives that the SQL files are on are accessible and present.

Do not detach that database. Do not restart SQL or reboot the server.

Do you have good backups of that database?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL_EXPAT SSCrazy Points: 2708 More actions · Answer 2

Hi Gail

The I: drive is there and accesible and I can write a file to it. I can even change the name of the data and log files so it looks like SQL hasn't attached to the files:

xp_fixeddrives

GO

xp_cmdshell 'dir i:\mssql\data\'

drive MB free

----- -----------

C 7469

F 80001

G 1065908

I 1146876

output

------------------------------------

Volume in drive I is I_Drive

Volume Serial Number is 94F5-10F0

NULL

Directory of i:\mssql\data

NULL

15/01/2009 12:25 .

15/01/2009 12:25 ..

09/01/2009 23:30 1,572,864,000,000 EQMCLOGTower_Data.mdf

15/11/2008 12:00 524,288,000,000 EQMCLOGTower_Log.ldf

2 File(s) 2,097,152,000,000 bytes

2 Dir(s) 1,202,587,013,120 bytes free

NULL

(12 row(s) affected)

I was thinking of renaming the files, dropping the database and then atttempting to attach the files.

There is the following warning message in the disk controller for this drive. Could be related. I'm thinking that maybe the SQL engine came up before the drive was ready. We had to cold boot this server as well as all the drive shelves after our disk failures last week. This database is the only one that is on the I: drive.

cheers

Alan

thanks

SQL_EXPAT

SQL_EXPAT SSCrazy Points: 2708 More actions · Answer 3

oops forgot the put the array controller warning:

"The current array controller had valid data stored in its battery backed write cache the last time it was reset or was powered up. This indicates that the system may not have been shut down gracefully. The array controller has automatically written, or has attempted to write, this data to the drives. This message will continue to be displayed until the next reset or power-cycle of the array controller. "

tks

Alan

thanks

SQL_EXPAT

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 4

-- Cranfield (1/15/2009)

I was thinking of renaming the files, dropping the database and then atttempting to attach the files.

No! Absolutely not. If you do that, there's a good chance that the database will not come back. Do not rename the files, do not delete the database, do not detach the database.

I'm thinking that maybe the SQL engine came up before the drive was ready. We had to cold boot this server as well as all the drive shelves after our disk failures last week. This database is the only one that is on the I: drive.

It could be, but in that case I would expect the DB to be suspect. Is it?

Can you check the error log, see if there are any more messages relating to that database, especially from when the system started up.

Do you have a backup?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL_EXPAT SSCrazy Points: 2708 More actions · Answer 5

as far as I can see from the SQL event log this database recovered successfully after the reboot after the failed disks:

Message

Recovery of database 'EQMCLOGRestore' (8) is 100% complete (approximately 0 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.

Message

2 transactions rolled back in database 'EQMCLOGRestore' (8). This is an informational message only. No user action is required.

Message

Recovery is writing a checkpoint in database 'EQMCLOGRestore' (8). This is an informational message only. No user action is required.

Message

Recovery is complete. This is an informational message only. No user action is required.

then an hour later I see loads of these:

Message

Error: 8966, Severity: 16, State: 2.

Message

Unable to read and latch page (1:169729024) with latch type SH. 21(error not found) failed.

A DBCC ran successfully after that:

Message

DBCC CHECKCATALOG (EQMCLOGRestore) WITH no_infomsgs executed by CONNECT\SQLservice found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 3 seconds.

Then the database 'disappeared' and any attempted access results on this message in QA and the Error Log:

Message

The operating system returned error 21(error not found) to SQL Server during a read at offset 0x000002ece7a000 in file 'i:\MSSQL\DATA\EQMCLOGTower_Data.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

The weird thing is I can see the DB in SSMS and also see some of the tables (but not all???). If I right click on the database and select properties then I get the above error again in a message box. Same if I try to connect to that database.

We dont have a backup of this database! But we can rebuild it and bulk insert all the data from flat files, which is a pain.

cheers

Alan

thanks

SQL_EXPAT

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 6

So the drive was accessible when SQL started, but something happened a little later. Can you check the windows event log around the same time as the latch errors started? See if there's anything to hint what happened to start the problems

Right now, any restart of SQL, the server or the database will probably result in the database going suspect, which is significantly harder to fix.

It shouldn't be the case, but can you check the permissions on the database files, make sure that the SQL service account has full rights on those files.

Why no backup?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL_EXPAT SSCrazy Points: 2708 More actions · Answer 7

I can see these controller errors around the time we lost the database:

SCSI bus fault occurred on Storage Box box 0, , Port 1 of

Array Controller in slot 1.

This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

SCSI bus fault occurred on Storage Box box 0, , Port 0 of

Array Controller in slot 1.

This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

On the question of why we dont have backups you could say we have made the executive decision to not backup certain databases. Reasons usually range from a) we dont have the space on disk/tape or tape b) we don't have a long enough maintenance window c) we run LIVE/LIVE e) we can rebuild from script f) we forgot (most likely :D). I can assure we do have backups for our critical databases. We have been caught out on occasion. With 300 SQL servers we tend to have the odd database not backed up - but I'm working on it. In this case the database being down is not critical as its a database thats used to restore archived data for historical queries. The other databases on this box are critical/live databases.

Appreciate the replies

cheers

Alan

thanks

SQL_EXPAT

Paul Randal One Orange Chip Points: 29544 More actions · Answer 8

You've got corrupt pages in some of the critical system tables - this is why CHECKDB fails right away. These were most likely cause by the bad RAID controller you mentioned earlier. Without a backup there's nothing you can do except to create a new database and export as much data as you can into the new one.

Thanks

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 9

-- Cranfield (1/15/2009)

SCSI bus fault occurred on Storage Box box 0, , Port 1 of

Array Controller in slot 1.

This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

SCSI bus fault occurred on Storage Box box 0, , Port 0 of

Array Controller in slot 1.

This may result in a "downshift" in transfer rate for one or more hard drives on the bus.

I don't know enough about drives to know what those mean. Any server people there that you can bounce this off. Especially around how to resolve it.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL_EXPAT SSCrazy Points: 2708 More actions · Answer 10

SQL_EXPAT

SSCrazy

Points: 2708

January 15, 2009 at 10:15 am

#927514

Thanks, Gail and Paul

I will prepare for the long and arduous rebuild of the database..

I think we've learned our lesson now with spanned volumes i.e. dont use them.

cheers

Alan Cranfield

thanks

SQL_EXPAT

abishekshroff SSCrazy Points: 2040 More actions · Answer 11

abishekshroff

SSCrazy

Points: 2040

January 21, 2009 at 8:38 am

#930407

Your system table is corrupt.

Restore from backup or export as much data as possible 🙁

SQL_EXPAT SSCrazy Points: 2708 More actions · Answer 12

SQL_EXPAT

SSCrazy

Points: 2708

January 21, 2009 at 9:00 am

#930435

actually, rebooting the server and disk shelves brought the database back.. Ha!

--cranfield

thanks

SQL_EXPAT

Paul Randal One Orange Chip Points: 29544 More actions · Answer 13

ok - but you've got something going wrong in the IO subsystem that you need to fix - so next time it happens the corruption isn't permanent and debilitating.

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

SQL_EXPAT SSCrazy Points: 2708 More actions · Answer 14

SQL_EXPAT

SSCrazy

Points: 2708

January 22, 2009 at 1:45 am

#930923

Agreed. We've now migrated the DB from that old DL740/6400 to a new DL580 G5 with P800 controllers.

cheers

Alan

thanks

SQL_EXPAT