Blog Post

Fix Page Checksum

,

A checksum value is tagged with every data page in SQL Server. This is used to detect issues on the storage system to ensure what has been read from the disk is what has been written to the disk. When the checksum is detected which is not the one it’s supposed to be, the page will not be read into the buffer pool. SQL Server will return error. If you are sure the data on the page are all correct or the checksum is broken purposely, you do have the option to fix it even you don’t know how to calculate it.

Now let’s create our experimental database to get this fun stuff done.

use master
drop database Test1
create database Test1
go
use Test1
go
create table t1(Field1 char(10))
go
insert into t1 values('AAAAAAAAAA')
select * from t1
/*
Field1
----------
AAAAAAAAAA
(1 row(s) affected)
*/

Now let’s break it

-- first find out where it is
use master
go
alter database Test1 set single_user with rollback immediate
go
dbcc traceon(3604)
go
dbcc ind(test1, t1,0) -- page 118 in my case
go
-- then, check the record
dbcc page (test1, 1, 118, 1)
/*
Slot 0, Offset 0x60, Length 17, DumpStyle BYTE
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 17
Memory Dump @0x0000000010F0A060
0000000000000000:   10000e00 41414141 41414141 41410100 00        ....AAAAAAAAAA...
*/
--Now let's modify one of the A to B
dbcc writepage(test1, 1, 118, 100, 1, 0x42, 1)
--Notice that the last parameter I used is 1 not the default value
--this will bypass the the buffer pool and direct write the data to the page on disk
--Buffer pool calculates the checksum before writting. If I bypass it
--it will mean the checksum is broken.
go
--Let's check the page again
dbcc page (test1, 1, 118, 1)
/*
Slot 0, Offset 0x60, Length 17, DumpStyle BYTE
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 17
Memory Dump @0x0000000009D7A060
0000000000000000:   10000e00 42414141 41414141 41410100 00        ....BAAAAAAAAA...
*/
go

Ok, now we successfully break the page. if you run DBCC CheckDB(Test1), you will see following errors

Msg 8939, Level 16, State 98, Line 1

Table error: Object ID 245575913, index ID 0, partition ID 72057594039042048, alloc unit ID 72057594043432960 (type In-row data), page (1:118). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 2057 and -4.

Msg 8928, Level 16, State 1, Line 1

Object ID 245575913, index ID 0, partition ID 72057594039042048, alloc unit ID 72057594043432960 (type In-row data): Page (1:118) could not be processed. See other errors for details.

It does not give you too much information. What if you run select * from Test1.dbo.t1. The error you you get is very mouthful

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x5f1b00a8; actual: 0x5f1a80a8). It occurred during a read of page (1:118) in database ID 5 at offset 0x000000000ec000 in file 'F:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\Test1.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

In this error message, there are very meaningful messages delivered – (expected: 0x5f1b00a8; actual: 0x5f1a80a8). One of those 2 numbers must be correct. Those 2 numbers are integer numbers – least significant digits will be saved at lower address. So the values of storage format of those numbers are a8001b5f and a8801a5f. Now let’s fire up DBCC PAGE to see whether we can find one of them. If so, the other one of those 2 will be correct checksum.

dbcc page (test1, 1, 118, 2)
/*
000000000FA3A000:   01010000 00820001 00000000 00000e00 00000000  .....?..............
000000000FA3A014:   00000100 54000000 8d1f7100 76000000 01000000  ....T.....q.v.......
000000000FA3A028:   21000000 52000000 18000000 00000000 00000000  !...R...............
000000000FA3A03C:   a8001b5f 00000000 00000000 00000000 00000000  ¨.._................
*/

Look at the last line above from DBCC PAGE result. At offset 3C, you can see A8001B5F – aha, that’s the incorrect checksum. Now let’s correct it with value a8801a5f

dbcc writepage(test1, 1, 118, 60, 4, 0xa8801a5f, 1)
go
dbcc checkdb(test1)
go 
select  * from test1..t1
/*
Field1
----------
BAAAAAAAAA
(1 row(s) affected)
*/

Now the error goes away and you can see the modified record. CheckSum is managed by buffer pool. DBCC WritePage with bufferpool(last parameter set to 0) can’t modify that value. You will have to use 1 for the last parameter under single database mode. Enjoy.

John Huang – SQL MCM & MVP, http://www.sqlnotes.info

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating