RE: Comparing two tables – SQLServerCentral

SSChasing Mays

Points: 625

August 28, 2003 at 7:32 am

To use a checksum to determine row changes you need to have a unique key so that individual rows can be identified. (Checksum does not work on text, ntext, image, timestamp or ROWGUID data types.)

One manner in which I have used checksum is to create a checksum table for each table I wish to check for changes. The table contains a column for each field of the unique key and a column for the calculated checksum. For instance, a checksum table for an employee may look like:

CREATE TABLE EmployeeCheckSum(

EmployeeNumber CHAR(10) NOT NULL,

EmployeeCheckSum INT NOT NULL)

ALTER TABLE EmployeeCheckSum ADD CONSTRAINT PK_EmployeeCheckSum PRIMARY KEY (EmployeeNumber)

I then make an initial load of the checksum table.

SELECT EmployeeNumber, EmployeeCheckSum=checksum(*)

INTO #TempEmployeeCheckSum

FROM Employee

INSERT EmployeeCheckSum

SELECT * FROM #TempEmployeeCheckSum

Finding rows that exist in one table but not the other is a simple query I will not cover here. Finding changed rows is simply a matter of creating a new checksum table and comparing the old checksums with the new ones. For example:

CREATE TABLE EmployeeCheckSumNew(

EmployeeNumber CHAR(10) NOT NULL,

EmployeeCheckSum INT NOT NULL)

ALTER TABLE EmployeeCheckSumNew ADD CONSTRAINT PK_EmployeeCheckSumNew PRIMARY KEY (EmployeeNumber)

SELECT EmployeeNumber, EmployeeCheckSum=checksum(*)

INTO #TempNewEmployeeCheckSum

FROM NewEmployee

INSERT EmployeeCheckSumNew

SELECT * FROM #TempNewEmployeeCheckSum

SELECT C1.EmployeeNumber

FROM EmployeeCheckSum C1, EmployeeCheckSum C2

WHERE C1.EmployeeNumber=C2.EmployeeNumber AND C1.EmployeeCheckSum<>C2.EmployeeCheckSum

For my purposes I only wish to determine if a row has changed and I don't need to know what changed, I just replace the old row with the new one.

The function which calculates checksum is very fast so for finding changes to very large tables where only a small portion of the rows get modified frequently the checksum method is a good option to have.

If you just wish to see if a table has had a change you can calculate a table level checksum. For example, the following will generate a table level checksum for the employee table:

SELECT CHECKSUM_AGG(CHECKSUM(*))

FROM Employee

There is also a BINARY_CHECKSUM function which can be used to check for case-sensitive values. Checksums can also be used to build a hash index on a table by adding a checksum column to a table and indexing the checksum column.

I believe SQL Server uses checksums internally for hash indexing and replication verification.