Comparing two tables

Question

Post reply

Comparing two tables

v_krishnan

SSChasing Mays

Points: 616
More actions
August 26, 2003 at 10:04 am

#83057

Hi,

I have a case where I need to check whether two tables whose structure is same contain the same data or not? Is there any built-in mechanism like checksum, or last modified time, etc. which could help me know whether the two tables are same or not, without manually comparing each row and column?

Thanks,

Krishnan

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 728159 More actions · Answer 1

No, you'd need to compare each column. Comparing PKs is easy, but that won't tell you if all data is the same.

Steve Jones

sjones@sqlservercentral.com

http://qa.sqlservercentral.com/columnists/sjones

http://www.dkranch.net

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Greg Larsen SSC-Insane Points: 20891 More actions · Answer 2

You could write yourself some code that does that, like:

create table a (a int primary key, b int)

create table b (a int primary key, b int)

insert into a values(1,1)

insert into b values (1,1)

insert into a values (2,2)

insert into b values(2,3)

insert into a values (3,1)

insert into b values (4,1)

select * from a full outer join b on a.a = b.a

where a.b <> b.b

or (a.a is null and a.b is null)

or (b.a is null and b.b is null)

drop table a,b

Or you could buy a copy of Red-Gate SQL Data Compare. It does a compare of data, I think. Think they offer a 30-day trial version.

Gregory Larsen, DBA

If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

Gregory A. Larsen, MVP

NPratt SSC Enthusiast Points: 147 More actions · Answer 3

Something else to try

(Found this code on one of these useful SQL Server websites and it works pretty well for me) :

/*

SQL Script for Comparing the Contents of Two Tables

Here's some SQL code from Eli Leiba that will all you to compare

two tables -- say, table A and table B -- to determine if their

content is the same. Assuming that A and B have the same structure,

here's how it works:

First, from set theory, recall that: If ((|A| = |B|) && |A U B| = |A|))

====>>> A = B

|A| = NUMBER of rows in A

|B| = NUMBER of rows in B

Here's the SQL code (with T-SQL syntax, but can be adapted for

other DBMS's):

*/

declare @cnt1 int

declare @cnt2 int

declare @cnt3 int

declare @res bit

select @cnt1 = count(*) from A

select @cnt2 = count(*) from B

select @cnt3 = count('x')

from (select * from A

UNION

select * from B) as t

if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

begin

set @res = 1

print 'A = B'

end

else

begin

set @res = 0

print 'A <> B'

end

go

Nikki Pratt

Development DBA

Nikki Pratt
Development DBA

Greg Larsen SSC-Insane Points: 20891 More actions · Answer 4

Greg Larsen

SSC-Insane

Points: 20891

August 27, 2003 at 8:11 am

#471605

Nice way to use the UNION statement to determine if two sets are equal.

Gregory Larsen, DBA

If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

Gregory A. Larsen, MVP

Greg Larsen SSC-Insane Points: 20891 More actions · Answer 5

Now, since the UNION statement removes duplicates the UNION example will only work if there are no duplicate records in table A and B. Here is an example where table A and B are the same but the union example reports them as different. Still like the UNION example, though.

create table a (a int)

create table b (a int)

insert into a values (1)

insert into a values (3)

insert into b values (1)

insert into b values (3)

declare @cnt1 int

declare @cnt2 int

declare @cnt3 int

declare @res bit

select @cnt1 = count(*) from A

select @cnt2 = count(*) from B

select @cnt3 = count('x')

from (select * from A

UNION

select * from B) as t

select count('x')

from (select * from A

UNION

select * from B) as t

if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

begin

set @res = 1

print 'A = B'

end

else

begin

set @res = 0

print 'A <> B'

end

go

drop table a, b

Gregory Larsen, DBA

If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

Gregory A. Larsen, MVP

charlies Old Hand Points: 302 More actions · Answer 6

You can also try the following (Two tables containing the following columns: ID, UserID (initials), FirstName, and LastName)

Tables are names UserTable and UserTable1.

select * from usertable

where userID not in

(select userID from usertable1)

-- Then go the other way

select * from usertable1

where userid not in

(select userid from usertable)

regards,

nhamilton Mr or Mrs. 500 Points: 506 More actions · Answer 7

nhamilton

Mr or Mrs. 500

Points: 506

August 27, 2003 at 12:32 pm

#471672

Check out the script I submitted: "Compare Parallel Data Streams". It is designed to find the diferences between two parallel data streams, i.e. QA and production to verify a new process.

Nate

KerryH SSChasing Mays Points: 625 More actions · Answer 8

KerryH

SSChasing Mays

Points: 625

August 27, 2003 at 2:12 pm

#471695

There is a checksum and binary checksum function available with SQL Server. Checksums can be calculated on individual rows or for an entire table. The calculation of checksums is extremely fast when compared with doing field by field comparisons.

v_krishnan SSChasing Mays Points: 616 More actions · Answer 9

quote:

There is a checksum and binary checksum function available with SQL Server. Checksums can be calculated on individual rows or for an entire table. The calculation of checksums is extremely fast when compared with doing field by field comparisons.

I was looking for something like this because in my case the two tables to be compared are on different servers. Hence, something like a checksum would work for me. The other solutions posted will work within a single server only (correct me if I am wrong!).

I found that there is a SP called sp_table_validation which returns a checksum on a table. But, I am having problems in using it. How do I get the checksum it calculated?

Thanks,

Krishnan

KerryH SSChasing Mays Points: 625 More actions · Answer 10

To use a checksum to determine row changes you need to have a unique key so that individual rows can be identified. (Checksum does not work on text, ntext, image, timestamp or ROWGUID data types.)

One manner in which I have used checksum is to create a checksum table for each table I wish to check for changes. The table contains a column for each field of the unique key and a column for the calculated checksum. For instance, a checksum table for an employee may look like:

CREATE TABLE EmployeeCheckSum(

EmployeeNumber CHAR(10) NOT NULL,

EmployeeCheckSum INT NOT NULL)

GO

ALTER TABLE EmployeeCheckSum ADD CONSTRAINT PK_EmployeeCheckSum PRIMARY KEY (EmployeeNumber)

GO

I then make an initial load of the checksum table.

SELECT EmployeeNumber, EmployeeCheckSum=checksum(*)

INTO #TempEmployeeCheckSum

FROM Employee

GO

INSERT EmployeeCheckSum

SELECT * FROM #TempEmployeeCheckSum

GO

Finding rows that exist in one table but not the other is a simple query I will not cover here. Finding changed rows is simply a matter of creating a new checksum table and comparing the old checksums with the new ones. For example:

CREATE TABLE EmployeeCheckSumNew(

EmployeeNumber CHAR(10) NOT NULL,

EmployeeCheckSum INT NOT NULL)

GO

ALTER TABLE EmployeeCheckSumNew ADD CONSTRAINT PK_EmployeeCheckSumNew PRIMARY KEY (EmployeeNumber)

GO

SELECT EmployeeNumber, EmployeeCheckSum=checksum(*)

INTO #TempNewEmployeeCheckSum

FROM NewEmployee

GO

INSERT EmployeeCheckSumNew

SELECT * FROM #TempNewEmployeeCheckSum

GO

SELECT C1.EmployeeNumber

FROM EmployeeCheckSum C1, EmployeeCheckSum C2

WHERE C1.EmployeeNumber=C2.EmployeeNumber AND C1.EmployeeCheckSum<>C2.EmployeeCheckSum

GO

For my purposes I only wish to determine if a row has changed and I don't need to know what changed, I just replace the old row with the new one.

The function which calculates checksum is very fast so for finding changes to very large tables where only a small portion of the rows get modified frequently the checksum method is a good option to have.

If you just wish to see if a table has had a change you can calculate a table level checksum. For example, the following will generate a table level checksum for the employee table:

SELECT CHECKSUM_AGG(CHECKSUM(*))

FROM Employee

GO

There is also a BINARY_CHECKSUM function which can be used to check for case-sensitive values. Checksums can also be used to build a hash index on a table by adding a checksum column to a table and indexing the checksum column.

I believe SQL Server uses checksums internally for hash indexing and replication verification.

leitera SSC Journeyman Points: 82 More actions · Answer 11

I too like the elegance of the UNION solution. Could be I'm missing something, but wouldn't UNIONing the two tables against themselves first eliminate the problem of reporting tables with duplicate rows?

Example:

create table a (a int)

create table b (a int)

insert into a values (1)

insert into a values (3)

insert into b values (1)

insert into b values (3)

declare @cnt1 int

declare @cnt2 int

declare @cnt3 int

declare @res bit

select @cnt1 = count(*)

from (select * from a

UNION

select * from a) as t

select @cnt2 = count(*)

from (select * from b

UNION

select * from b) as t

select @cnt3 = count('x')

from (select * from a

UNION

select * from b) as t

if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

begin

set @res = 1

print 'A = B'

end

else

begin

set @res = 0

print 'A <> B'

end

go

drop table a, b

Greg Larsen SSC-Insane Points: 20891 More actions · Answer 12

leitera - Good way to remove those duplicate rows first before comparing the two tables. Although there still are issues with the union option. Here table B has more duplicate rows then table A, so they are not equal although the script says they are equal.

create table a (a int)

create table b (a int)

insert into a values (1)

insert into a values (3)

insert into b values (1)

insert into b values (3)

declare @cnt1 int

declare @cnt2 int

declare @cnt3 int

declare @res bit

select @cnt1 = count(*)

from (select * from a

UNION

select * from a) as t

select @cnt2 = count(*)

from (select * from b

UNION

select * from b) as t

select @cnt3 = count('x')

from (select * from a

UNION

select * from b) as t

if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

begin

set @res = 1

print 'A = B'

end

else

begin

set @res = 0

print 'A <> B'

end

go

drop table a, b

Gregory Larsen, DBA

If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

Gregory A. Larsen, MVP

Rick Todd Hall of Fame Points: 3302 More actions · Answer 13

Rick Todd

Hall of Fame

Points: 3302

June 6, 2007 at 9:10 am

#711142

Personally I like Kerry's solution, and that's what I'm going to use. It's easy to fit into my ETL scheme, which is why I was looking at this in the first place.

Rick Todd