Comparing two tables

  • Hi,

    I have a case where I need to check whether two tables whose structure is same contain the same data or not? Is there any built-in mechanism like checksum, or last modified time, etc. which could help me know whether the two tables are same or not, without manually comparing each row and column?

    Thanks,

    Krishnan

  • No, you'd need to compare each column. Comparing PKs is easy, but that won't tell you if all data is the same.

    Steve Jones

    sjones@sqlservercentral.com

    http://qa.sqlservercentral.com/columnists/sjones

    http://www.dkranch.net

  • You could write yourself some code that does that, like:

    create table a (a int primary key, b int)

    create table b (a int primary key, b int)

    insert into a values(1,1)

    insert into b values (1,1)

    insert into a values (2,2)

    insert into b values(2,3)

    insert into a values (3,1)

    insert into b values (4,1)

    select * from a full outer join b on a.a = b.a

    where a.b <> b.b

    or (a.a is null and a.b is null)

    or (b.a is null and b.b is null)

    drop table a,b

    Or you could buy a copy of Red-Gate SQL Data Compare. It does a compare of data, I think. Think they offer a 30-day trial version.

    Gregory Larsen, DBA

    If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

    Gregory A. Larsen, MVP

  • Something else to try

    (Found this code on one of these useful SQL Server websites and it works pretty well for me) :

    /*

    SQL Script for Comparing the Contents of Two Tables

    Here's some SQL code from Eli Leiba that will all you to compare

    two tables -- say, table A and table B -- to determine if their

    content is the same. Assuming that A and B have the same structure,

    here's how it works:

    First, from set theory, recall that: If ((|A| = |B|) && |A U B| = |A|))

    ====>>> A = B

    |A| = NUMBER of rows in A

    |B| = NUMBER of rows in B

    Here's the SQL code (with T-SQL syntax, but can be adapted for

    other DBMS's):

    */

    declare @cnt1 int

    declare @cnt2 int

    declare @cnt3 int

    declare @res bit

    select @cnt1 = count(*) from A

    select @cnt2 = count(*) from B

    select @cnt3 = count('x')

    from (select * from A

    UNION

    select * from B) as t

    if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

    begin

    set @res = 1

    print 'A = B'

    end

    else

    begin

    set @res = 0

    print 'A <> B'

    end

    go

    Nikki Pratt

    Development DBA


    Nikki Pratt
    Development DBA

  • Nice way to use the UNION statement to determine if two sets are equal.

    Gregory Larsen, DBA

    If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

    Gregory A. Larsen, MVP

  • Now, since the UNION statement removes duplicates the UNION example will only work if there are no duplicate records in table A and B. Here is an example where table A and B are the same but the union example reports them as different. Still like the UNION example, though.

    create table a (a int)

    create table b (a int)

    insert into a values (1)

    insert into a values (3)

    insert into a values (3)

    insert into b values (1)

    insert into b values (3)

    insert into b values (3)

    declare @cnt1 int

    declare @cnt2 int

    declare @cnt3 int

    declare @res bit

    select @cnt1 = count(*) from A

    select @cnt2 = count(*) from B

    select @cnt3 = count('x')

    from (select * from A

    UNION

    select * from B) as t

    select count('x')

    from (select * from A

    UNION

    select * from B) as t

    if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

    begin

    set @res = 1

    print 'A = B'

    end

    else

    begin

    set @res = 0

    print 'A <> B'

    end

    go

    drop table a, b

    Gregory Larsen, DBA

    If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

    Gregory A. Larsen, MVP

  • You can also try the following (Two tables containing the following columns: ID, UserID (initials), FirstName, and LastName)

    Tables are names UserTable and UserTable1.

    select * from usertable

    where userID not in

    (select userID from usertable1)

    -- Then go the other way

    select * from usertable1

    where userid not in

    (select userid from usertable)

    regards,

  • Check out the script I submitted: "Compare Parallel Data Streams". It is designed to find the diferences between two parallel data streams, i.e. QA and production to verify a new process.

    Nate

  • There is a checksum and binary checksum function available with SQL Server. Checksums can be calculated on individual rows or for an entire table. The calculation of checksums is extremely fast when compared with doing field by field comparisons.

  • quote:


    There is a checksum and binary checksum function available with SQL Server. Checksums can be calculated on individual rows or for an entire table. The calculation of checksums is extremely fast when compared with doing field by field comparisons.


    I was looking for something like this because in my case the two tables to be compared are on different servers. Hence, something like a checksum would work for me. The other solutions posted will work within a single server only (correct me if I am wrong!).

    I found that there is a SP called sp_table_validation which returns a checksum on a table. But, I am having problems in using it. How do I get the checksum it calculated?

    Thanks,

    Krishnan

  • To use a checksum to determine row changes you need to have a unique key so that individual rows can be identified. (Checksum does not work on text, ntext, image, timestamp or ROWGUID data types.)

    One manner in which I have used checksum is to create a checksum table for each table I wish to check for changes. The table contains a column for each field of the unique key and a column for the calculated checksum. For instance, a checksum table for an employee may look like:

    CREATE TABLE EmployeeCheckSum(

    EmployeeNumber CHAR(10) NOT NULL,

    EmployeeCheckSum INT NOT NULL)

    GO

    ALTER TABLE EmployeeCheckSum ADD CONSTRAINT PK_EmployeeCheckSum PRIMARY KEY (EmployeeNumber)

    GO

    I then make an initial load of the checksum table.

    SELECT EmployeeNumber, EmployeeCheckSum=checksum(*)

    INTO #TempEmployeeCheckSum

    FROM Employee

    GO

    INSERT EmployeeCheckSum

    SELECT * FROM #TempEmployeeCheckSum

    GO

    Finding rows that exist in one table but not the other is a simple query I will not cover here. Finding changed rows is simply a matter of creating a new checksum table and comparing the old checksums with the new ones. For example:

    CREATE TABLE EmployeeCheckSumNew(

    EmployeeNumber CHAR(10) NOT NULL,

    EmployeeCheckSum INT NOT NULL)

    GO

    ALTER TABLE EmployeeCheckSumNew ADD CONSTRAINT PK_EmployeeCheckSumNew PRIMARY KEY (EmployeeNumber)

    GO

    SELECT EmployeeNumber, EmployeeCheckSum=checksum(*)

    INTO #TempNewEmployeeCheckSum

    FROM NewEmployee

    GO

    INSERT EmployeeCheckSumNew

    SELECT * FROM #TempNewEmployeeCheckSum

    GO

    SELECT C1.EmployeeNumber

    FROM EmployeeCheckSum C1, EmployeeCheckSum C2

    WHERE C1.EmployeeNumber=C2.EmployeeNumber AND C1.EmployeeCheckSum<>C2.EmployeeCheckSum

    GO

    For my purposes I only wish to determine if a row has changed and I don't need to know what changed, I just replace the old row with the new one.

    The function which calculates checksum is very fast so for finding changes to very large tables where only a small portion of the rows get modified frequently the checksum method is a good option to have.

    If you just wish to see if a table has had a change you can calculate a table level checksum. For example, the following will generate a table level checksum for the employee table:

    SELECT CHECKSUM_AGG(CHECKSUM(*))

    FROM Employee

    GO

    There is also a BINARY_CHECKSUM function which can be used to check for case-sensitive values. Checksums can also be used to build a hash index on a table by adding a checksum column to a table and indexing the checksum column.

    I believe SQL Server uses checksums internally for hash indexing and replication verification.

  • I too like the elegance of the UNION solution. Could be I'm missing something, but wouldn't UNIONing the two tables against themselves first eliminate the problem of reporting tables with duplicate rows?

    Example:

    create table a (a int)

    create table b (a int)

    insert into a values (1)

    insert into a values (3)

    insert into a values (3)

    insert into b values (1)

    insert into b values (3)

    insert into b values (3)

    declare @cnt1 int

    declare @cnt2 int

    declare @cnt3 int

    declare @res bit

    select @cnt1 = count(*)

    from (select * from a

    UNION

    select * from a) as t

    select @cnt2 = count(*)

    from (select * from b

    UNION

    select * from b) as t

    select @cnt3 = count('x')

    from (select * from a

    UNION

    select * from b) as t

    if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

    begin

    set @res = 1

    print 'A = B'

    end

    else

    begin

    set @res = 0

    print 'A <> B'

    end

    go

    drop table a, b

  • leitera - Good way to remove those duplicate rows first before comparing the two tables. Although there still are issues with the union option. Here table B has more duplicate rows then table A, so they are not equal although the script says they are equal.

    create table a (a int)

    create table b (a int)

    insert into a values (1)

    insert into a values (3)

    insert into a values (3)

    insert into b values (1)

    insert into b values (3)

    insert into b values (3)

    insert into b values (3)

    declare @cnt1 int

    declare @cnt2 int

    declare @cnt3 int

    declare @res bit

    select @cnt1 = count(*)

    from (select * from a

    UNION

    select * from a) as t

    select @cnt2 = count(*)

    from (select * from b

    UNION

    select * from b) as t

    select @cnt3 = count('x')

    from (select * from a

    UNION

    select * from b) as t

    if (@cnt1 = @cnt2) and (@cnt2 = @cnt3)

    begin

    set @res = 1

    print 'A = B'

    end

    else

    begin

    set @res = 0

    print 'A <> B'

    end

    go

    drop table a, b

    Gregory Larsen, DBA

    If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

    Gregory A. Larsen, MVP

  • Personally I like Kerry's solution, and that's what I'm going to use. It's easy to fit into my ETL scheme, which is why I was looking at this in the first place.


    Rick Todd

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply