Comparing columns in two tables for unequal values

Question

Post reply

Comparing columns in two tables for unequal values

mapperhd

SSC-Addicted

Points: 429
More actions
September 30, 2011 at 4:04 pm

#243957

Hi,

We have a double data entry process and want to compare the entries of one record in Table A to the duplicated entries in Table B (identical to TableA). What I want to do is compare a record from each table by an uniqueID and retain only columns where the values are not equal. I want to keep these results and save them to a table (TableC). TableC does not need to be defined in the solution here.

I could easliy do this in a VBA script or, dare I say...a CURSOR!!!!!!, but I'd like to see if there is a better way and keep this back on the server and out of the application.

I've looked a INTERSECT and EXCEPT, but they return the both whole records.

Thanks in advance,

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

CarpenterBob Ten Centuries Points: 1006 More actions · Answer 1

CarpenterBob

Ten Centuries

Points: 1006

September 30, 2011 at 4:20 pm

#1389050

http://msdn.microsoft.com/en-us/library/ms180026.aspx

This should do.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 2

Roughly:

DELETE a

OUTPUT tblc

FROM

tblA as a

JOIN

tblB as b

ON a.col1 = b.col1 AND a.col2 = b.col2 ....

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

vince_sql Right there with Babe Points: 722 More actions · Answer 3

mapperhd (9/30/2011)

Hi,

We have a double data entry process and want to compare the entries of one record in Table A to the duplicated entries in Table B (identical to TableA). What I want to do is compare a record from each table by an uniqueID and retain only columns where the values are not equal. I want to keep these results and save them to a table (TableC). TableC does not need to be defined in the solution here.

I could easliy do this in a VBA script or, dare I say...a CURSOR!!!!!!, but I'd like to see if there is a better way and keep this back on the server and out of the application.

I've looked a INTERSECT and EXCEPT, but they return the both whole records.

Thanks in advance,

A small example, is this what you are looking for? Or something else - your question is not 100% clear.



INSERT INTO tableC(empID, aName, bName, aSalary, bSalary)

SELECT a.empID,a.empName,b.empName,a.empSalary,b.empsalary

FROM tableA a

INNER JOIN tableB b

ON a.empID = b.empid

WHERE a.empName <> b.empName or a.empSalary <> b.empSalary

http://sqlvince.blogspot.com/[/url]

mapperhd SSC-Addicted Points: 429 More actions · Answer 4

Thanks for the replies.

However, I need to keep only the columns that are not equal. Let's say there are 12 columns in the two records being compared. If columns 4 and 9 in record 1 do not have equal values to the same columns in record 2 how do I keep only those two columns (and the ID) to insert into another table?

Sorry if I wan't very clear in my first post and I hope this helps to clarify.

Thanks again for responses.

R.P.Rozema SSChampion Points: 12305 More actions · Answer 5

mapperhd (10/2/2011)

Thanks for the replies.

However, I need to keep only the columns that are not equal. Let's say there are 12 columns in the two records being compared. If columns 4 and 9 in record 1 do not have equal values to the same columns in record 2 how do I keep only those two columns (and the ID) to insert into another table?

Sorry if I wan't very clear in my first post and I hope this helps to clarify.

Thanks again for responses.

You could potentially end up with a lot of tables with differences that way: you would need a new table for every unique combination of columns with differences in it. Are you sure you want to have just the columns with differences in it? And not simply the entire rows that you've found to contain at least one difference? Or do you just want the names of the columns that have differences for each set of 2 rows that has at least one different column between them?

The last one is fairly easy (I have created some DDL plus sample data for you, please have a look at the link in my footer text for more info on how and WHY to provide this yourself the next time you have a question):

declare @table1 table (



   ID int not null primary key,

   col1 varchar(10) not null,

   col2 varchar(200) null,

   col3 int not null,

   col4 datetime null

);

declare @table2 table (

   ID int not null primary key,

   col1 varchar(10) not null,

   col2 varchar(200) null,

   col3 int not null,

   col4 datetime null

);

insert @table1(ID, col1, col2, col3, col4)

select 1, 'test1', 'some longer value', 23, {d '1900-01-01'} union all

select 2, 'test2', 'some longer value', 23, {d '1900-01-01'} union all

select 3, 'test3', 'some longer value', 23, {d '1900-01-01'}

insert @table2(ID, col1, col2, col3, col4)

select 1, 'test1', 'some longer value', 23, {d '1900-01-01'} union all

select 2, 'test21', 'some longer value', 23, {d '1900-01-02'} union all

select 3, 'test4', 'another longer value', 22, {d '1900-01-01'} union all

select 4, 'test4', 'another longer value', 22, {d '1900-01-01'}

select diff.column_names, t1.*, t2.*

from @table1 t1

   full outer join @table2 t2 on (t2.ID = t1.ID)

cross apply (

      select stuff((

            select ', ' + t.name as [text()]

            from (

                  select 'ID' as name where t1.ID is null or t2.ID is null

                  union all select 'col1' where not t1.col1 = t2.col1

                  union all select 'col2' where not ((t1.col2 is null and t2.col2 is null) or (t1.col2 = t2.col2))

                  union all select 'col3' where not t1.col3 = t2.col3

                  union all select 'col4' where not ((t1.col4 is null and t2.col4 is null) or (t1.col4 = t2.col4))

               ) t

            for xml path(''), type

         ).value('.','varchar(8000)'),1, 2, '') as column_names

   ) diff

where diff.column_names is not null

Which outputs:

column_names ID col1 col2 col3 col4 ID col1 col2 col3 col4



col1, col4	2	test2	some longer value	23	1900-01-01 00:00:00.000	2	test21	some longer value	23	1900-01-02 00:00:00.000

col1, col2, col3	3	test3	some longer value	23	1900-01-01 00:00:00.000	3	test4	another longer value	22	1900-01-01 00:00:00.000

ID NULL NULL NULL NULL NULL 4 test4 another longer value 22 1900-01-01 00:00:00.000

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

drew.allen SSC Guru Points: 76776 More actions · Answer 6

mapperhd (10/2/2011)

Thanks for the replies.

However, I need to keep only the columns that are not equal.

The problem with this is that it forces you to violate the first normal form for database design: All records must have the same number of fields.

For example, say you want to keep the record ID, the column name, and the values from each of the tables. With one column different, you want to keep a total of four columns; with two columns different, you want to keep a total of seven columns, etc. As you can see, this violates first normal form, because the records don't have the same number of fields.

There are three basic ways that you can resolve this situation.

1) Remove the spec to keep only columns that are different, so that you are keeping the same number of columns for each record.

2) Store the column difference information in a single XML field, since XML doesn't have the same constraints as a database. (The problem with this being that you will still need to revert it to first normal form in order to work with it further.)

3) "PIVOT" the information so that you have one row for each column that is different. So, if one column is different, you would have one row for that record; with two columns different, you would need two rows, etc. (Again, you would need to unpivot it to work with it further.)

Which approach you use depends on how you are planning to use this data. Certainly, the first approach is the simplest.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Stephan Douglas SSC Veteran Points: 249 More actions · Answer 7

Stephan Douglas

SSC Veteran

Points: 249

March 21, 2012 at 8:50 pm

#1462508

R.P.

Thank you, your post was exactly what I was looking for.

vikingDBA SSCertifiable Points: 5356 More actions · Answer 8

If it helps, here is a script I use that creates a script to run to find the differences between the records in the same table in two different databases (like if there is a restore of a previous backup to a test database name to check for differences in the databases).

It creates a script, so just copy the printed output to a new query window and run it.

/* ====================================================================================================================== */

/*

This script creates a script to find the differences in two tables from two databases on the same instance (like a

test version of a database, being tested by another group of people or with another program).

Also creates the scripts to find records in one table that are not in the other, for both tables.

After run, just copy the script that is printed to another query window and run it (after making any necessary adjustments to the code).

Values to SET:

===============

Just set the database context (just below), Schema Name, Table Name, and Database Names for the two databases (in the "Values to change by User" section).

*/

USE MainDatabase -- SET THIS context to the primary database for the extraction of the table definitions

GO

DECLARE @SN varchar(200)

DECLARE @TN varchar(200)

DECLARE @db1 varchar(200)

DECLARE @db2 varchar(200)

DECLARE @collist varchar(4000)

DECLARE @collist2 varchar(4000)

DECLARE @ordinalposition int

DECLARE @schemaname varchar(200)

DECLARE @tablename varchar(200)

DECLARE @dataelement varchar(200)

DECLARE @datatype varchar(200)

DECLARE @maxlen char(10)

DECLARE @prec char(10)

DECLARE @scale char(10)

DECLARE @comparevals varchar(8000)

DECLARE @ds varchar(8000)

DECLARE @wow varchar(400)

DECLARE @numelements int

/* ============================================================================================== */

/* Values to change by User */

SET @SN = 'dbo' --Schema Name

SET @TN = 'SomeTableName' --Table Name

SET @db1 = 'MainDatabase' --Database 1 (a)

SET @db2 = 'TestDatabase' --Database 2 (b)

/* ============================================================================================== */

SET @collist = '' --leave blank

SET @collist2 = '' --leave blank

SET NOCOUNT ON

SELECT ORDINAL_POSITION AS 'OrdinalPosition',

TABLE_SCHEMA AS 'SchemaName',

TABLE_NAME AS 'TableName',

COLUMN_NAME AS 'DataElement', DATA_TYPE AS 'DataType',

ISNULL(CONVERT(char(10),CHARACTER_MAXIMUM_LENGTH),'') AS 'MaxLen', ISNULL(CONVERT(char(10),NUMERIC_PRECISION),'') AS 'Prec', ISNULL(CONVERT(char(10),NUMERIC_SCALE),'') AS 'Scale',

CONVERT(varchar(200),'') as ColNames,

CONVERT(varchar(400),'') as CompareVals

INTO #dummytable1

FROM information_schema.columns gg

WHERE TABLE_SCHEMA=@sn AND TABLE_NAME = @TN

ORDER BY [SchemaName], TABLE_CATALOG, TABLE_NAME, ORDINAL_POSITION

SELECT @collist2=@collist2 + DataElement + ',' FROM #dummytable1 ORDER BY OrdinalPosition

SET @collist2 = SUBSTRING(@collist2,1,LEN(@collist2)-1)

UPDATE #dummytable1

SET ColNames = 'a.' + DataElement + ' AS a' + DataElement + ',b.' + DataElement + ' AS b' + DataElement + ','

SELECT @collist=@collist + ColNames FROM #dummytable1 ORDER BY OrdinalPosition

SET @collist = SUBSTRING(@collist,1,LEN(@collist)-1)

SELECT @numelements=MAX(OrdinalPosition) FROM #dummytable1

/* ============================================================================================== */

DECLARE backupFiles CURSOR FOR

SELECT OrdinalPosition,SchemaName,TableName,DataElement,DataType,MaxLen,Prec,Scale

FROM #dummytable1

ORDER BY OrdinalPosition

OPEN backupFiles

-- Loop through all the files for the database

FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale

WHILE @@FETCH_STATUS = 0

BEGIN

SET @ds = ''

if @datatype = 'char' OR @datatype='nchar' or @datatype='varchar' or @datatype='nvarchar' or @datatype = 'varbinary'

SET @ds = 'ISNULL(a.' + @dataelement + ','''') <> ISNULL(b.' + @dataelement + ','''')'

if @datatype = 'datetime'

SET @ds = 'ISNULL(a.' + @dataelement + ',CONVERT(datetime,''01/01/1800'')) <> ISNULL(b.' + @dataelement +',CONVERT(datetime,''01/01/1800''))'

if @datatype = 'int' OR @datatype = 'smallint' OR @datatype = 'bigint' OR @datatype='tinyint'

SET @ds = 'ISNULL(a.' + @dataelement + ',0) <> ISNULL(b.' + @dataelement + ',0)'

if @datatype = 'decimal'

SET @ds = 'ISNULL(a.' + @dataelement + ',0.0) <> ISNULL(b.' + @dataelement + ',0.0)'

if @datatype = 'money'

SET @ds = 'ISNULL(a.' + @dataelement + ',0.0) <> ISNULL(b.' + @dataelement + ',0.0)'

if @datatype = 'bit'

SET @ds = 'ISNULL(a.' + @dataelement + ',0) <> ISNULL(b.' + @dataelement + ',0)'

if @datatype = 'text'

SET @ds = 'ISNULL(CONVERT(varchar(4000),a.' + @dataelement + '),'''') <> ISNULL(CONVERT(varchar(4000),b.' + @dataelement + '),'''')'

if @ds = '' --AND @datatype <> 'text'

SET @ds = 'a.' + @dataelement + ' <> b.' + @dataelement

if @ds <> ''

BEGIN

UPDATE #dummytable1

SET CompareVals=@ds

WHERE OrdinalPosition=@ordinalPosition

END

FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale

END

CLOSE backupFiles

DEALLOCATE backupFiles

/* ============================================================================================== */

PRINT ' '

PRINT '-- Check for different values for the records that are common to both databases (a=' + @db1 + ',b=' + @db2 + ') for table ' + @SN + '.' + @TN

SET @wow = (SELECT DataElement FROM #dummytable1 WHERE OrdinalPosition=1)

PRINT 'SELECT ' + @collist

PRINT 'FROM ' + @db1 + '.' + @SN + '.' + @TN + ' a JOIN ' + @db2 + '.' + @SN + '.' + @TN + ' b'

PRINT 'ON a.' + @wow + ' = b.' + @wow + ' --change this PK if necessary'

PRINT 'WHERE'

DECLARE backupFiles CURSOR FOR

SELECT OrdinalPosition,SchemaName,TableName,DataElement,DataType,MaxLen,Prec,Scale,CompareVals

FROM #dummytable1

ORDER BY OrdinalPosition

OPEN backupFiles

-- Loop through all the files for the database

FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale,@comparevals

WHILE @@FETCH_STATUS = 0

BEGIN

if @ordinalposition=@numelements

SET @ds = @comparevals

else

SET @ds = @comparevals + ' OR'

if @comparevals <> ''

PRINT @ds

FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale,@comparevals

END

CLOSE backupFiles

DEALLOCATE backupFiles

--SELECT * FROM #dummytable1

PRINT ''

PRINT '--Find records in db ' + @db1 + ' that are not in db ' + @db2

PRINT 'SELECT ' + @collist2

PRINT 'FROM ' + @db1 + '.' + @SN + '.' + @TN

PRINT 'WHERE ' + @wow + ' NOT IN (SELECT a.' + @wow + ' FROM ' + @db2 + '.' + @SN + '.' + @TN + ' a)'

PRINT ''

PRINT '--Find records in db ' + @db2 + ' that are not in db ' + @db1

PRINT 'SELECT ' + @collist2

PRINT 'FROM ' + @db2 + '.' + @SN + '.' + @TN

PRINT 'WHERE ' + @wow + ' NOT IN (SELECT b.' + @wow + ' FROM ' + @db1 + '.' + @SN + '.' + @TN + ' b)'

DROP TABLE #dummytable1

SET NOCOUNT OFF

vikingDBA SSCertifiable Points: 5356 More actions · Answer 9

vikingDBA

SSCertifiable

Points: 5356

March 22, 2012 at 8:58 am

#1462783

Note that the script assumes that the structure of the table in the two databases hasn't changed.