Dupes issue

Question

Post reply

Dupes issue

tariquer

Right there with Babe

Points: 761
More actions
October 19, 2004 at 10:07 am

#152299

Hello,

I have an app where I import txt (tab delimited) file(s) into tables. To avoid duplicates I have a constraint of 5 key fields. This works perfectly. But I need to identify duplicates and store them in a seperate table for auditing. To identify duplicates I do 2 things:

1. I check the file (stored in a temporary table _A) against the existing table A using the 5 key fields. this works to identify if the data is already in ths system. This is the first thing that is done.

2. I check _A to see if it has any duplicates contained in it. I use the following:

INSERT INTO _tblDupes ( ....)

SELECT * FROM _A a

WHERE EXISTS (

SELECT NULL FROM _A b

WHERE

  b.[fld_1] = a.[fld_1]

  AND b.[fld_2] = a.[fld_2]

  AND b.[fld_3] = a.[fld_3]

  AND b.[fld_4] = a.[fld_4]

GROUP BY

  b.[fld_1], b.[fld_2],

  b.[fld_3], b.[fld_4]

HAVING

  COUNT(b.[ID]) > 1

)

This also works perfectly. BUT the problem is, when I do an insert into A from _A it will take the 1st row and disregard the 2nd row as a duplicate because of the constraint. Where as the above query inserts both rows in _A in the _tbleDupes table. So when I go to sum up all the data I recieved from the file then the dupes are getting counted 2 or even 3 times. So what I want is:

_A     A     _tblDupes

ab    ab

ac    ac

ac               ac

ad    ad

ad               ad

right now I am getting:

_A     A     _tblDupes

ab    ab

ac    ac        ac

ac               ac

ad    ad        ad

ad               ad

any help will be greatly appreciated.

Thank you

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Patrick Duflot SSChasing Mays Points: 660 More actions · Answer 1

You can use the DISTINCT keywords like

INSERT INTO _tblDupes ( ....)

SELECT DISTINCT * FROM _A a

WHERE EXISTS (

SELECT NULL FROM _A b

WHERE

b.[fld_1] = a.[fld_1]

AND b.[fld_2] = a.[fld_2]

AND b.[fld_3] = a.[fld_3]

AND b.[fld_4] = a.[fld_4]

GROUP BY

b.[fld_1], b.[fld_2],

b.[fld_3], b.[fld_4]

HAVING

COUNT(b.[ID]) > 1

)

Patrick Duflot

tariquer Right there with Babe Points: 761 More actions · Answer 2

tariquer

Right there with Babe

Points: 761

October 20, 2004 at 10:56 am

#527342

I have tried the DISTINCT select but to no avail. I thought that maybe because the duplicate rows dont have anything else common outside of the fields in the constraint but taht is not the case. I have 1 instance where every single field is identical and it still selected both. I have no idea how to fix this!!!

Calvin Lawson SSChampion Points: 11030 More actions · Answer 3

"Where exists (Select NULL From....)" should probably be changed to "Where Exists (Select 1 From...)", but I'm not sure that will change anything.

I think your problem has to do with getting the unique row number for duplicates. This can be done this way:

Insert _tblDupes

select a.UniqueID

From _A a

JOIN _A b on

b.[fld_1] = a.[fld_1]

AND b.[fld_2] = a.[fld_2]

AND b.[fld_3] = a.[fld_3]

AND b.[fld_4] = a.[fld_4]

where a.UniqueID < b.UniqueID

Update d

set Field0 = a.Field0,

Field1 = a.Field1

From _tblDupes d

JOIN _A a on d.UniqueID = a.UniqueID

If you don't have "UniqueID" then you can add it to the table as a identity column.

Signature is NULL

tariquer Right there with Babe Points: 761 More actions · Answer 4

I have an idea on this. seems to work in my 3 three test cases. 1st I create a view X which selects the constraint fields and group them along wi COUNT(ID) > 1. This gives me the records that will show up as dupes. then I do a view Y where i join _A and X on the mobileID and select top 1 w/ id in desc order. this will give me the 2nd record. this assumes that the 1st one got in. But if there is more then 2 duplicate records then this will not work. any thoughts on how i can fix that?

Let me know if this sounds like a good work around (sort of)?