INSERT INTO Statement w/ Duplicate Data (Recurring)

Question

Post reply

INSERT INTO Statement w/ Duplicate Data (Recurring)

eb24

SSC Eights!

Points: 823
More actions
June 23, 2004 at 2:54 pm

#161740

Here is the basis of my problem: what I'm looking for is to INSERT duplicate data ID's from one table to another table on a daily basis.

I have a table, say tblData, that contains millions of records and has a Primary Key named DataID. I have another table, say tblDuplicateData w/ a Primary Key named DuplicateDataID and a Foreign Key named DataID, where I would like all NEW duplicate data from tblData to be dumped into tblDuplicateData on a daily basis.

Here are more details: I have initially imported old data into tblData and it currently contains over 3 million rows. On a daily basis, I will be importing a text file from Oracle that will add around 6,000 records via a DTS package.

I have used the following code to locate all of the duplicate data (which works nicely):

SELECT Field1, Field2, Field3, count(*) as [How Many?]

FROM tblData

GROUP BY Field1, Field2, Field3

HAVING COUNT(*) > 1

Where I am having trouble is my second table, tblDuplicateData, which has DataID as a link (foreign key) to tblData, so I am having difficulty trying to figure out how to populate that table given the above SQL statement.

So I was thinking of enhancing my DTS package that has the daily file Oracle upload to include 'ON SUCCESS', an INSERT statement something like this (psuedo code):

INSERT INTO tblDuplicateData (DataID)

SELECT DataID

FROM tblData

WHERE DataID EXISTS?

    (SELECT Field1, Field2, Field3, count(*) as [How Many?]

     FROM tblData

     GROUP BY Field1, Field2, Field3

     HAVING COUNT(*) > 1)

AND DataID NOT IN tblDuplicateData

So to iterate once more, what I'm basically looking for is to run a Stored Procedure after my daily file upload that will INSERT into tblDuplicateData all DataID's that are DUPLICATES AND do not yet belong to that table, i.e. tblDuplicateData.

Thanks in advance for any assistance!

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply

Peter Tillotson Hall of Fame Points: 3325 More actions · Answer 1

I think you need:

insert into tblduplicatedata (dataid)

select dataid from tbldata d

left join tblduplicatedata dup on dup.dataid = d.dataid

inner join (select field1, field2, field3, min(dataid) as dataid

from tbldata group by field1, field2, field3

having count(*) > 1) as f

on d.field1 = f.field1 and d.field2 = f.field2 and d.field3 = f.field3 and d.dataid <> f.dataid

WHERE

and dup.dataid is null

I haven't been able to test it and there is no guarentees it will run quickly against millions of records. But it should give you a guide.

Regards

Peter Tillotson