Most efficient way of finding and storing where there are duplicate entries

Question

Post reply

Most efficient way of finding and storing where there are duplicate entries

mastersql

SSC Eights!

Points: 815
More actions
December 9, 2015 at 5:15 pm

#148858

Hi

I'm just learning about CTE and the WITH statement. I find it trickier to write at the moment but which would you say is the best solution to finding duplicate values in my table rows ::-

`Output plan 1 :-

SELECT ID4, DATERECORD INTO TABLEMIX2

FROM TABLEMIX

GROUP BY ID4, DATERECORD

HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

ORDER BY ID4

Output plan 2 :-

WITH TESTTABLE_CTE (ID3, ID4, DATERECORD)

AS

(

SELECT ID3, ID4, DATERECORD

FROM dbo.TABLEMIX

GROUP BY ID3, ID4, DATERECORD

HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

)INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

SELECT ID3, ID4, DATERECORD FROM TESTTABLE_CTE

Output plan 3 :-

WITH TABLEMIX_CTE (ROWNUM, ID3, ID4, DATERECORD)

AS

(

SELECT ROW_NUMBER() OVER (PARTITION BY ID3, ID4, DATERECORD ORDER BY ID4),ID3,ID4,DATERECORD –- ..10 more AND COUNT precedents to follow

AS ROWNUM

FROM TABLEMIX

)

INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

SELECT ID3, ID4, DATERECORD FROM TABLEMIX_CTE WHERE ROWNUM=2 -- to pick out any duplicate entry – if there are 2 or more then it will output for creating new table entry

`

Plan 1 creates the table first if it doesn;t exist also.

Cheers

James

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

mastersql (12/9/2015)

Hi

I'm just learning about CTE and the WITH statement. I find it trickier to write at the moment but which would you say is the best solution to finding duplicate values in my table rows ::-

`Output plan 1 :-

SELECT ID4, DATERECORD INTO TABLEMIX2

FROM TABLEMIX

GROUP BY ID4, DATERECORD

HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

ORDER BY ID4

Output plan 2 :-

WITH TESTTABLE_CTE (ID3, ID4, DATERECORD)

AS

(

SELECT ID3, ID4, DATERECORD

FROM dbo.TABLEMIX

GROUP BY ID3, ID4, DATERECORD

HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

)INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

SELECT ID3, ID4, DATERECORD FROM TESTTABLE_CTE

Output plan 3 :-

WITH TABLEMIX_CTE (ROWNUM, ID3, ID4, DATERECORD)

AS

(

SELECT ROW_NUMBER() OVER (PARTITION BY ID3, ID4, DATERECORD ORDER BY ID4),ID3,ID4,DATERECORD –- ..10 more AND COUNT precedents to follow

AS ROWNUM

FROM TABLEMIX

)

INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

SELECT ID3, ID4, DATERECORD FROM TABLEMIX_CTE WHERE ROWNUM=2 -- to pick out any duplicate entry – if there are 2 or more then it will output for creating new table entry

`

Plan 1 creates the table first if it doesn;t exist also.

Cheers

James

"It Depends". What do you want to do with the duplicates once you find them?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

mastersql SSC Eights! Points: 815 More actions · Answer 2

mastersql

SSC Eights!

Points: 815

December 10, 2015 at 12:20 am

#1845103

Hi

After I've found the duplicates I'm going to contact the customer to confirm if he'd still like them removing. Removing them is the easy part as I've found a previously written stored procedure which unfiles the entire record safely. Then the user can choose to delete using their menu options.

Cheers

James

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

mastersql (12/10/2015)

Hi

After I've found the duplicates I'm going to contact the customer to confirm if he'd still like them removing. Removing them is the easy part as I've found a previously written stored procedure which unfiles the entire record safely. Then the user can choose to delete using their menu options.

Cheers

James

Which duplicate do you want to keep? Temporally the first or the last?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

mastersql SSC Eights! Points: 815 More actions · Answer 4

Hi Jeff

For now I was just wanting a list of the duplicates. I'm learning its mostly a matter of preference whether we use the GROUP BY or WITH statements or maybe a different solution. I'm not really at the Developer level yet where I have to know the database really well - I currently work in IT Support so just writing little custom scripts at the moment to help customers as a workaround to help them until a Developer can look into (if ever).

Cheers

James