Most efficient way of finding and storing where there are duplicate entries

  • Hi

    I'm just learning about CTE and the WITH statement. I find it trickier to write at the moment but which would you say is the best solution to finding duplicate values in my table rows ::-

    `Output plan 1 :-

    SELECT ID4, DATERECORD INTO TABLEMIX2

    FROM TABLEMIX

    GROUP BY ID4, DATERECORD

    HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

    ORDER BY ID4

    Output plan 2 :-

    WITH TESTTABLE_CTE (ID3, ID4, DATERECORD)

    AS

    (

    SELECT ID3, ID4, DATERECORD

    FROM dbo.TABLEMIX

    GROUP BY ID3, ID4, DATERECORD

    HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

    )INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

    SELECT ID3, ID4, DATERECORD FROM TESTTABLE_CTE

    Output plan 3 :-

    WITH TABLEMIX_CTE (ROWNUM, ID3, ID4, DATERECORD)

    AS

    (

    SELECT ROW_NUMBER() OVER (PARTITION BY ID3, ID4, DATERECORD ORDER BY ID4),ID3,ID4,DATERECORD –- ..10 more AND COUNT precedents to follow

    AS ROWNUM

    FROM TABLEMIX

    )

    INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

    SELECT ID3, ID4, DATERECORD FROM TABLEMIX_CTE WHERE ROWNUM=2 -- to pick out any duplicate entry – if there are 2 or more then it will output for creating new table entry

    `

    Plan 1 creates the table first if it doesn;t exist also.

    Cheers

    James

  • mastersql (12/9/2015)


    Hi

    I'm just learning about CTE and the WITH statement. I find it trickier to write at the moment but which would you say is the best solution to finding duplicate values in my table rows ::-

    `Output plan 1 :-

    SELECT ID4, DATERECORD INTO TABLEMIX2

    FROM TABLEMIX

    GROUP BY ID4, DATERECORD

    HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

    ORDER BY ID4

    Output plan 2 :-

    WITH TESTTABLE_CTE (ID3, ID4, DATERECORD)

    AS

    (

    SELECT ID3, ID4, DATERECORD

    FROM dbo.TABLEMIX

    GROUP BY ID3, ID4, DATERECORD

    HAVING COUNT(ID3)>1 AND COUNT(ID4)>1 AND COUNT(DATERECORD)>1 –- ..10 more AND COUNT precedents to follow

    )INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

    SELECT ID3, ID4, DATERECORD FROM TESTTABLE_CTE

    Output plan 3 :-

    WITH TABLEMIX_CTE (ROWNUM, ID3, ID4, DATERECORD)

    AS

    (

    SELECT ROW_NUMBER() OVER (PARTITION BY ID3, ID4, DATERECORD ORDER BY ID4),ID3,ID4,DATERECORD –- ..10 more AND COUNT precedents to follow

    AS ROWNUM

    FROM TABLEMIX

    )

    INSERT INTO TESTTABLE3 (ID3, ID4, DATERECORD)

    SELECT ID3, ID4, DATERECORD FROM TABLEMIX_CTE WHERE ROWNUM=2 -- to pick out any duplicate entry – if there are 2 or more then it will output for creating new table entry

    `

    Plan 1 creates the table first if it doesn;t exist also.

    Cheers

    James

    "It Depends". What do you want to do with the duplicates once you find them?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi

    After I've found the duplicates I'm going to contact the customer to confirm if he'd still like them removing. Removing them is the easy part as I've found a previously written stored procedure which unfiles the entire record safely. Then the user can choose to delete using their menu options.

    Cheers

    James

  • mastersql (12/10/2015)


    Hi

    After I've found the duplicates I'm going to contact the customer to confirm if he'd still like them removing. Removing them is the easy part as I've found a previously written stored procedure which unfiles the entire record safely. Then the user can choose to delete using their menu options.

    Cheers

    James

    Which duplicate do you want to keep? Temporally the first or the last?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff

    For now I was just wanting a list of the duplicates. I'm learning its mostly a matter of preference whether we use the GROUP BY or WITH statements or maybe a different solution. I'm not really at the Developer level yet where I have to know the database really well - I currently work in IT Support so just writing little custom scripts at the moment to help customers as a workaround to help them until a Developer can look into (if ever).

    Cheers

    James

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply