eliminating duplicates

  • Steve Jones - Editor (7/7/2008)


    It is interesting and the title doesn't help. It's a good thing that so many people want to help, and I can sympathize. Reading 50+ replies is time consuming.

    Everyone wants to chime in with their solution as well. Maybe it's time for an article on this πŸ˜‰

    On what? Gaining the ability to RTFS or deleting dupes? I believe that Barry already has an article in your court on this unusual type of delete.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi, use this query.. It is very important to delete duplicate records πŸ™‚

    ;with remove_dup as

    (

    select rowno=row_number()

    over(partition by name order by firstname,lastname),firstname,lastname

    from EmployeeDup

    )

    delete from remove_dup where rowno > 1 --this will delete all duplicate firstname and lastname records except one record

  • . EH! Wrong again! You only read the title... you didn't even read the requirements from the first post of the OP. Worse yet, this is a 2k forum and you posted a 2k5 solution.

    And, there are additional requirements that the op eventually get's to in about the first 4 posts. No interim tables which rules out Temp Tables, Table Variables, Cursors, While Loops, and just about everything that you'd use to solve the problem any other way than Barry did.

    It's absolutely amazing to me that so many people are simply not reading the requirements... RTFS folks!!!!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Yes Mr. Actually I have read the requirements and I have correctly suggested the intended solution to delete duplicate records.

    Although I have intentionally mentioned it for SQL SERVER 2005, I haven't just mentioned that it is for SQL Server 2005.

    There are many ways to deal with duplicate records and this IS one of those and is very useful for people in this topic looking for SQL SERVER 2005 enhancement (CTE QUERIES) used for deleting duplicate records.

  • harshil (7/8/2008)


    Yes Mr. Actually I have read the requirements and I have correctly suggested the intended solution to delete duplicate records.

    Although I have intentionally mentioned it for SQL SERVER 2005, I haven't just mentioned that it is for SQL Server 2005.

    There are many ways to deal with duplicate records and this IS one of those and is very useful for people in this topic looking for SQL SERVER 2005 enhancement (CTE QUERIES) used for deleting duplicate records.

    Harshil, with all due respect, may I suggest that you 'actually' read, and not simply claim to have read, the first dozen or so posts in this thread? You will almost certainly wish to edit your last post afterwards.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • harshil (7/8/2008)


    Yes Mr. Actually I have read the requirements and I have correctly suggested the intended solution to delete duplicate records.

    Although I have intentionally mentioned it for SQL SERVER 2005, I haven't just mentioned that it is for SQL Server 2005.

    There are many ways to deal with duplicate records and this IS one of those and is very useful for people in this topic looking for SQL SERVER 2005 enhancement (CTE QUERIES) used for deleting duplicate records.

    Heh... go look again... read very carefully... and, remember... you didn't even use the correct version of SQL Server whether you mentioned it or not πŸ˜‰ Now, even if it were the correct version, go back and look at the original post and see why your code should contain "=1" instead of ">1".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff, In this query, the condition is rightly mentioned as rowno>1 and shouldn't be rowno=1

    What the purpose of this query is to remove all duplicate records except one record. So, all records partitioned by the primary key combination having rowno > 1 should be deleted.

    ;with remove_dup as

    (

    select rowno=row_number()

    over(partition by name order by firstname,lastname),firstname,lastname

    from EmployeeDup

    )

    delete from remove_dup where rowno > 1 --this will delete all duplicate firstname and lastname records except one record

    I think this would be clearer with an example:

    Say, we have original output as:

    FirstName LastName

    Harshil Name1

    Jeff Name2

    Harshil Name1

    Harshil Name1

    Jeff Name2

    Our output should be (after deleting duplicate records):

    FirstName LastName

    Harshil Name1

    Jeff Name2

    What the CTE query without rowno>1 gives is:

    rowno FirstName LastName

    1 Harshil Name1

    2 Harshil Name1

    3 Harshil Name1

    1 Jeff Name2

    2 Jeff Name2

    Now, applying and deleting rowno>1 removes the duplicate records except those with rowno=1.

    I think you did not see that we are deleting records having rowno greater than 1 and keeping those with rowno equal to 1. So, the condition should be aptly deleting records having rowno>1.

    And Common Table Expressions (CTEs) are one of the most exciting features to be introduced with SQL Server 2005 (this is the SQL server version I have mentioned, is there anything wrong with the version name or something?)

    Anyway, this should make ourselves clear. Right? πŸ™‚

  • harshil, you obviously have not read the requirements at all.

    I think barry has done an excellent job explaining:

    rbarryyoung (7/1/2008)


    d_sysuk (7/1/2008)


    LEAVES ONLY 1 COPY of ANY DUplicate ROW as wanted !!!

    Except that that is NOT what was wanted. What was wanted was:

    Eliminate One row from each distinct Group.

    With the Following restrictions:

    No intermediate tables!

    and

    No additional Identity columns

    SQL Server 2000 (not 2005)

    And, yes, we all know that it can be easily be done with Cursors or While loops. In fact, it seems like the point of this challenge was to try to demonstrate something that could only be done with Cursors or other kinds of loops. Therefore there is one additional consideration that you might as well add:

    No Cursors, and No Loops!

    Now try it ...

  • harshil (7/8/2008)


    Hi Jeff, In this query, the condition is rightly mentioned as rowno>1 and shouldn't be rowno=1

    What the purpose of this query is to remove all duplicate records except one record. So, all records partitioned by the primary key combination having rowno > 1 should be deleted.

    ...

    Now, applying and deleting rowno>1 removes the duplicate records except those with rowno=1.

    I think you did not see that we are deleting records having rowno greater than 1 and keeping those with rowno equal to 1. So, the condition should be aptly deleting records having rowno>1.

    And Common Table Expressions (CTEs) are one of the most exciting features to be introduced with SQL Server 2005 (this is the SQL server version I have mentioned, is there anything wrong with the version name or something?)

    Anyway, this should make ourselves clear. Right? πŸ™‚

    Good Lord, Harshil! You've been politely guided in the right direction twice and you've still got all of it wrong.

    This is a SQL Server 2k question in a SQL Server 2k forum section. Exciting new features of SQL Server 2005 are totally irrelevant.

    It is not a dedupe excercise, it's the exact opposite - a NOT dedupe. Jeff is most certainly not mistaken.

    Which part of this do you not understand? Please - read the first few posts.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris,

    Thanks for the post dear.

    I am aware that I have posted in SQL 2k forum.

    Thought that the logic/query I have posted might be helpful to our dear friends who are searching for eliminating duplicates (because of the topic of the post), irrespective of whether it is 2k or 2k5.

    I have seen many a times that people get their solutions in different forums. That has happened to me as well.

    Moreover, the idea that I presented would be helpful to those who do a search on 'remove duplicates' or a similar sort of search.

    Of course, if someone feels that it would not prove helpful to them, they may frankly ignore the thing I have posted.

    The only point is that it might be useful to someone who might just visit the forum and by chance have a look at some similar solution which is not relevant in 2k but useful for 2k5.

    (Inventions happen by accident!)

    Frankly speaking, there is no point in pulling each others legs. πŸ™‚

    We have to work as a team and try to put forward our ideas so that other members can get the benefits.

  • harshil (7/8/2008)


    Hi Jeff, In this query, the condition is rightly mentioned as rowno>1 and shouldn't be rowno=1

    What the purpose of this query is to remove all duplicate records except one record. So, all records partitioned by the primary key combination having rowno > 1 should be deleted.

    That may be the purpose of YOUR query... but it is not the right query for this post. You really need to go back and read the original post where it says...

    [font="Arial Black"]I wish to delete the first row among the duplicate records and keep the others [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • :D:D:D:D:D:D:D:D

    SQL2005 Developer (who read the title but not the requirements)

    *fingers in ears*

    I can't hear you - SQL 2005 and I'm Gonna delete the dupes πŸ˜›

    Here's my solution......

    *sounds of gun fire, explosions and screams of pain*

    'Damn! how many more do we have to........before they listen/read the f***ing requirements?'

    :D:D:D:D:D:D:D:D

    Hiding under a desk from SSIS Implemenation Work :crazy:

  • I agree. For those of us who have been following this thread from early on, these hastily posted solutions that aren't even for the correct SQL Server version or are simply de-duping are quite annoying. Yes, someone could come accross it if they run a search, but that's why we have a 2005 forum....and articles. If someone has an awesome way to de-dupe records in 2005, why not write an article?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Shaun McGuile (7/9/2008)


    'Damn! how many more do we have to........before they listen/read the f***ing requirements?'

    :D:D:D:D:D:D:D:D

    I think that the mathematically correct answer to this question is "infinity".

    The problem is that we are adding advisories at the wrong end and they are not reading them. They are just looking at the Subject/title or the OP.

    To stop this we either need to: A) change the title to "eliminating some duplicates" or "eliminating one duplicate", or B) change the original post, or C) stop replying so that it stops showing up in the "Active Threads" lists.

    Oops, I guess I just broke (C), again. πŸ™‚

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • It's ok, really... as suggested, it's a really great study in the human element. Some post with out reading, some argue they read, some read and still don't understand.

    And, no, I don't think we need to change a thing... it was pretty much stated that this was NOT a typlical delete in the very first post of this thread... πŸ˜›

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 106 through 120 (of 137 total)

You must be logged in to reply to this topic. Login to reply