Delete duplicate row

  • Can anyone know the script for deleting duplicate row from this following table

    table : emp

    empid  mgrid  empname

    1         null    test1

    2        1       test2

    3        1        test3

    2        null     duptest4

    3       null     duptest5

    want to delete these extra last two rows

    Thanks

     

  • The last 2 rows doesn't look like duplicates. still if you want to delete them, then just say

    delete from emp where empname = 'duptest4' or empname = 'duptest5'

    or use this to your original table to find the duplicate records.

    select empid, count(*) as ct from emp

    group by empid having (count(*) > 1)

     

     

  • Thanks for your reply

    I know the 1st delete query you have written to delete last two rows... actually my question was not clear I have just given an example of this table. There can be so many duplicate records i.e. same empid or probably same mathing other field. What I want to delete all duplicate rows having say empid 2 and leaving only one record with empid 2

  • Neel, you have to be more specific in what you want to accomplish. If there are 2 records that have empid 3, how do you know which one to delete?

  • Thanks rookie

    There is as such no criteria in our case to decide which one to delete or which one to keep having say empid 2, what we want there will be only one record having empid 2 empid 3 and so on

  • >>There is as such no criteria in our case to decide which one to delete or which one to keep

    That makes no sense from a business point of view.

    Why not just drop the other 2 columns if you don't care about the values they contain ?

     

  • Neel,

    What is the primary key for the table and... do you have something like a "DateEntered" column or an autonumbering column in the mix?  If so, do you want to keep the earliest or the latest entry for and EmpID?

    If you don't have any of that, then I have to agree with PW... from a business standpoint, the other two columns mean nothing to me...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • You need to define which row to keep and which one to delete.

    Can you provide such definition?

    _____________
    Code for TallyGenerator

  • ok, if there is no such crieteria , then i have tried like this

    create table table1( tid int ,tname varchar(100)

    )

    insert into table1 values (1,'veeresh')

    insert into table1 values (2,'shreeshail')

    insert into table1 values (3,'Ravi')

    insert into table1 values (1,'Ramesh')

    insert into table1 values (2,'Kiran')

    insert into table1 values (3,'Mahesh')

    alter table table1 add t_id int identity(1,1)

    delete from table1 where t_id in (

    select max(t_id) from table1

    group by tid )

    but always we shuold mentain a identity column ,specially in your case.

    try this

  • If you have 2 rows in a table that are duplicates, and you wish to delete only one of them, it would appear to be impossible, since any delete statement that would delete the duplicate would also match the one you'd like to keep.

    The solution: "set rowcount"

    Example:

    set rowcount 1 -- limit delete to 1 row

    Delete from Table where col1 = 'Apple' and col2 = 'Kangaroo'

    set rowcount 0  -- unset limit

    This example will delete only 1 row that contains the combination of Apple and Kangaroo, no matter how many there are in the table.  Keep doing it until you have no more duplicates.

    hth jg

     

     

     

     

  • hi,

    try this.  Ignore red writing as they are only my explanation.  Also with this code, it just delete all duplicate entries except one, but not neccessary delete the last two rows as you request. 

    declare @empid int , @cnt int

    if empid datatype is something else then change int to the appropriate datatype

    declare getallrecords cursor local static for

     select count(1), empid

     from emp(nolock)

    group by empid having count(1) > 1

     open getallrecords

    fetch next from getallrecords into @cnt, @empid

    while @@fetch_status=0

    begin 

    set @cnt = @cnt-1

    set rowcount @cnt

    delete from emp where empid = @empid 

     set rowcount 0

    fetch next from getallrecords into @cnt, @empid

     end

    close getallrecords

     deallocate getallrecords

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply