Deleting Duplicate Records

  • Cleanest and easiest way to clean up duplicates that I have seen.

  • A very good and helpful topic

  • Hi Ramakrishnan,

               The script you had provided was quite a bit useful but i have some reservations on this method. Actually i had drawn up a similar script but found that when the number of columns increase in the table, you will have to group by all the columns. is there any other method to delete the duplicate rows other than taking the stock into the temp table and again fetching it back to the physical table.

  • My method for duplicate removal  may not suit every one. it offers the advantage if you have duplicate data but the identity column makes the data unique you can modify the select * to only the duplicated fields.

     

    begin tran

    Select Distinct * into [newtable] from [badtable]

    truncate table [badtable]

    insert into badtable select * from newtable

    drop table [newtable]

     

    ColinR

     

     

     

  • Here's another method. This one adds an identity column, uses it, and then drops it.

    ALTER TABLE dbo.employee ADD tempid INT IDENTITY(1, 1)

    DELETE dbo.employee WHERE tempid NOT IN (SELECT MIN(tempid) FROM dbo.employee GROUP BY id, NAME, salary)

    ALTER TABLE dbo.employee DROP COLUMN tempid

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Or, if you use SQL Server 2005,

     

    DELETE t1

    FROM (

    SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS RecID

    FROM Table1

    ) AS t1

    WHERE RecID > 1

     


    N 56°04'39.16"
    E 12°55'05.25"

  • My question is this....

    Please correct me if I'm wrong, or how to get around the issue, but how about a real-life scenario that I have:

    I have been given a logical definition of a duplicate row to be a composite of 4 columns. However, there are actually 10 columns in the table. This data has duplicates in the logical composite key, but it's not necessarily a duplicate if you included the other data fields in the "group by". If I want to copy the data to the other "temp" table, I have to include all the columns in the "group by" or I will lose the other data, but in my case, I can't do that or I will miss some duplicates.

  • sorry I read on msdn that 'Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server'. So this method will not work anymore, will it not?

    Thanks

    Gio

  • Hi ,

    I have tried duplicate record deletion without using temp tables.

    And according to me its faster and a very short query.

    CREATE TABLE EMP

    (EMPID INT,EMPNAME VARCHAR(100))

    INSERT INTO EMP VALUES (1,'A')

    INSERT INTO EMP VALUES (1,'A')

    INSERT INTO EMP VALUES (1,'A')

    INSERT INTO EMP VALUES (2,'B')

    INSERT INTO EMP VALUES (2,'B')

    INSERT INTO EMP VALUES (3,'C')

    INSERT INTO EMP VALUES (4,'D')

    INSERT INTO EMP VALUES (4,'D')

    INSERT INTO EMP VALUES (4,'D')

    INSERT INTO EMP VALUES (4,'D')

    INSERT INTO EMP VALUES (5,'E')

    INSERT INTO EMP VALUES (5,'E')

    INSERT INTO EMP VALUES (6,'F')

    INSERT INTO EMP VALUES (6,'F')

    INSERT INTO EMP VALUES (6,'F')

    INSERT INTO EMP VALUES (6,'F')

    INSERT INTO EMP VALUES (6,'F')

    INSERT INTO EMP VALUES (7,'G')

    SELECT * FROM EMP

    SELECT @@ROWCOUNT

    --(Initially the rowcount must be > 0 )

    WHILE @@ROWCOUNT > 0

    DELETE TOP(1) FROM EMP WHERE EXISTS

    (SELECT E.EMPID ,COUNT(E.EMPID) FROM EMP E

    WHERE EMP.EMPID=E.EMPID GROUP BY E.EMPID HAVING COUNT(E.EMPID) > 1)

    SELECT * FROM EMP

    Correct if wrong ........................

    :smooooth:

  • Hi Peso,

    i tried your code but it did not work.

    Are you sure it works?

    Thanks

  • Hi All,

    I have a problem which everybody is discussing.

    I need to find the duplicate record in the table. Its not a matter of just 1 column.

    create table #temp

    (ddatetime datetime ,

    co1 int ,

    col2 int )

    Insert into #temp

    values ('2001-11-12 13:29:00.000' , 101 , 15)

    Insert into #temp

    values ('2001-11-12 13:45:00.000' , 102 , 15)

    Insert into #temp

    values ('2001-11-12 13:50:49.000' , 101 , 15) [duplicate] 😎

    Insert into #temp

    values ('2001-11-12 13:50:49.000' , 101 , 15) [duplicate] 😎

    Insert into #temp

    values ('2001-11-12 14:00:49.000' , 101 , 15)

    Insert into #temp

    values ('2001-11-12 14:00:49.000' , 102 , 15)

    Insert into #temp

    values ('2001-11-12 14:00:49.000' , 103 , 15)

    I have to find the duplicate value with the combination of the 3 columns.

    Please help

  • Ajay... Yes it does work.

    DELETE t1

    FROM (

    SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS RecID

    FROM Table1

    ) AS t1

    WHERE RecID > 1


    N 56°04'39.16"
    E 12°55'05.25"

  • harsha.bhagat1

    SELECT ddatetime, col1, col2

    FROM (select ddatetime, col1, col2, row_number() over (partition by ddatetime, col1, col2 order by ddatetime) as recid

    from table1

    ) as d

    where recid > 1


    N 56°04'39.16"
    E 12°55'05.25"

  • Thank you...

  • Delete From TableName

    Where ID Not IN

    {

    Select MAX(ID) From TableName

    Group By Col1, Col2...

    }

    By The help of Table name useing ID and the Group By we can delete the duplicate record from table in sql server.

    For more Interview question Click on billow link.

    http://sqlserver4us.blogspot.in/[/url]

Viewing 15 posts - 31 through 44 (of 44 total)

You must be logged in to reply to this topic. Login to reply