remove duplicate records

  • Hi all,

    I need te remove some records (about 5000) from a table.

    The records look like below.

    101740 Henk Krassenburg 3034 16112

    101741 Jerry Kloot 3340 16111

    101751 Jerry Kloot 3340 16111

    101754 Jerry Kloot 3340 16111

    101752 Jan van Harten 3350 16111

    101742 Jan van Harten 3350 16111

    101743 Cor Megens 3600 16111

    101750 Cor Megens 3600 16111

    101749 Janus van Haertens 655 16112

    101753 Janus van Haertens 655 16112

    We need to remove the records if the third column is duplicate and the first column has the highest number.

    Can someone help me out here ?

     

    Thanks in advance

     

  • You have to use some kind temp table.

    This should work, I tested it with a couple of records

    Select Min(Column1) as Column1,Column3,Count(Column3) as Count

    Into #Temp_Keep

    from Table1

    Group By Column3

    Having Count(Column3)>1

    Delete From Table1

    Where Column1 Not In (Select Column1 from #Temp_Keep) and Column3 IN (Select Column3 from #Temp_Keep)

  • If you have some kind of index value in the row, you can remove them directly.  For example, if there is an ID or an entry date associated with the row.  Let's choose date:

    DELETE FROM <table> WHERE EXISTS (SELECT * FROM <table> dupEnt WHERE dupEnt.RowDate > <table>.RowDate )

    Guarddata-

  • First of all, let's assume that your first column has unique value.

    Second, let's name the columns as Col1 through Col5.

    DELETE FROM YourTable

    FROM YourTable LEFT OUTER JOIN

    (SELECT Col3, MAX(Col1) AS Col1 FROM YourTable GROUP BY Col3) U

    ON

    YourTable.Col1 = U.Col1

    WHERE U.Col1 IS NULL

     

    Good luck!

     

  • Give this a try...

    SET NOCOUNT ON

    IF EXISTS(SELECT * FROM tempdb..sysobjects WHERE id = object_id('tempdb..#Foo'))

        DROP TABLE #Foo

    CREATE TABLE #Foo

        (

        FLD1            int

        , FLD2          nvarchar(25)

        , FLD3          int

        , FLD4          int

        )

    INSERT INTO #Foo

    VALUES (101740, 'Henk Krassenburg', 3034, 16112)

    INSERT INTO #Foo

    VALUES (101741 ,'Jerry Kloot' ,3340 ,16111)

    INSERT INTO #Foo

    VALUES (101751 ,'Jerry Kloot' ,3340 ,16111)

    INSERT INTO #Foo

    VALUES (101754 ,'Jerry Kloot' ,3340 ,16111)

    INSERT INTO #Foo

    VALUES (101752 ,'Jan van Harten' ,3350 ,16111)

    INSERT INTO #Foo

    VALUES (101742 ,'Jan van Harten' ,3350 ,16111)

    INSERT INTO #Foo

    VALUES (101743 ,'Cor Megens' ,3600 ,16111)

    INSERT INTO #Foo

    VALUES (101750 ,'Cor Megens' ,3600 ,16111)

    INSERT INTO #Foo

    VALUES (101749 ,'Janus van Haertens' ,655 ,16112)

    INSERT INTO #Foo

    VALUES (101753 ,'Janus van Haertens' ,655 ,16112)

    SELECT f.*

    FROM #Foo f

    DELETE f

    FROM #Foo f

        JOIN (SELECT FLD3 FROM #Foo GROUP BY FLD3 Having COUNT(*) > 1) Dupes ON f.FLD3 = Dupes.FLD3

    WHERE f.FLD1 IN(SELECT MAX(FLD1) FROM #Foo WHERE FLD3 = Dupes.FLD3)

    SELECT f.*

    FROM #Foo f

     

     




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • Gary's statement removes duplicates, but not triplicates or higher.  It was unclear from your statement which behavior you wanted, but this will remove all but the record with the lowest FLD1

    delete f1

    From #Foo f0 (nolock)

    JOIN #Foo f1 (nolock)

      on f0.FLD3 = f1.FLD3

    Where f0.FLD1 < F1.FLD1

    Signature is NULL

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply