Delete "near" duplicate records

  • MS SQL 2005

    I have a large table and have recently found that one of the feeds to it contains duplicated data with the exception of one field. I want to try and remove one copy of the data, and while I could do a select distinct with a max or min of the one field that is different (number) this would mean sending the data to a new table, deleting the all the data that is duplicated and then re-insterting the now distinct data. As this will continue until we can get the feed from the source system sorted out (which could take months) I was hoping to have an automated way of doing this.

    I am ashamed to say I have written a very dirty way to do this :Whistling:, but I am not happy with using it as it is extremely long winded is

    DECLARE @count INT

    SET @count = 0

    WHILE (@count < 100) -- Change to number to remove

    BEGIN

    DELETE TOP(1) from DWH_STATEMENT_FACT

    where Account_FK =( Select Top (1) Max(Account_FK) from dbo.DWH_STATEMENT_FACT

    Group By Account_No, Calendar_Date, Data_Source

    having Count(Account_FK) >1

    and Calendar_DATE = '2011-11-16')

    and Calendar_DATE = '2011-11-16'

    and Data_source ='IMS'

    SET @count = (@count + 1)

    END

    I am not really worried which of the two copies is removed as the non distinct data is fairly similar. The number of duplicates on a daily basis is in the region of 8,000 out of roughly 300,000 new records. Does anyone know a nice clean way to get the same results?

  • Use a CTE, for example:

    CREATE TABLE #T(Date_Stamp DATETIME,KW1 DECIMAL(5,1), KW2 DECIMAL(5,1))

    INSERT INTO #T

    SELECT '12/10/2010', 5.3, 3.1 UNION ALL

    SELECT '12/10/2010', 5.3, 3.1 UNION ALL

    SELECT '12/9/2010', 4, 2 UNION ALL

    SELECT '12/8/2010', 3, 1 UNION ALL

    SELECT '12/7/2010', 7.4, 5 UNION ALL

    SELECT '12/7/2010', 7.4, 5 UNION ALL

    SELECT '12/7/2010', 7.4, 5

    with cte

    as (select row_number() over(partition by Date_Stamp,KW1,KW2 order by Date_Stamp) as rn,

    KW1,KW2

    from #T)

    SELECT * FROM cte WHERE rn > 1

    Once you have tested this, alter the SELECT * .... to a DELETE FROM cte WHERE rn> 1

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Oooh Powerful!

    I need to test more, but I think I like this CTE.

    😀

  • adrian.walsh 56299 (11/17/2011)


    Oooh Powerful!

    I need to test more, but I think I like this CTE.

    😀

    Definitely test, and test again, and when you believe the results are correct - test again to be absolutely sure.

    And thanks for your feedback

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • its good.

    I'm newbie on. My hobbies are computer using and internet surffing. Many more thing I found here, which are attractive thats why I join this forum. have a nice day guys.

    Hope Its a good journey with you.

    thanks,

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply