Removing Duplicates

  • I have inherited a table that has duplicates (not exact duplicates). The table has the following columns:

    Computer_Name;Operating_System;Processor_Speed;Location

    Values that exist in the table are like the following:

    c3217j8;Windows 2000;500;FLORIDA

    c3217j8;Windows 98;500;FLORIDA

    c3217j8;Windows 95;300;NOT-KNOWN

    p8c123l;Windows 95;300;TEXAS

    p8c123l;Windows 2000;500;TEXAS

    u303413;Windows 95;600;KANSAS

    I need some way to test/elimantie the duplicate records in the following order:

    If Operating_System = Windows 2000 then

    If Processor_Speed >=500 then

    If Location <> NOT-KNOWN then

    THIS IS A VALID RECORD AND ALL

    OTHERS WITH THE SAME NAME CAN

    BE REMOVED.

    End If

    Else

    USE THE COMPUTER NAME WITH HIGHEST

    PROCESSOR SPEED AND DELETE ALL OTHER

    RECORDS WITH SAME COMPUTER NAME

    End If

    Else

    REMOVE ALL RECORDS WITH THE SAME COMPUTER

    NAME EXCEPT ONE. IT DOES NOT MAKE A DIFFERENCE

    WHICH ONE.

    End if

    I hope I did an OK job of explaining my problem. If not let me know so I can answer any questions. Thank you in advance.

  • Firstly you need a unique row identifier for each row add an identity field

    
    
    SELECT *
    From yourTable
    WHERE Operating_System = '2000'
    AND Processor_Speed >=500
    AND Location <> 'NOT-KNOWN'

    Use the sql above to get the rows to satisfy the first aprt of your 'IF' pseudocode and join this back onto the table (self join) on the name and delete those rows where the row identifiers are not equal.

    Then run a second query for the second part of your 'IF' which again is a self join but this time you do a self to delete those with slower processor speeds.

    Then write another query to remove the remaining duplicates.

    I'll try to go into more detail later ..I am very busy for now.

    Nigel Moore
    ======================

  • If have understood your logic correctly the this should do the trick

    select x.*
    
    from atable x
    left outer join atable a on a.Computer_Name = x.Computer_Name and a.Operating_System = 'Windows 2000' and a.Processor_Speed >= 500 and a.Location <> 'NOT-KNOWN'
    left outer join atable b on b.Computer_Name = x.Computer_Name and b.Operating_System = 'Windows 2000' and b.Processor_Speed >= 500 and b.Location = 'NOT-KNOWN'
    left outer join (select Computer_Name,Operating_System,max(Processor_Speed) as 'Processor_Speed' from atable where Operating_System = 'Windows 2000' group by Computer_Name,Operating_System) c on c.Computer_Name = x.Computer_Name and c.Operating_System = x.Operating_System
    left outer join (select Computer_Name,min(Operating_System) as 'Operating_System',min(Processor_Speed) as 'Processor_Speed',min(Location) as 'Location' from atable group by Computer_Name) d on d.Computer_Name = x.Computer_Name and d.Operating_System = x.Operating_System and d.Processor_Speed = x.Processor_Speed and d.Location = x.Location
    where (case
    when a.Computer_Name is not null and (a.Operating_System = x.Operating_System and a.Processor_Speed = x.Processor_Speed and a.Location = x.Location) then 1
    when a.Computer_Name is not null then 0
    --when b.Computer_Name is not null and (b.Operating_System = x.Operating_System and b.Processor_Speed = x.Processor_Speed and b.Location = x.Location) then 2
    when c.Computer_Name is not null and (c.Operating_System = x.Operating_System and c.Processor_Speed = x.Processor_Speed) then 3
    when c.Computer_Name is not null then 0
    when d.Computer_Name is not null and (d.Operating_System = x.Operating_System and d.Processor_Speed = x.Processor_Speed and d.Location = x.Location) then 4
    else 0 end) = 0

    I have noted one line for a condition not defined in your logic

    Operating_System = Windows 2000, Processor_Speed >=500 and Location = NOT-KNOWN

    Far away is close at hand in the images of elsewhere.
    Anon.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply