Dup finding question...

  • I have a table that i need to find dup. Finding the dup is easy, but the problem is that i dont have to kept one of them and flagging the other as dup but to kept four of them then flagging them as a dup.

    Any thought?

    Thanks

    My dup finding :

    UPDATE Employees SET Dup = Yes

    WHERE Employees.EmployeeID not in (SELECT Min(Employees.EmployeeID) AS MinOfEmployeeID FROM Employees GROUP BY Employees.FirstName, Employees.LastName);

  • Hmm... I'm not sure I understood the question... are you asking how to mark all entries with duplicity, instead of leaving always one of them unmarked? If so, then try this:

    UPDATE e

    SET Dup = Yes

    FROM Employees e

    JOIN

    (SELECT FirstName, LastName, count(EmployeeID)

     FROM Employees

     GROUP BY FirstName, LastName

     HAVING count(EmployeeID) > 1) as dupl

    ON e.FirstName = dupl.FirstName AND e.LastName = dupl.LastName

    I didn't test the query, so I hope there isn't any blunder in it and it will work :-).

    cheers, Vladan

  • Vladan, Thank for answering.

    I wish it would be that simple. But, like you said, maybe i didn't explain myself well.

    UPDATE e

    SET Dup = Yes

    FROM Employees e

    JOIN

    (SELECT FirstName, LastName, count(EmployeeID)

     FROM Employees

     GROUP BY FirstName, LastName

     HAVING count(EmployeeID) > 1) as dupl

    ON e.FirstName = dupl.FirstName AND e.LastName = dupl.LastName

    >> This set the flag dup to all dup

    UPDATE Employees SET Dup = Yes

    WHERE Employees.EmployeeID not in (SELECT Min(Employees.EmployeeID) AS MinOfEmployeeID FROM Employees GROUP BY Employees.FirstName, Employees.LastName);

    >> This set the flag dup for all dup except one

    But what i need, is one query that will set the flag dup for all dup except the first four.

    Exemple :

    The result of that query should look like this :

    Table A

    ID  | Name   | Dup

    ------------------

    1   | Albert | No

    2   | Albert | No

    3   | Albert | No

    4   | Albert | No

    5   | Albert | Yes

    ... | Albert | Yes

    16  | David  | No

    17  | David  | No

    18  | David  | No

    19  | Jack   | No

    So far, tha only way i could think about doing that, would be to use the second query written in this post to flags all dup except one, then, do another query that would flag only the first dup of a series of all flagged dup as not a dup and then execute that query 3 times in row to get the same result.

    But that wouldn't be efficient as i would have to execute 4 query to get the result. Last time, i flagged all dup as dup, and did the work by hand, but they were only 37 dup on 59 536. I just want to know how to do that with a larger amount of dup with a sql query.

  • Oh... that's different :-). Well, then what about to use your original query and modify it a bit:

    UPDATE emp

    SET Dup = Yes

    FROM Employees emp

    WHERE EmployeeID NOT IN

    (SELECT TOP 4 EmployeeID 

    FROM Employees

    WHERE Employees.FirstName = emp.FirstName AND Employees.LastName = emp.LastName

    ORDER BY Employees.EmployeeID)

    It wouldn't be very efficient on tables with millions of rows, but since this is not the case, I guess it could work fine. Again, the query is untested.

    Vladan

  • Oh, I see... That simple? Seen like too good to be true. Ill try it and give feedback but i might modify it to split it into two query, i find access really really slow with select inside a select or query clling another query.

    Thanks again Vladan

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply