August 24, 2004 at 5:36 pm
I have a table that i need to find dup. Finding the dup is easy, but the problem is that i dont have to kept one of them and flagging the other as dup but to kept four of them then flagging them as a dup.
Any thought?
Thanks
My dup finding :
UPDATE Employees SET Dup = Yes
WHERE Employees.EmployeeID not in (SELECT Min(Employees.EmployeeID) AS MinOfEmployeeID FROM Employees GROUP BY Employees.FirstName, Employees.LastName);
August 26, 2004 at 3:37 am
Hmm... I'm not sure I understood the question... are you asking how to mark all entries with duplicity, instead of leaving always one of them unmarked? If so, then try this:
UPDATE e
SET Dup = Yes
FROM Employees e
JOIN
(SELECT FirstName, LastName, count(EmployeeID)
FROM Employees
GROUP BY FirstName, LastName
HAVING count(EmployeeID) > 1) as dupl
ON e.FirstName = dupl.FirstName AND e.LastName = dupl.LastName
I didn't test the query, so I hope there isn't any blunder in it and it will work :-).
cheers, Vladan
August 26, 2004 at 8:43 am
Vladan, Thank for answering.
I wish it would be that simple. But, like you said, maybe i didn't explain myself well.
UPDATE e
SET Dup = Yes
FROM Employees e
JOIN
(SELECT FirstName, LastName, count(EmployeeID)
FROM Employees
GROUP BY FirstName, LastName
HAVING count(EmployeeID) > 1) as dupl
ON e.FirstName = dupl.FirstName AND e.LastName = dupl.LastName
>> This set the flag dup to all dup
UPDATE Employees SET Dup = Yes
WHERE Employees.EmployeeID not in (SELECT Min(Employees.EmployeeID) AS MinOfEmployeeID FROM Employees GROUP BY Employees.FirstName, Employees.LastName);
>> This set the flag dup for all dup except one
But what i need, is one query that will set the flag dup for all dup except the first four.
Exemple :
The result of that query should look like this :
Table A
ID | Name | Dup
------------------
1 | Albert | No
2 | Albert | No
3 | Albert | No
4 | Albert | No
5 | Albert | Yes
... | Albert | Yes
16 | David | No
17 | David | No
18 | David | No
19 | Jack | No
So far, tha only way i could think about doing that, would be to use the second query written in this post to flags all dup except one, then, do another query that would flag only the first dup of a series of all flagged dup as not a dup and then execute that query 3 times in row to get the same result.
But that wouldn't be efficient as i would have to execute 4 query to get the result. Last time, i flagged all dup as dup, and did the work by hand, but they were only 37 dup on 59 536. I just want to know how to do that with a larger amount of dup with a sql query.
August 27, 2004 at 1:08 am
Oh... that's different :-). Well, then what about to use your original query and modify it a bit:
UPDATE emp
SET Dup = Yes
FROM Employees emp
WHERE EmployeeID NOT IN
(SELECT TOP 4 EmployeeID
FROM Employees
WHERE Employees.FirstName = emp.FirstName AND Employees.LastName = emp.LastName
ORDER BY Employees.EmployeeID)
It wouldn't be very efficient on tables with millions of rows, but since this is not the case, I guess it could work fine. Again, the query is untested.
Vladan
August 27, 2004 at 11:27 am
Oh, I see... That simple? Seen like too good to be true. Ill try it and give feedback but i might modify it to split it into two query, i find access really really slow with select inside a select or query clling another query.
Thanks again Vladan
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply