Identifying column data that is the same in multiple records

  • Hello,

    I have a database that holds various info about people. First names, last names, nicknames, who entered them into the system, etc. I am trying to find duplicates in the system, people that were entered more than once. For instance one user may have entered John Smith while another user may have entered the same person as John Wayne Smith. I believe the following script correctly identifies all of the names that appear more than once:

    SELECT lastname, firstname, COUNT(lastname) AS '# of matching Lastnames', COUNT(firstname) AS '# of matching Firstnames'

    FROM subject

    GROUP BY lastname, firstname

    HAVING COUNT(*) > 1

    When I run this I end up with 67 rows returned. My problem now is I would like to be able to list additional information about the 67 people returned from the query above such as date of birth and nicknames but this additional info resides in other tables in the database so I would have to cross reference with join statements which I did:

    SELECT lastname, firstname, birthdate,COUNT(lastname) AS '# of matching Lastnames', COUNT(firstname) AS '# of matching Firstnames'

    FROM subject

    INNER JOIN subjectdob b

    ON subject.subjectid = b.subjectid

    GROUP BY lastname, firstname, birthdate

    HAVING COUNT(*) > 1

    But when I run this I end up getting only 19 rows returned. Most of these rows are do not even match the rows that were returned in the first query. Some of the people returned from the first query are returned from this one but many were not plus some people returned from this query were not returned from the first one. I am assuming that it is looking for multiple instances of lastname, firstname AND birthdate but I am not interested in finding matching birthdates I just want to know what the birthdate is for the records where the last name and first name both happen to match.

    Is it possible to get additional info from other tables returned for the first query above without screwing up the results as I obviously did unintentionally in the second query?

    Thanks

  • Use that table that you found the duplicates in as a derived table, Join to it on the fields you believe are the unique ones back to the original table(s)

    Select *

    FROM subject x

    INNER JOIN subjectdob y

    ON subject.subjectid = b.subjectid

    INNER JOIN (SELECT lastname, firstname, birthdate,COUNT(lastname) AS '# of matching Lastnames', COUNT(firstname) AS '# of matching Firstnames'

    FROM subject

    INNER JOIN subjectdob b

    ON subject.subjectid = b.subjectid

    GROUP BY lastname, firstname, birthdate

    HAVING COUNT(*) > 1) AS dt

    ON x.lastname = dt.lastname

    and x.firstname = dt.firstname

    and x.birthdate = dt.birthdate

  • Thanks Ray! I have it working like a charm now!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply