Finding Duplicated across several servers

  • Hi,

    I have several db servers that service several of our clinics (each clinic using their own server,as of now). NOTE: The patientid is the key on the table I lloking up, so there will be no duplicates within the same database. I'm more concerned that another clinic may have used a patientid that another clinic may have used. I will soon move all data over to use One server/DB so I wanted to make sure I don't run into a problem when I do move over the data. In preparation for that, I wanted to do some research into the data we have now and see if we have duplicate clientid's within our servers. I have the following start off query which works for :

    SELECT PATIENTS.[PatientID], PATIENTS.[LastName], PATIENTS.[FirstName]

    FROM Independence_822_APG.dbo.PATIENTS

    WHERE (((PATIENTS.[PatientID])

    In (SELECT [PatientID] FROM Independence_822_APG.dbo.[PATIENTS] As Tmp GROUP BY [PatientID] HAVING Count(*)>1 )))

    ORDER BY PATIENTS.[PatientID]

    What I'd like to do is use a join all on the above query along with the queries for the other servers. For example (duplicate look up not used in example):

    Select patientid from Clinic1_822_APG.dbo.PATIENTS

    union all

    Select patientid from Clinic2_822_APG.dbo.PATIENTS

    Using the first query above, I was unsure how to code the query to get the duplicates across the server, as the one above will only get dupes within it's own db. Thanks for any advice.

    Michael

  • The easiest way to do this would be to consolidate the data into a single database, from all servers. Create a single table with a GUID column in it, then import all the data into that. The GUIDs will keep each row unique, regardless of patient ID.

    Once you have it all in one table, then you can collect all the rows with duplicate patient IDs simply by joining that table to itself on patient ID, using GUID to differentiate rows.

    Something like:

    select *

    from dbo.MyConsolidatedTable as T1

    inner join dbo.MyConsolidatedTable as T2

    on T1.PatientID = T2.PatientID

    and T1.GUID != T2.GUID ;

    Then, you can use criteria other than the patient ID to test for duplicates. Things like name, address, social security number (or local equivalent if not US and some comparable value is available), and so on.

    Don't just check for exact duplicates. The same person might be in on server as "John Smith" and another as "Johnny Smith", and an exact match on first name will miss that one.

    Does that help?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi GSquared. thanks for the reply. It did help. I thought about doing something similar, but wanted to see if there was an easier way (not to say that this is terribly difficult or anything). Thanks again for the help.

  • You're welcome.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply