Please advise on unique index creation

  • We are still on SQL Server 2005.

    We are trying to create a unique index on our patient table.

    The unique index will be a multi-column index (Status, PatientNumber, First, Last, DOB, Gender).

    However, the data already contains multiple duplicates.

    We can have two records, (Active, 0001, John, Doe, 1/1/1960, M) and (Active, 0001, John, Doe, 1/1/1960, M), that are actually dups, so one must be inactivated.

    We can also have two records, (Active, 0001, John, Doe, 1/1/1960, M) and (Active, 0001, John, Doe, 1/1/1960, M), that are NOT dups, so the practice must assign a new PatientNumber to one of the two records to indicate two distinct patients who just happen to have the same name, DOB, and gender.

    Since there are dups, the user has been inactivating dups to keep one active record.

    So we can have three records: (Inactive, 0001, John, Doe, 1/1/1960, M), (Inactive, 0001, John, Doe, 1/1/1960, M), and (Active, 0001, John, Doe, 1/1/1960, M). One of the inactive dups has to be removed before the unique index can be created.

    The business does not embrace fixing existing duplicate data.

    We don't want to enforce patient uniqueness with a function that checks active rows only.

    My plan was to clean the data, based on this consideration: a unique index, UNIQUE constraint, or PRIMARY KEY constraint cannot be created if duplicate key values exist in the data.
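    For reference, something like this is the query I had in mind for finding the offending key combinations (just a sketch against a simplified dbo.Patient table; adjust the names to the real schema):

    -- List every key combination that appears more than once
    SELECT [Status], PatientNumber, [First], [Last], DOB, Gender, COUNT(*) AS DupCount
    FROM dbo.Patient
    GROUP BY [Status], PatientNumber, [First], [Last], DOB, Gender
    HAVING COUNT(*) > 1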

    BUT our DBA said there is an option, when creating a unique index, that will let you create the index without complaining about existing dups??? Needless to say, the business is overjoyed to hear about this option.

    For example: I plan to create the unique index on 4/1/2014. Can I use some "option" when creating that index to tell SQL Server not to bother with duplicates that existed before 4/1/2014? After the index is created (i.e., after 4/1/2014), all dups would be violations.

    I am having difficulty finding that option. Can anyone advise or comment?

    Thanks!

  • Sounds ugly.

    What you could do is create a separate table to store the duplicates "OldPatientRecords" or something similar. Use ROW_NUMBER/PARTITION BY to extract your duplicate records (i.e. Where Dups > 1) and INSERT them into your new storage table. This would get your data in your main table ready for your "fat" clustered index, while preserving the data at the same time. Then create your unique clustered index worry-free (for the most part).
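    Roughly along these lines (just a sketch; dbo.Patient, dbo.OldPatientRecords, and the PatientID surrogate key are assumed names):

    -- Copy all but one row of each duplicated key combination into the holding table
    ;WITH Ranked AS
    (
        SELECT PatientID, [Status], PatientNumber, [First], [Last], DOB, Gender,
               ROW_NUMBER() OVER (PARTITION BY [Status], PatientNumber, [First], [Last], DOB, Gender
                                  ORDER BY PatientID) AS Dups
        FROM dbo.Patient
    )
    INSERT INTO dbo.OldPatientRecords (PatientID, [Status], PatientNumber, [First], [Last], DOB, Gender)
    SELECT PatientID, [Status], PatientNumber, [First], [Last], DOB, Gender
    FROM Ranked
    WHERE Dups > 1;
    -- ...then delete those same Dups > 1 rows from dbo.Patient before building the index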

    If you have the option of playing around with your existing table, you could also create a new table, insert the proper rows into it, rename it to the "real" name, and then apply a constraint to it that would ignore future duplicates. For example:

    -- col4 is DATETIME rather than DATE because SQL Server 2005 has no DATE type
    CREATE TABLE #Test (Col1 INT, col2 VARCHAR(10), col3 VARCHAR(10), col4 DATETIME)

    ALTER TABLE #Test WITH NOCHECK ADD CONSTRAINT chk_DuplicateRow UNIQUE NONCLUSTERED (col1, col2, col3, col4) WITH (IGNORE_DUP_KEY = ON)
    GO

    INSERT INTO #Test
    SELECT 1, 'John', 'Doe', '11/22/1975'
    UNION ALL SELECT 0, 'John', 'Doe', '11/22/1975'

    INSERT INTO #Test
    SELECT 1, 'John', 'Doe', '11/22/1975'
    UNION ALL SELECT 0, 'John', 'Doe', '11/22/1975'
    UNION ALL SELECT 0, 'John', 'Doe', '11/22/1975'
    UNION ALL SELECT 1, 'Jane', 'Doe', '11/21/1975'

    SELECT * FROM #Test

    DROP TABLE #Test

    When the application attempts to insert a duplicate record, it wouldn't error out per se, but would just output the message "Duplicate key was ignored." Not certain if this helps you out any, though...perhaps some of the experts on this site can step in to offer you a better solution.

    I'm very curious though, why isn't your DBA handling this particular issue?

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience.

  • I am very fond of the "OldPatientRecords" and main clean table approach. That is how I approached it. However, I did not get approval because it complicates reporting and queries. These patients have dispense associations whether they are active or inactive.

    Queries and reports now have to go after 2 tables.

    IGNORE_DUP_KEY is an option that I cannot implement because we don’t silently discard user data. In other words, if the user submits 10 inserts, and I can only insert 7 out of 10, my process will do exception reporting to explain why the other 3 did not get processed.
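    To illustrate the kind of per-row handling I mean (a simplified sketch; the InsertExceptions table is made up):

    -- With a plain unique index, a duplicate insert raises error 2601/2627,
    -- which we catch and report instead of silently dropping the row
    BEGIN TRY
        INSERT INTO dbo.Patient ([Status], PatientNumber, [First], [Last], DOB, Gender)
        VALUES ('Active', '0001', 'John', 'Doe', '19600101', 'M');
    END TRY
    BEGIN CATCH
        INSERT INTO dbo.InsertExceptions (ErrorNumber, ErrorMessage, LoggedAt)
        VALUES (ERROR_NUMBER(), ERROR_MESSAGE(), GETDATE());
    END CATCH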

    I have to say that I love your level of curiosity. I am a DB developer, not a DBA. When I design or model the physical structure, I always get a DBA involved, hoping to tap into their real-life experience.

    The issue was assigned to me, so the DBA did not take ownership. During my consultation, I was told to find the option that ignores dups that existed before the day the unique index is created. That is why I am looking for it and so far have not found it anywhere...

    Do you, as a DBA, handle data integrity and tackle issues like these?

    Many thanks!

  • Coriolan (3/3/2014)


    Do you, as a DBA, handle data integrity and tackle issues like these?

    Yes, I handle and govern the whole deal. If I'm responsible for maintaining the data, I'm sure as heck going to be involved in deciding how it gets stored. How else could I ensure things like performance? If it were me, I'd make a decision either way to maintain the validity of the data: set up the proper constraints to handle existing issues (the "other table") and eliminate further ones, and perhaps create a view or something that merges the two data sets together. That way, reporting is not affected. As long as they have what they need on the front end, why would they care what has gone on under the covers, so to speak?
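    For example, something along these lines (a sketch only; table and column names are assumptions):

    -- One reporting view that stitches the live table and the holding table back together
    CREATE VIEW dbo.vPatientAll
    AS
    SELECT PatientID, [Status], PatientNumber, [First], [Last], DOB, Gender
    FROM dbo.Patient
    UNION ALL
    SELECT PatientID, [Status], PatientNumber, [First], [Last], DOB, Gender
    FROM dbo.OldPatientRecords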

    I know that's a blanket statement and there are lots of "ifs" and "buts"; however, it's my job as a DBA to make both sides happen. So in my case I'd find a way to minimally impact the user (i.e., make it seem as if nothing has changed) while ensuring my data is stored properly, with an appropriate clustered index on the proper key column(s).

    Tough situation for you.

    if the user submits 10 inserts, and I can only insert 7 out of 10, my process will do exception reporting to explain why the other 3 did not get processed.

    You could work around this by performing a MERGE, or what everyone calls an UPSERT. If a record comes through that already exists, you can merge the changes into the existing record, or simply delete the old version and insert the new one.
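    (Keep in mind that MERGE itself only arrived in SQL Server 2008, so on 2005 the same idea is usually written as an UPDATE followed by a conditional INSERT. A rough sketch, with made-up procedure, parameter, and column names:)

    -- Classic 2005-era upsert: update the existing row if there is one, otherwise insert
    CREATE PROCEDURE dbo.UpsertPatient
        @Status VARCHAR(10), @PatientNumber VARCHAR(10),
        @First VARCHAR(50), @Last VARCHAR(50), @DOB DATETIME, @Gender CHAR(1)
    AS
    BEGIN
        UPDATE dbo.Patient
        SET [Status] = @Status, DOB = @DOB, Gender = @Gender
        WHERE PatientNumber = @PatientNumber AND [First] = @First AND [Last] = @Last;

        IF @@ROWCOUNT = 0
            INSERT INTO dbo.Patient ([Status], PatientNumber, [First], [Last], DOB, Gender)
            VALUES (@Status, @PatientNumber, @First, @Last, @DOB, @Gender);
    END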

    I realize you were asked to do this without getting your DBA involved, but quite frankly I believe s/he needs to be involved. Someone has to make a decision about how you proceed, and it ultimately should have their input in some form or fashion.

    ______________________________________________________________________________
    Never argue with an idiot; they'll drag you down to their level and beat you with experience.

  • I applaud your sense of responsibility.

    I will present the old-table-plus-main-table approach again and offer the view for reporting. Perhaps the view will win approval.

    I will visit a few more DBAs to see if I can stir them up and have them stand by me.

    Your feedback is greatly appreciated!
