Maintaining Uniqueness

  •  

    We're going to be re-working our back-end system in which we store our customer and their historical information.

    The flaw that we've had for years is that the data is easily corrupted due to our processes.  We used Telephone numbers as the Primary ID and while for our industry it works "good enough" at the end of that day that just simply doesn't cut it.

    Thus, I'm looking to try to strategize a sytem in which we can reasonably assume that an entry is unique across our company.

    We have two sections (Telemarketing and Direct Mail).

    So far, they have always been seperate, so one of the pieces of this project is to look into ways to tie them together.

    For our telemarketing clients it's reasonable to assume we'll have the following:

    Phone Number, FirstName,LastName,Address,Zip

    The only "issue" with these we have to solve is linking up Husbands and Wives at home,etc.  Thus my plan was to simply add a column for CustomerID in and just have that be my primary key and allow business rules to handle the rest of the logic required.

    However, when I pull something like integrating direct mailers into this, I'm going to end up losing the phone number, but having the rest of the information.

    In my mind, the logic still doesn't change, I just need to do a bit more work in the business layer in order to make sure that I'm staying unique.  However my concerns are when people mis-key information and/or shorten names,etc.  IE Bob=Robert or Jimmy=Jimy which would obviously fail updates that link on exact entries.

    Another example might be having a boyfriend / girlfriend (Bob Smith / Mary Jones) who get married and a new record comes into the database with the same address,etc but has Mary Smith attached as the name.  Do companies atypically pay someone $6 an hour to filter through this, or are there tools / methods that can be used to help make these decisions already available?

    I'm sure many companies have to work on resolving this exact type of problem and I'm wondering what kind of experiences others have had in solving this issue.

    TIA

  • Hi ,

    can you send your current db structure ?

    Thanks

    Raj

  • It seems you have a few issues.

    First, I would definitely have some type of CustomerID, probably a separate table that has basic information about what a "customer" is and have a unique, surrogate key. This way you are sure of who you have.

    I'd like this table to your other tables, maybe with an in-between table that matches up CustomerID with some other key in those tables. You might have two of these if you have two places to match to. Without knowing more about the business, it's hard to say if this is a good idea or not, but it's an option. Queries do get more complex, but you can maintain some of what you have already for the telemarketing v direct mail sides.

    In terms of data quality, this is always an issue. It helps to have routines to cleanse this data, and while you can buy some solutions, mostly this is a case of building up things as you need them. Standardize on "Ave" or "Avenue" or "Ave." and then write routines that clean up sections of data as it's loaded. Or that check new data. Having a "last update date" column helps here so you can batch fix lots of data with a few passes through the data.

    Ultimately you are moving to a data warehouse stage. Let people enter stuff in an OLTP system(s) and then you quickly grab it, clean it, load it to a warehouse where it can be used for reports or other work. This is what ETL does.

  • I wouldn't call them "customers" but "victims".

    I'm tempted to say: "just ask everybody for her/his SSN and use it as PK" but I'll behave well and I'll refrain for saying it 😀

    Now, seriously - since the "Customers" better saying "Victims" most probably don't want to have the privilege of being part of such database you cannot expect collaboration on their part - I'll use a combination of phone number, zipcode and first/last name to clean up as much as possible.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Would the OP not have already found his answer after 3 years, or am I missing something? 😉

  • I've seen cases like this where you didn't have good PKs, but it worked because you limited the scope of the business. Change that, and all of a sudden things start to become an issue. I think that's what's happened here, and as they try to merge these systems, it's hard to determine which way is best.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply