Char Datatype Match

  • I have 2 tables Members and Licensed - the Member table has a char field Member_name and the Licensed table has a field licensee_name. Need to be able to select rows from table Member where Member.Member_name matches with Licensed.licensee_name

    However, there are cases where the middle name could be included partially or fully in one and avoided in the other or any other combination and also one table could hold the credentials like 'MD' and suffix like 'Jr' and the other may not - I need to be able to select all that match just on the first and last name.

    For eg. Licensed.licensee_name = Thomas N Machado should be a match against Member.Member_name Thomas Neil Machado Jr MD

    Please help.

    Thank you

  • This design is flawed, you must assign a primary key to each table and join using those keys, this problem can no be solved (easyly) with TSQL. You'll have to repair the data first by finding which rows cannot be matched in the licensed table to the members table. then you must add a key to the table (or find it). Then you'll be able to do the join without problem.

  • Am aware of the design concerns - The license table is coming from an external source - am trying to get a list of members in my database that are also in the license table. The members table does have a member id. This is not a design issue - only a way around to identify matches, if possible.

    Thanks

  • You could always try to do an inner join with the soundex() function.

    Something like this :

    Select soundex('Paris Hilton'), soundex('Paris jr Hilton') where soundex('Paris Hilton') like soundex('Paris jr Hilton')

    But I'm sure that this will cause a few rows to be repeated in the select (not sure how easy it would be to use the distinct clause on this one).

    Also keep in mind that this will most likely force index scans, so you can forget speed if you have large tables. There's always a but... but I don't see one in this case with the information you supplied.

  • I don't think soundex will work - if the first names are the same but last names are different soundex will still consider it a match, am i right?

    Can some function like CHARINDEX be used??

  • How can it be the same person if they don't have the same last name??

    Anyways, here's more info on the soundex. It looks like it stops analysing at the space character, so if you use soundex(Replace(Field, ' ', '')) I think it might work.

    --works

    Select soundex('ParisHilton'), soundex('ParisRitchie')

    --P624 P626

    --fail

    Select soundex('Paris Hilton'), soundex('Paris Ritchie')

    --P620 P620

  • I would suggest the DIFFERENCE( character_expression, character_expression) over SOUNDEX.  It actually uses SOUNDEX, but it is more simple.  I would suggest building a cross-reference table using a match of only 3 or 4 and then refining that...

    SELECT DIFFERENCE( 'Paris Hilton', 'Paris Hilton') -- 4

    SELECT DIFFERENCE( 'Paris Hilton', 'Paris jr. Hilton') -- 4

    SELECT DIFFERENCE( 'P Hilton', 'Paris jr. Hilton') -- 3

    SELECT DIFFERENCE( 'Deloris Hilton', 'Paris jr. Hilton') -- 2

     

    I wasn't born stupid - I had to study.

  • Nice touch... didn't know that function .

    In the lights of this new knowledge, I would definitly go with that option... Still assuming that the design cannot be beautified.

  • Heheeee.  I just learned that function yesterday myself!  SOUNDEX is a booger to use...

    I wasn't born stupid - I had to study.

  • That's the whole point of the community .

  • The difference function uses soundex under the hood. It just gives you the degree of match be a value between 0 and 4 where 4 is a perfect macth (both variables have the same soundex code). The soundex code was developed to aid in the search of ceusus records where it is common for names to be misspelled. I am not sure if soundex will work for languages other than english.

    Soundex code is derived by removing all vowels, double consonants, the letters H W and Y, spaces, and ignores capitalization it then builds a four character string by using the first letter of the string as the initial letter of the soundex string. It then takes the next three remaining consonants and codes them using this table:

    B F P V = 1

    C G J K Q S X Z = 2

    D T =3

    L = 4

    M N =5

    R = 6

    0 is used as a filler if the converted string is less than four characters long

     

    The name du boys, duboys, du Boise,Du bois, du Bois (correct),Du Bois (which I use because my great grandmother changed the spelling as she finally got tired of fighting with first grade teachers who were telling her childern that they were mispelling their last name which MUST start with a capital letter) all have a soudex of D120.

     

    The lovely state of Florida once lost all of my motor vechile records because of a misspelled name. It took me months to get that mess cleared up. If your are working for someone who cares about customer satisfaction using soundex or difference may save a few customers. This of course does not apply to goverment organizations as one official told me that if I did not like the way he ran things I could just move out of the state.

    {edited added)

    If you are going to use soundex or difference I would suggest having the client create the soundex and then pass this variable to a sp where it can be used in a dynamic sql statement

     

    Mike

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply