Proper Storage and Handling of Personal Names

  • Stephanie Giovannini (8/4/2016)


    I would settle for having my last name properly spelled. There's some low-hanging fruit.

    +1

  • Stephanie Giovannini (8/4/2016)


    I would settle for having my last name properly spelled. There's some low-hanging fruit.

    😀

  • Nice article. Eye-opening for me, as I always do US-centric work where first/middle/last is the common pattern (in names and in the db). Correct spelling of names is good, as is knowing how to pronounce them, and fuzzy matching to identify duplicates.

  • ScottPletcher (8/4/2016)


    Robert Domitz (8/4/2016)

    3. Is "von Helsing" the same as "VONHELSING" or just "HELSING"? In the latter case, the originating system sliced off the "von" as a middle name!

    The answer to all of these examples is "Probably."

    For that specific case, almost certainly not. In German, in particular, the "von" prefix was a status marking, a "nobility" prefix of a sort. In other words, the dude's name was just "Helsing", but "von" was added before it as a sign of nobility/influence/prestige. Thus, "Helsing" and "Vonhelsing" are absolutely not similar names.

    As a person's name, you are correct.

    But as a name in a data stream, you do not know. Note that I stated that the originating system had incorrectly considered the "von" as a middle name, not as part of the surname.
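A minimal sketch of the "data stream" side of this exchange, assuming a small hypothetical particle list (`PARTICLES`) and hypothetical helper names (`candidate_keys`, `may_match`). It does not decide what the person's name really is; it only lists the upper-case keys under which one surname might plausibly have arrived from another system.

```python
# Hypothetical particle list; a real shop would tune this per data source.
PARTICLES = {"von", "van", "de", "di", "da", "la", "le"}

def candidate_keys(surname):
    """Return the upper-case keys under which a surname might arrive."""
    words = surname.replace("-", " ").split()
    keys = {"".join(words).upper()}        # "von Helsing" -> "VONHELSING"
    # Some originating systems file a leading particle as a middle name,
    # so also offer the surname with leading particles removed.
    core = list(words)
    while len(core) > 1 and core[0].lower() in PARTICLES:
        core = core[1:]
    keys.add("".join(core).upper())        # "von Helsing" -> "HELSING"
    return keys

def may_match(a, b):
    """True when two surname strings share at least one candidate key."""
    return bool(candidate_keys(a) & candidate_keys(b))
```

With this sketch, "von Helsing" is a possible match for both "VONHELSING" and "HELSING", while "VONHELSING" and "HELSING" are not matched to each other, which keeps both points made above: the names differ, but the incoming data may still refer to the same person.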

  • roger.plowman (8/4/2016)


    Mononymics are fairly straightforward as far as I'm concerned.

    The mononymic would go in the given/first name and "N/A" would go in the family name.

    Also, nulls are abominations, since you need to know if a name was not entered yet (TBD), not applicable (N/A) or verified as unknown/unknowable (UNK). The last case, for instance, in "John Doe" type situations, where the person can't be asked and no one else knows--or can ever find out (a deceased and unidentified person, for example). Missing middle names can use "N/A" or "UNK" if not applicable/unknown/none of the database's business. 😛

    As for the poor Hungarian poster, my condolences! In that case you'd almost certainly need a "mode" field of some kind, to indicate what kind of name you're dealing with.

    Another case I didn't see mentioned was "called-by". People named Robert, for instance, might be called "Bob", "Rob", "Robert", etc. In my company that's actually a problem so I have to make allowances for it. Our company president is pretty insistent that called-by names be used for email, for instance, and on internal reports.

    Now I'm thinking about how you could build a Name table to handle all cases. (laughing).

    And you can have a Greeting Name column for how the person would like to be addressed (you can even break this up into formal and informal). It makes sense to me that a single name is the first name. It can't be the second or the last.

    As for other items like d'angelo or dangelo, this is no different from what you experience with other names ("IBM" = "I.B.M." ?). You create and apply rules within the shop to increase data consistency.
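One way such a shop rule could look, as a sketch only: derive a comparison key that ignores case, punctuation, spacing and accents, and keep the name exactly as entered for display. The function name `match_key` and the choice of rule are assumptions for illustration, not something proposed in the thread.

```python
import unicodedata

def match_key(name):
    """Build a comparison key: case-folded, accents and punctuation dropped."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep only letters and digits; combining marks, apostrophes, periods
    # and spaces are all discarded.
    return "".join(ch for ch in decomposed.casefold() if ch.isalnum())
```

Under this rule "d'Angelo" and "dangelo" share a key, as do "IBM" and "I.B.M.". The key is meant for finding likely duplicates, not for replacing the stored spelling.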

    ----------------------------------------------------

  • roger.plowman (8/4/2016)


    Mononymics are fairly straightforward as far as I'm concerned.

    The mononymic would go in the given/first name and "N/A" would go in the family name.

    Also, nulls are abominations, since you need to know if a name was not entered yet (TBD), not applicable (N/A) or verified as unknown/unknowable (UNK). The last case, for instance, in "John Doe" type situations, where the person can't be asked and no one else knows--or can ever find out (a deceased and unidentified person, for example). Missing middle names can use "N/A" or "UNK" if not applicable/unknown/none of the database's business. 😛

    As for the poor Hungarian poster, my condolences! In that case you'd almost certainly need a "mode" field of some kind, to indicate what kind of name you're dealing with.

    Another case I didn't see mentioned was "called-by". People named Robert, for instance, might be called "Bob", "Rob", "Robert", etc. In my company that's actually a problem so I have to make allowances for it. Our company president is pretty insistent that called-by names be used for email, for instance, and on internal reports.

    Now I'm thinking about how you could build a Name table to handle all cases. (laughing).

    In this situation, I think in most systems, you would likely see letters addressed as "Cher N/A" or "Sting N/A", and such. The programming and formatting side of things must also be more aware of certain exceptions. And I agree that NULLs can be an issue in that they can mean too many different things.

  • tung! (8/30/2016)


    roger.plowman (8/4/2016)


    Mononymics are fairly straightforward as far as I'm concerned.

    The mononymic would go in the given/first name and "N/A" would go in the family name.

    Also, nulls are abominations, since you need to know if a name was not entered yet (TBD), not applicable (N/A) or verified as unknown/unknowable (UNK). The last case, for instance, in "John Doe" type situations, where the person can't be asked and no one else knows--or can ever find out (a deceased and unidentified person, for example). Missing middle names can use "N/A" or "UNK" if not applicable/unknown/none of the database's business. 😛

    As for the poor Hungarian poster, my condolences! In that case you'd almost certainly need a "mode" field of some kind, to indicate what kind of name you're dealing with.

    Another case I didn't see mentioned was "called-by". People named Robert, for instance, might be called "Bob", "Rob", "Robert", etc. In my company that's actually a problem so I have to make allowances for it. Our company president is pretty insistent that called-by names be used for email, for instance, and on internal reports.

    Now I'm thinking about how you could build a Name table to handle all cases. (laughing).

    In this situation, I think in most systems, you would likely see letters addressed as "Cher N/A" or "Sting N/A", and such. The programming and formatting side of things must also be more aware of certain exceptions. And I agree that NULLs can be an issue in that they can mean too many different things.

    I think that the testing phase would easily catch these occurrences of "Sting N/A" and such. This is not difficult to address on the reporting level. 😛
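A sketch of how the "Cher N/A" problem could be handled at the formatting level, assuming each name part carries an explicit status rather than a magic string or a NULL. The type names (`PartStatus`, `NamePart`, `PersonName`), the `salutation` function and the "Dear Customer" fallback are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class PartStatus(Enum):
    KNOWN = "known"
    TBD = "TBD"             # not entered yet
    NOT_APPLICABLE = "N/A"  # e.g. the family name of a mononymic person
    UNKNOWN = "UNK"         # verified as unknown or unknowable

@dataclass
class NamePart:
    status: PartStatus
    value: Optional[str] = None

@dataclass
class PersonName:
    given: NamePart
    family: NamePart

def salutation(name):
    """Build a greeting from known parts only, so "N/A" is never printed."""
    parts = [
        part.value
        for part in (name.given, name.family)
        if part.status is PartStatus.KNOWN and part.value
    ]
    if not parts:
        return "Dear Customer"  # assumed fallback wording
    return "Dear " + " ".join(parts)
```

For a record with given name "Cher" and a family name marked `NOT_APPLICABLE`, this yields "Dear Cher". The status stays available for queries that need to tell TBD, N/A and UNK apart.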

    ----------------------------------------------------

  • Tom Gillies (8/4/2016)


    That's definitely an interesting and useful suggestion. Tung has pointed out a real problem. The difficulty with dealing with it in the abstract is that different solutions work in different situations. The "formal" and "informal" name approach would work very well in many situations and across cultures. I'm not sure it would necessarily give you the "grouping" that you get with Western family names but that may not matter.

    I've been thinking about this problem, and I would start with a number of questions:

    1) What is the geographic/cultural scope of the proposed system/database? - If we are confining ourselves to one culture, then our job is simpler.
    2) How different are the various cultures concerned?
    3) What are the numbers/proportions involved?
    4) How much does it matter to the people/business involved? - Might it cause grave offence or do they not care?
    5) Is there a political aspect to this? - "Colonial legacy" or minority ethnic groups
    6) Are there features of names that we want to use that we haven't been thinking about? - Like grouping with family names.

    It goes on and on...

    ...the thing is, this can either be important, or it can be a complete waste of time. We need to decide if it matters, how much it matters and how much effort we want to put into it. We can only really do that for a particular case. I'm not sure if it is possible to come up with an implementable "general" solution.

    Wise thoughts.

  • tung! (8/4/2016)


    Why must we split names into parts at all? The way I see it, it may be better to simply request a full name, and a short name that we would use to address the person.

    One reason is for sorting purposes. Generally, sorting is done by last name. That would be harder to derive.

    One way to get around this would be to include a sort name as well as a full name, but that's extra typing.
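The full name / short name / sort name idea can be sketched as follows, assuming the sort name is optional and falls back to the full name when nobody typed one in. The `Person` class, the `sort_key` helper and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    full_name: str                   # exactly as the person wrote it
    short_name: str                  # how to address the person
    sort_name: Optional[str] = None  # only filled in when it differs

def sort_key(person):
    """Sort on the explicit sort name, falling back to the full name."""
    return (person.sort_name or person.full_name).casefold()

people = [
    Person("Robert Smith", "Bob", "Smith, Robert"),
    Person("Cher", "Cher"),                       # mononym: no extra typing
    Person("Nagy Anna", "Anna", "Nagy, Anna"),    # family name written first
]

for person in sorted(people, key=sort_key):
    print(person.full_name)
```

The fallback limits the extra typing to people whose full name does not already sort the way they expect.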
