Normalization question...

  • Hi,

    I have a table of entities.  Each of which has a unique ID.  For a certain subset of those entities, a particular set of values can apply to a pair of entities.

    Let's call the entity ID's E1 and E2 and the associated values x, y and z.

    E1 and E2 are GUID's and x, y and z are floats.

    My temptation is to create a table that has E1 and E2 as the primary key and have columns for x, y and z.  However, since E1 and E2 both represent an ID column that relate to the same table, I feel I am breaking the first normal form.

    Essentially what needs to happen is that a pair of entity ID's needs to be related to x, y and z values and I want the table normalized.  Any advice would be appreciated.  Thanks in advance.

    -Pete

     

  • If you want it normalized you would probably break out the x, y, z values into a separate table with an identifier and add a junction table to link the two.

    If there will never be more than 2 values in the set it would be more efficient to just leave it all together in one table though.

     

  • Not sure if I've read you right but my understanding is that you have the following:

    ID::ATA1::ATA2:::X:::Y:::Z

    And your thinking of implementing the following:

    ID::ATA1::ATA2

    LinkID1:::LinkID2:::X:::Y:::Z

    So that values X,Y,Z can relate to two different rows in the first table.

    What you probably need is the following:

    ID:::LinkID::ATA1::ATA2

    ID:::X:::Y:::Z

    Since these values only relate to a subset of data, I guess you'll have alot of blank LinkIDs but this is the correct implementation (If I've understood you correctly). Its possible that your normalizing the data too much and just need to leave it as one table.  Just use your best judgement.

  • Bloody Grinning Faces!

    Note to self: preview message before posting.

  • I do not interpret the question in that way. As I understand it there is a table T1 { e, c0...cn } where e and c0...cn is a number of columns that depend on e and nothing but e. So T1 is fully normalized. However, now comes a requirement that for two rows R1 and R2 of T1 there is a set of values (x, y, z) that should be equal. Also, not all rows in T1 is a member of a pair that needs the x, y, z values.

    The solution that is thought about implementing is to create a new table T2 { e1, e2, x, y, z } where e1 and e2 together form the primary key and where they each refer as a foreign key to T1.e. I don't see anything wrong with this design, just make sure to have a check constraint so that e1 =! e2 and it should satisfy the requirements.

    Note: As I understand it x, y & z are not part of c0...cn today, but even if they are I would break them out into T2 as shown above.

  • Hi all,

    Thanks for the replies. Let me try and put this into context.  In my application, I want to look up an entity.  That entity has certain attributes

    which are returned such as its name.  Another type of information is its interaction parameters, how it interacts with other entities.  This is the

    data that I am trying to store. 

    So for a entity 1, I would return its name.  Then I would want to return all the ID's of the other entities it interacts with as well as the x, y and z parameters associated with each particular pair/interaction (so potentially, entity 1 & entity 2 and their associated x, y and z, entity 1 & entity 5 and their associated x, y and z....). 

    In the application I will have the ID for entity 1,  what I need to get is all entity ID's that have an interaction with entity 1 and the associated x,

    y and z parameters for that interaction. 

    I think you are correct in saying the key in my table Params looks suspect.  I agree. Since the x, y and z parameters would be the same for the pairs entity1 & entity 2 and entity 2 & entity 1.

    Here is a sample the table I was tempted to make (I've substituted ints for unique identifiers for convenience in this example):

    E1    E2       x      y      z

    ---    ---      --     --     --

    1      5        0.1  0.2    0.3

    1      6        0.4  0.5    0.6

    9      1        0.7  0.8    0.9

    However since E1 and E2 both contain ID's for the same entity table, am I breaking the first normal form?

    Does this help to clarify?

  • As I see it, no, it does not break 1st (or any) normal form. The fact that they contain foreign keys to other tables, or in this case the same table twice, have no impact on normalization. In fact, you are not modelling the same entitiy as the one in the referenced table, you are actually modelling something else here. This someting else has a key (E1, E2) and x, y & z all depend on that key, the entire key and nothing but it. So in my view this is a good design.

  • Thanks Chris.  I appreciate the feedback.  It does make more sense (in terms of retrieval) to store it in the manner described.  I'm glad to hear it qualifies as a normalized table.

    -Pete

  • Thanks Chris.  I appreciate the feedback.  It does make more sense (in terms of retrieval) to store it in the manner described.  I'm glad to hear it qualifies as a normalized table.

    -Pete

  • Agree with Chris

    If your data was in two tables then the joining table would not be an issue the only peculiarity is that the joining (interaction) table references one table instead of two

    However with this design there is the issue of duplicate data as in

    E1    E2    x     y     z

    ---   ---   --    --    --

    1     5     0.1   0.2   0.3

    5     1     0.1   0.2   0.3

    which you can either duplicate the data or have one single row and match ID to either column

    Another possible solution is to create the interaction table thus

    Table I1

    ID    x     y     z

    ---   ---   ---   ---

    1     0.1   0.2   0.3

    2     0.4   0.5   0.6

    And create the joining table thus

    Table J1

    I1_ID   T1_ID

    -----   -----

    1       1

    1       5

    2       6

    2       9

    and to retrieve the data

    SELECT c.ID, b.x, b.y, b.z

    FROM J1 a

    INNER JOIN J1 b ON b.I1_ID = a.I1_ID

    INNER JOIN T1 c ON c.ID = b.T1_ID

    WHERE a.T1_ID = @ID

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Hi David,

    I understand what you're saying.  If I had table:

    E1    E2       x      y      z

    ---    ---      --     --     --

    1      5        0.1  0.2    0.3

    1      6        0.4  0.5    0.6

    9      1        0.7  0.8    0.9

    and I split it into your structure, it would look like this:

    Table I1

    ID    x     y     z

    ---   ---   ---   ---

    1     0.1   0.2   0.3

    2     0.4   0.5   0.6

    3     0.7   0.8   0.9

    Table J1

    I1_ID   T1_ID

    -----   -----

    1         1

    1         5

    2         1

    2         6

    3         9

    3         1

    To retrieve the information I would have to do the following to get the information I needed:

    SELECT b.T1_ID, c.x, c.y, c.z

    FROM J1 a

    INNER JOIN J1 b ON b.I1_ID = a.I1_ID

    INNER JOIN T1 c ON c.ID = b.T1_ID

    WHERE a.T1_ID = @ID AND a.T1_ID ! b.T1_ID

    While I agree that your table structure is definitely normalized, for my needs it is easier to retrieve the data I need using the first structure.  As long as I can justify that as being normalized it should be fine. 

    Since you mention the possibility of duplicate data in the first structure.  My thought would be to check for the existence of a duplicate in the insert & update stored procedures that modify the table.  Do you have any thoughts on that?

     

  • If you enforce the non duplication then I see no problem with that method, i.e.

    E1   E2   x     y     z

    --   --   ---   ---   ---

    1    5    0.1   0.2   0.3

    or

    E1   E2   x     y     z

    --   --   ---   ---   ---

    5    1    0.1   0.2   0.3

    then your query may look like

    SELECT t.ID, t.col1, i.x, i.y, i.z

    FROM [interaction] i

    INNER JOIN

    t ON t1.ID = i.E1 OR t1.ID = i.E2

    WHERE i.E1 = @ID OR i.E2 = @ID

    depending on your output requirements

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Great!  Thanks for the help.

  • I think the simplest solution is to change the check constraint that I mentioned to say E1 < E2. That way you avoid duplicates and it of course also includes the old check that it refers to different rows in the other table.

  • If I make that change, I would then need to make sure that I always entered the data in that form.

    So I would never have an entry:

    E1    E2    x    y    z

    --    --   --   --   --

    9     1    0.1  0.2  0.3

    It would need to entered as:

    E1    E2    x    y    z

    --    --   --   --   --

    1     9    0.1  0.2  0.3

    Is this how you see your constraint working?

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply