Many-to-Many relationships with foreign key constraints?

  • Shaun McGuile (7/18/2008)


    Have you seen this?

http://en.wikipedia.org/wiki/Slowly_changing_dimension

    Hmm, if this is the current state of the art, maybe I'm onto something. I have implemented versioning as just another normalization step I call Version Normal Form (VNF). I have cleaned up my paper a bit, but it is still incomplete. However, the incomplete areas are the details of the views and the triggers on the views that hide the implementation details from the users. There should be enough info for most developers to go ahead and do that on their own, and I'm happy to answer questions. I will be updating the paper fairly regularly, especially if VNF catches on, so be sure to check back for updates.
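    The paper itself isn't quoted here, but a common shape for this kind of version normalization (table and column names below are purely hypothetical, not taken from the paper) is to split each entity into an immutable identity table plus a version table:

    ```sql
    -- Hypothetical sketch of a version-normalized entity: the stable identity
    -- lives in one table, and every change to the mutable attributes becomes
    -- a new row in a companion version table.
    CREATE TABLE Customer (
        CustomerID int NOT NULL PRIMARY KEY  -- immutable identity
    );

    CREATE TABLE CustomerVersion (
        CustomerID    int          NOT NULL REFERENCES Customer (CustomerID),
        EffectiveDate datetime     NOT NULL,
        Name          varchar(100) NOT NULL,
        Address       varchar(200) NOT NULL,
        PRIMARY KEY (CustomerID, EffectiveDate)
    );

    -- Other tables reference Customer.CustomerID, so their foreign keys are
    -- unaffected by how many versions of the customer's attributes exist.
    ```

    Under a split like this, ordinary foreign keys keep working because they point at the identity table, not at any particular version.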

    Check back where? Oh, yeah. I have created a Google group called vrdbms (Versional Relational DBMS) and uploaded the OpenOffice and Word versions of the document. If I created the group the way I think I did (this is my first time) then it is open to the public and anyone can join and download the paper. I fervently request that anyone working with versioning post any contributions to the group so we all can benefit.

    http://groups.google.com/group/vrdbms?hl=en

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • I'm just curious Brandie, why do you think that you need the Foreign Key in your Data Warehouse?

    -- RBarryYoung, (302)375-0451
    blog: MovingSQL.com, Twitter: @RBarryYoung
    Proactive Performance Solutions, Inc.
    "Performance is our middle name."

  • You mean besides referential integrity between the Fact table and the Dimension tables?

    My setup includes 3 star schemas that overlap each other on one of the dimensions. I'd like to make sure that an orphaned entry can't make it into our data warehouse. That says referential integrity to me.
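    Declaratively, that constraint is just a foreign key from each fact table to the shared dimension. A minimal sketch (table and column names here are made up for illustration, not the actual schema):

    ```sql
    -- Illustrative only: a conformed dimension shared by overlapping star schemas.
    CREATE TABLE DimCustomer (
        CustomerKey int NOT NULL PRIMARY KEY,
        Name        varchar(100) NOT NULL
    );

    -- Each fact table's foreign key rejects orphaned rows at load time.
    CREATE TABLE FactSales (
        SalesKey    int NOT NULL PRIMARY KEY,
        CustomerKey int NOT NULL REFERENCES DimCustomer (CustomerKey),
        Amount      money NOT NULL
    );
    ```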

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

    Frankly, I am not much of a believer in RI in a data warehouse. I know that many people swear by it, but since there aren't any interactive users with write access, nor any application code, the ROI on it seems to me to be close to zero, possibly even negative.

    Given that the only writer in a DW is the ETL process, that means that all referential integrity can be implemented by the ETL process or the staging database, and usually more efficiently and effectively.

    And as pointless as declarative RI on a DW might be, non-declarative RI (i.e., a trigger) is doubly so. Think about it: you are writing code in an inconvenient context to make sure that the one piece of code that can write to that table (which you also wrote and/or control) doesn't make a mistake. Then you have to go and add all kinds of code to the ETL process to manage the exceptions that the RI code in the DW might throw. It is far more efficient to just write the ETL code to do these checks itself and manage the data accordingly.
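    For example, the ETL process can run the equivalent orphan check itself in staging, before anything touches the warehouse (a sketch with hypothetical staging table names):

    ```sql
    -- Hypothetical staging-side check: find fact rows whose dimension key has
    -- no match, before the load ever starts, instead of letting a DW
    -- constraint fire partway through.
    SELECT f.CustomerKey
    FROM   Staging.FactSales AS f
    LEFT JOIN Staging.DimCustomer AS d
           ON d.CustomerKey = f.CustomerKey
    WHERE  d.CustomerKey IS NULL;
    -- Any rows returned here mean the batch gets routed to an error table
    -- (or the load is skipped), rather than failing mid-load in the warehouse.
    ```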

    Just MHO...

    -- RBarryYoung

    And yet a poorly written ETL process could inadvertently orphan records. Since I might not be the only person writing ETL processes (we're starting out with two DB sources, but will soon graduate up to 7), I have to have some way of making sure any other ETL designers don't flub up the data and leave something orphaned.

    If I were the only one controlling the DW, that would be another matter completely. Unfortunately, I don't live in Perfect, like some Walgreen's customers do. @=)

    Brandie Tarvin, MCITP Database Administrator

  • Brandie Tarvin (7/21/2008)


    And yet a poorly written ETL process could inadvertently orphan records.

    Well, that's an argument for code reviews and good testing, not triggers in the DW. Remember, it is just as easy to write a bad trigger as a bad ETL procedure. And this possibility is the reason why I check the validity of the Load in the Staging database before the Load begins.

    After all, what do you do if you get 2/3rds of your DW tables loaded and a trigger or FK throws an exception? If you stop now, you'll have more Orphans and RI issues than a bad ETL procedure could ever make. If you just ignore it/override it, then why was it there in the first place?

    What you would have to do is roll back the Load, but that is a lot easier said than done. Typically, a DW Load is way too big to wrap a transaction around (never mind the multi-database issues). So you would have to have written an automatic Unload procedure, and in my experience, most folks do not have that in their project plan. So that leaves you with: {Stop the Load in-progress, send an alert to the operator, have the operator restore the DW and the Staging DB from the backup that you took right before the ETL started}, which works but is kind of brutal.

    However, if you are doing the validation in the ETL proc/Staging database, pre-Load, you just don't do the load.
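    That pre-Load gate can be as simple as a guard at the top of the load batch (a sketch only; the table and procedure names are hypothetical, not anyone's actual process):

    ```sql
    -- Hypothetical pre-load validation gate: the load only runs when staging
    -- is clean, so nothing ever has to be rolled back out of the warehouse.
    IF NOT EXISTS (
        SELECT 1
        FROM   Staging.FactSales AS f
        LEFT JOIN Staging.DimCustomer AS d
               ON d.CustomerKey = f.CustomerKey
        WHERE  d.CustomerKey IS NULL
    )
        EXEC Staging.usp_LoadWarehouse;  -- hypothetical load procedure
    ELSE
        RAISERROR ('Orphaned fact rows found in staging; load skipped.', 16, 1);
    ```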

    Since I might not be the only person writing ETL processes (we're starting out with two DB sources, but will soon graduate up to 7), I have to have some way of making sure any other ETL designers don't flub up the data and leave something orphaned.

    Well, the number of sources might affect the number of Extract procedures, but it should not affect the number of Load procedures that touch a DW table.

    -- RBarryYoung

  • I totally agree with darth barry!

    If it's bad, don't load it!

    --Shaun

    Hiding under a desk from SSIS Implementation Work :crazy:

    OK, I'm really intimidated by what everyone else has written so far, but here goes anyway.

    If you're re-assessing the data structure then is this not, as Tomm Carr alludes to in his second versioning post, simply a question of ordinary normalisation?

    Taking your example of changing the address for John Doe, instead of having:

    TableA: [Col1, Col2]PK, Name, Address

    TableB: [Col1]FKDWG (Foreign Key Don't Work Grrr) 😉

    can you not take Col2 and Address out into TableC?:

    TableA: [Col1]PK, Name

    TableC: [Col2]PK, [Col1]FK, Address

    TableB: [Col1]FKNWY (Foreign Key Now Works Yay)

    and then you CAN link TableB.Col1 FK to TableA.Col1 PK.
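    Spelled out as DDL, that split looks something like this (the column types are assumed for illustration; only the table and column names come from the post above):

    ```sql
    -- The proposed split, sketched as DDL (types assumed).
    CREATE TABLE TableA (
        Col1 int NOT NULL PRIMARY KEY,
        Name varchar(100) NOT NULL
    );

    -- The versioned attribute moves into its own table.
    CREATE TABLE TableC (
        Col2    int NOT NULL PRIMARY KEY,
        Col1    int NOT NULL REFERENCES TableA (Col1),
        Address varchar(200) NOT NULL
    );

    -- TableB can now carry an ordinary foreign key to TableA.
    CREATE TABLE TableB (
        Col1 int NOT NULL REFERENCES TableA (Col1)
    );
    ```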

    To paraphrase Groucho Marx, I'm only a simple country DBA with horse-sense, so if anyone can tell me where I'm going wrong I'd be glad to hear it.

    Tom

  • thood (7/25/2008)


    If you're re-assessing the data structure then is this not, as Tomm Carr alludes to in his second versioning post, simply a question of ordinary normalisation? [...] can you not take Col2 and Address out into TableC? [...] and then you CAN link TableB.Col1 FK to TableA.Col1 PK.

    Tom

    Tom, that would work, but...

    Data Warehouse model = denormalized data for speed.

    Your solution is normalized.

    --Shaun

    Hiding under a desk from SSIS Implementation Work :crazy:

  • Shaun McGuile (7/25/2008)


    Tom that would work,but...

    Data Warehouse model = denormalized data for speed.

    Your solution is normalized.

    So denormalize it... in the warehouse.

    In production use, it is normalized, and thus versioning presents no more trouble than any other normalization form. My tests show that the VNF performance hit ranges from about 10% in large-volume operations like a complete table dump (select * from table/view) to unnoticeable in normal CRUD operation -- about what it costs to implement a many-to-many relationship.
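    One way to get both (purely a sketch with made-up names, not taken from the VNF paper): keep the versioned tables normalized in production, and have the ETL extract through a view that flattens each entity to its latest version before loading the denormalized warehouse copy:

    ```sql
    -- Hypothetical current-version view over a version-normalized pair of
    -- tables: Customer holds the immutable key, CustomerVersion holds one
    -- row per change to the mutable attributes.
    CREATE VIEW CurrentCustomer AS
    SELECT cv.CustomerID, cv.Name, cv.Address
    FROM   CustomerVersion AS cv
    WHERE  cv.EffectiveDate = (SELECT MAX(cv2.EffectiveDate)
                               FROM   CustomerVersion AS cv2
                               WHERE  cv2.CustomerID = cv.CustomerID);
    -- The ETL extracts from CurrentCustomer, so the warehouse sees a flat,
    -- denormalized row while production keeps its full version history.
    ```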

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • I'm actually playing with the VNF right now with my DW solution. I've just started altering the schema and doing the coding. I'll keep everyone updated as to whether or not this works out for my situation.

    Brandie Tarvin, MCITP Database Administrator

Viewing 11 posts - 16 through 25 (of 25 total)
