Slowly Changing Dimensions - recommended approach?

  • Hello

    Can anybody recommend a best practice approach with the following scenario?

    I have a fact table of order lines (FactPolicyTransaction)

    I have a dimension with products (DimProduct)

    The fact holds a Surrogate Key for DimProduct

    I want to hold current and historic changes to Products (SCD 2 method)

    In my Fact, I want to refer to the current structure of the product i.e. always use the latest structure in reports or cubes

    I might also require history at some point

    So, I'm thinking of making DimProduct SCD 2 when building using SSIS

    How should I handle the Fact though as SCD 2 creates a new Product record with a new SK every tome a change is made

    Do I have to keep rebuilding the Fact?

    Would it be sensible to (something like) take a copy of the Product and insert this as the new record

    Then overwrite the existing whilst retaining the SK

    That way the SK in the Fact is consistent, always refers to the latest state and should never need to be updated

    Or should I be looking to take another approach?

    Thanks

    - Damian

  • There's a few different approaches you could take here, and it really just comes down to preference.

    The one constant thing is that you will need a type 2 SCD, to track the Product dimension changes. Now in terms of how you'd like to reference both current and historic versions from the fact table is where the preference comes into play.

    It would be possible to have one Product dimension (type 2), and store its key in the fact table. You should be able to implement some kind of versioning and create a self-joining view of the Product dimension, in which every version of the product is displayed alongside the most recent (or current) version. (Hope this makes sense...)

    My preference is to create 2 dimensions: A Product dimension which contains only current versions, and a Product History dimension which contains all the type 2 records (including the current version). I like this approach because it eliminates the need for views (I don't like views very much), and you will now have 2 foreign keys in your fact table (one for each dimension). I also find this approach easy to explain to end users, and simplistic enough for them to understand and use... even in an ad hoc fashion.

    With any type 2 SCD, the key in your fact table should reference the version of that entity (Product in this case) at the time of the transaction. You should never have to rebuild your fact table, unless you change or rebuild your type 2 dimension.

    Hope this helps.

  • So, in this instance, create 2 versions of the product Dim (say DimProduct and DimProductHistory)

    DimProduct would really be SCD1 and be referenced by my Fact

    DimProductHistory would be SCD2 and used if I want history

    If I do, I reference the SK for the product at that point in time

    I then create 2 versions of the insert within the SSIS

    Does that seem like a good approach?

    Thanks

    - Damian

  • Yes, on all accounts.

    In terms of your last comment: I typically process the type 2 (history) dimension first, and then perform a merge on the current version records with the "Current" dimension.

  • Thanks Martin, I will give this a go

    Damian.

    - Damian

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply