Help Required in Fact Table Design

  • We have certain data in Relational format. We are planning to build a DW on that.

    We have table like

    Product - has information on Product Details.

    structure: Prodid,name

    Region - has information on Region details

    structure: Regionid, Regionname

    TestStatus - has information on various test steps a product has undergone.

    structure: TestId, Prodid, Stepname, time

    1 P001 Step1 sept 12,2009 10:01:21 R1

    2 P001 Step2 sept 12,2009 10:01:22 R1

    3 P001 Fail sept 12,2009 10:01:24 R1

    4 P002 Step1 sept 12,2009 10:01:26 R3

    6 P002 Fail sept 12,2009 10:01:31 R3

    7 P003 Step1 sept 12,2009 10:01:41 R2

    8 P003 Step2 sept 12,2009 10:01:44 R2

    9 P003 Pass sept 12,2009 10:01:46 R2

    We would analyse time taken between each steps for particular product, we would analyse time taken for paritcular product to fail (for eg:P001, 4 sec (record 3-record1)

    We need help in creating fact tables for this.We have created Dim table on Product, Region and Step

    Thanks in Advance

  • Firstly you do not need to make a massive change. You need to think more about how the data is to be used. If you are going to use Analysis Services which I assume you are then I would use views over the relational for quick development work but for the real warehouse I would use an ETL tool to import the data. The reason I use views to do development is that you can quickly change the data that is being presented to the cubes.

    I would expect the table layout to appear as below

    Table: whs_dim_product

    whs_prodid (ProdID)

    whs_prodname (name)

    whs_prodtitle (name + ‘-‘ + ProdID)

    Tabble: whs_dim_region

    whs_regionid (regionid)

    whs_regionname (regionname)

    whs_regiontitle (regionname + ‘ – ‘ + regionid)

    Table: whs_fact_teststatus

    whs_testid (testid)

    whs_prodid (prodid)

    whs_stepname (stepname)

    whs_test_date (derived date only form time)

    whs_test_time (derived time only from time)

    whs_test_step_time (time between this step and previous for particular product)

    In the fact table I have created columns for the date of the test if you wanted to analyse test results by day. By placing the calculation for the step time (whs_test_step_time) into the underlying database you will not need to do the arithmetic in the cube or reporting services. Doing all calculations in the data warehouse will speed up users reports.

    Hope this helps.

    Ells.

    P.S make sure you join the tables on integers if possible. You have Step in the fact table as a description. I would replace it with the number and have a dimension table for the steps where the join would be integer to integer.

    😀

  • It was really helpful post, thanks actually we have decided to go with having additional columns that would have values on time taken for each step. By having time taken for each step, How do i generate a report that would say how much time take nby a product to fail... for example when the product goes through multiple steps and at nth step if it fails, time taken from 1 to nth step would be the time taken for failiure.. When i generate a report that shows the time take nto fail, this desing would help rite.. Can you give some inputs

  • It was really helpful post, thanks actually we have decided to go with having additional columns that would have values on time taken for each step. By having time taken for each step, How do i generate a report that would say how much time take nby a product to fail... for example when the product goes through multiple steps and at nth step if it fails, time taken from 1 to nth step would be the time taken for failiure.. When i generate a report that shows the time take nto fail, this desing would help rite.. Can you give some inputs

    There are several ways of handling this. It depends what you really trying to report on.

    1. Add into the fact table I detailed a column whs_test_failure and whs_failure_time. Whs_test failure will be 0(zero) by default and changed to 1 if the product failed at this step. The column whs_failure_time will be calculated by adding time from first step to the failure step.

    Table: whs_fact_teststatus

    whs_testid (testid)

    whs_prodid (prodid)

    whs_stepname (stepname)

    whs_test_date (derived date only form time)

    whs_test_time (derived time only from time)

    whs_test_step_time (time between this step and previous for particular product)

    whs_test_failure ( 1 if failed at this step, 0 if passed this step)

    whs_failure_time (sum of time taken in this and all previous steps)

    2. The alternative is to create a fact table that just contains the individual steps where failures occur. This will then just feed a report or cube that shows failures only. You will still need to calculate the time.

    If you use 1 then you add a dimension teststep which will have two member 0 pass and 1 Fail. Then in the cubes you select that dimension, choose Fails and you can retrieve the time and step info.

    Regards,

    Mark.

  • Thanks a lot for the information. We have designed the fact table as it will have following columns

    Prodid

    RegionId

    StepId

    StepName

    PreviousStepName

    TimeInfo (this will have time taken from prev step to current step)

    CurrentTime

    Hope by this means we can generate reports like How much time spent by the products in each step, time taken to fail etc.

    Hope it makes sense

  • Prodid

    RegionId

    StepId

    StepName

    PreviousStepName

    TimeInfo (this will have time taken from prev step to current step)

    CurrentTime

    The design should reflect what you want to see as a result. The table above does not quickly convey the time to failure or the step that caused the failure.

    I am assuming that if it failed on step 4 then you are going to calculate failure time as

    (time for step1 + time for step2 + time for step3). I am assuming you identify failure in this case as there is no step PASS.

    If identifying failures and time to failure is very important then I would add them as columns. If I were to build an OLAP cube over the top of the data I would put these into the DW table as calculations executed in cubes via MDX will slow sown reporting from the cube.

    If all of the steps are

    1 step1

    2 step2

    3 step3

    4 step4

    5 PASS

    And you always have a set number of tests then it is ideal to move to a dimension and just leave the key in the fact.

    Regards.

    Mark.

  • Actually the stepname column will have the value "fail" when it has failed... So we didn't actually create additional column for the same.

    Actually we got this link that helped our design

    http://asmdx.blogspot.com/2008/07/fact-table-design-for-state-workflow.html

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply