Maybe I'm missing something...

  • Maybe I'm missing something obvious, but how do I deal with the following:

    I have a datamart made up of transaction data. It follows a fairly common format in that there's a Transaction Header with data about the entire Transaction such as sale location, operator, etc. Then there's a Transaction Line level, with information about which product packages are in the transaction and their quantities (1 to many Transaction Lines per Transaction Header). Then there's a Transaction Line Detail level, which contains the individual components that make up a product at the Transaction Line, and their individual prices (1 to many Transaction Line Detail rows per Transaction Line).

    So, when I take these 3 tables and denormalize them into a data mart, the measures of quantity (from transaction line) and price (from transaction line detail) become repeated. So if I say "what's the sum of quantity of all transactions?", I get the quantity multiplied by the number of detail rows.

    To further complicate things, there may be more than one discount or tax, which also creates duplicate rows.

    So, how have people dealt with this? I'm sure I'm not the first person to come across it. I thought about dividing the quantity and price by the number of repeated rows during the ETL process, but that seems like a lot of work. I was also wondering if there was a way to just magically take care of this in Analysis Services, but a) I couldn't see anything obvious, and b) then we'd be committed to never doing anything (like reports) from anything but the SSAS cubes.
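    For illustration, the "divide by the number of rows" idea would amount to something roughly like this in the ETL query (the table and column names are only placeholders):

      -- Allocate the line-level Quantity across the repeated rows so that a plain
      -- SUM(Quantity) over the flattened mart still adds up correctly.
      -- (Assumes SQL Server 2005+ for the windowed COUNT.)
      SELECT  d.TransactionID,
              d.TransactionLine,
              d.Quantity * 1.0
                / COUNT(*) OVER (PARTITION BY d.TransactionID, d.TransactionLine)
                AS AllocatedQuantity,
              d.Price
      FROM    dbo.StagedTransactionRows AS d;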

    Thanks in advance for your time.


    Rick Todd

  • Hi Rick

    I think you have to decide on your granularity. There's no reason why you can't manually insert a 1 on each row in the detail fact table, allowing your users/report developers to sum the ones to derive a quantity.

    Alternatively, you can provide them with an aggregated fact table to help with performance, where your ETL rolls up to transaction-line level by summing the rows of the detail fact table.
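    A rough T-SQL sketch of that roll-up (the table and column names are assumed, not taken from your design):

      -- Roll the detail-grain fact up to transaction-line grain: Quantity is
      -- repeated on every detail row, so take a single copy; the detail prices
      -- are genuinely additive, so sum them.
      SELECT   TransactionID,
               TransactionLine,
               MAX(Quantity) AS Quantity,
               SUM(Price)    AS LineAmount
      FROM     dbo.FactTransactionLineDetail
      GROUP BY TransactionID, TransactionLine;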

    Kindest Regards,

    Frank Bazan

    So would that be something done on the T-SQL side, or the SSAS side? I've been hoping all this time that there would be a way to make SSAS smart enough to know that Quantity, GrossSaleAmount, and so on should only be aggregated to the correct level based on how they're being used.

    So here's a very simple version of the model on the operational system side:

    OrderHeader: OrderID (PK), SaleLocation, SalesPerson

    OrderLine: OrderID (PK), OrderLine (PK), ProductPackage, Quantity

    OrderLineDetail: OrderID (PK), OrderLine (PK), OrderDetailLine (PK), ProductComponent, GrossSaleAmount

    OrderPayment: OrderID (PK), OrderPaymentLine (PK), PaymentType, PaymentAmount

    And here's the simplified version of the model in the Data Mart:

    OrderID, OrderLine, OrderDetailLine, OrderPaymentLine, ProductPackage, Quantity, ProductComponent, GrossSaleAmount, PaymentType, PaymentAmount

    So if we have an Order with 2 Product Packages, one of which has 1 Component, the other of which has 2, and 2 Payment Methods, we get this in the Data Mart:

    OrderID  OrderLine  OrderDetailLine  OrderPaymentLine  ProductPackage  Quantity  ProductComponent  GrossSaleAmount  PaymentType  PaymentAmount
    1        1          1                1                 PACKAGE1        1         CompA1            $16.00           CreditCard   $120
    1        1          1                2                 PACKAGE1        1         CompA1            $16.00           CreditCard   $20
    1        2          1                1                 PACKAGE2        3         CompB1            $99.00           CreditCard   $120
    1        2          1                2                 PACKAGE2        3         CompB1            $99.00           CreditCard   $20
    1        2          2                1                 PACKAGE2        3         CompB2            $25.00           CreditCard   $120
    1        2          2                2                 PACKAGE2        3         CompB2            $25.00           CreditCard   $20

    So if I summed the Quantity from the Data Mart, I'd get 2x too many for OrderID 1, OrderLine 1, and 4x too many for OrderID 1, OrderLine 2.

    Or if I summed the GrossSaleAmount, I'd get 2x too many for either OrderLine, since the 2 PaymentTypes have doubled the number of rows now that it's denormalized.

    Initially I was tempted to populate the measures only on the first row in the data mart. But then the Quantity or GrossSaleAmount would be 0 in this example if the query (MDX or T-SQL) landed on OrderID 1, OrderLine 2, OrderDetailLine 2, OrderPaymentLine 2.

    So the rule really is that the sum of Quantity is the sum over the distinct OrderID, OrderLine rows, and the sum of GrossSaleAmount is the sum over the distinct OrderID, OrderLine, OrderDetailLine rows.
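    For what it's worth, that rule can at least be written directly in T-SQL against the flattened table (FactOrder is just an assumed name for the data mart table above), even though it pushes the DISTINCT handling onto every query:

      -- Collapse to the grain each measure actually lives at before summing.
      SELECT
          (SELECT SUM(Quantity)
           FROM (SELECT DISTINCT OrderID, OrderLine, Quantity
                 FROM dbo.FactOrder) AS line_grain)   AS TotalQuantity,
          (SELECT SUM(GrossSaleAmount)
           FROM (SELECT DISTINCT OrderID, OrderLine, OrderDetailLine, GrossSaleAmount
                 FROM dbo.FactOrder) AS detail_grain) AS TotalGrossSaleAmount;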

    But how do I translate that into SSAS or SSIS?

    I know I'm not the first person to deal with this, since the model in the operational source is so common, and the analysis that we're trying to allow also seems pretty run-of-the-mill. I guess it's just a question of getting the right person to read and answer the post!


    Rick Todd

    I agree with Frank - you are trying to represent multiple data grains in one table.

    The Quantity column is at the grain of the line, but your GrossSaleAmount is at the grain of the ProductComponent.  I don't think there is a way to represent both the grain of the "product package" and the grain of the "product component" in the same data set.  Either you need to move the Quantity value down to the grain of the component, or move the GrossSaleAmount up to the ProductPackage level (summing the component GrossSaleAmount).

    Also, the payment information is at the grain of the order header, which can't be stored at either the ProductPackage or the ProductComponent level because of the problem you've described.
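    To make that concrete, splitting the mart by grain might look roughly like this (table names and column types are only illustrative, and dimension keys are left out):

      -- One fact table per grain, so every measure is additive within its own table.
      CREATE TABLE FactOrderLine (         -- grain: one row per product package on an order
          OrderID          int,
          OrderLine        int,
          ProductPackage   varchar(30),
          Quantity         int
      );

      CREATE TABLE FactOrderLineDetail (   -- grain: one row per product component
          OrderID          int,
          OrderLine        int,
          OrderDetailLine  int,
          ProductComponent varchar(30),
          GrossSaleAmount  money
      );

      CREATE TABLE FactOrderPayment (      -- grain: one row per payment on an order
          OrderID          int,
          OrderPaymentLine int,
          PaymentType      varchar(30),
          PaymentAmount    money
      );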

    You could probably make the consumers of the data (e.g. SSAS, SSRS, etc.) massage the data to make it right, but I always think it's better to correct the issue at the data provider layer. That way, your "consumers" won't have to be adjusted just because of a data change.

    I'm not sure what your opinion on data warehouse modeling is, but Ralph Kimball has a great explanation of this in this article: http://kimballgroup.com/html/designtipsPDF/DesignTips2001/KimballDT21Declaring.pdf

    Dan

    I'm definitely a believer in what Kimball has to say, and like to think I'm following his methodology, although it would certainly appear I've gotten a little carried away trying to put too much into one data mart.

    Now I'm resisting the temptation to start connecting the many-to-many items through bridge tables.

    Thanks for your input Dan.


    Rick Todd
