Cube Row Count

  • Does anyone know if there is a formul for calculating how many rows a cube is going to have.  Of course I know how many rows will be in my fact table and my dimensions, but I know they can't be compared on a 1 for 1 basis.  Just wanted to find out if there was a formula that can be used.

     

    Also, I understand the differences between Holap, Rolap, and Molap, but I'm not 100% positive about how to figure out which one will best fit me.  Is there a website that anyone knows of that defines how to make a logical decision in this matter?

     

    Thanks in advance for you help.

     

    st

  • st,

    Always use MOLAP. HOLAP and ROLAP are available for political reasons rather than technical - Microsoft just wanted to put a "tick in the box" for ROLAP and HOLAP to say "we do them". No proper R+D has gone into this since v7.

    There's no discernible reasons to use ROLAP/HOLAP except if you are implementing real-time OLAP in which case ROLAP would be used - even then there are reasons not to do this.

     

    Rows in a cube: If I understand your question correctly this will be equivalent to the number of records in your fact table (unless you have a filter on there or something). Why do you think they can't be compared on a 1 to 1 basis and what do you mean by this?

    Hope this was useful to you.

    Regards

    Jamie Thomson

    Conchango PLC

  • Hi,

    A cube does not have "rows" as such. But the number of rows needed to be read from the database to populate the cube is usually equal to the number od rows in the fact table. ( Can differ if You have filters, unjoined rows or M:M link tables ).

    A Cube does have Cells though, and the number of cells affect the size of the cube. The number of cells in the cube is the cartesian product between the number of members of the dimensions.

    Rough example: Cube with 3 dimensions: Time (365 members), Gender (2 members), Product (20 members). The number of cells ( at the lowest level ) = 365*2*20 = 14600. You will also have a number of aggregated cells depending on the levels in your dimensions.

    A lot of these cells can be empty ( no data ), and SQL Server is quite good at optimizing storage so many of these empty cells might not be in the cube.

    /rockmoose


    You must unlearn what You have learnt

  • Jamie, your answer is exactly what I was looking for.  So is it my understandin also that the same amount of Hard drive space is used weather you use Rolap, which stores all aggregations on the database side or use Molap, which will store the cube on the client side (I hope this is correct).

    In reference to your question regarding the fact table.  Lets say I have 5 key rows and 1 measure.  Wouldn't this set up create fewer rows than say a fact table that had 5 key rows and 7 measures?

    My issue is this.  My number of key rows are pretty well set, but for measure purposes, I am creating various demographic counts in the fact table (i.e. count of members, count of members age 1-9, count of members age 10 - 15, etc.)  Along with those items I am also breaking it down by Gender, geographic area etc.  I thought the best place to do it is in my extract code when I am populating the fact table.  Do you agree with this step.  Also this points to my question.  Do row counts go up exponentially with the more measures I create AND the more dimensions I create?

    Any help you can give would be awesome.

    Thanks in advance.

  • st,

    MOLAP will use more disk space than ROLAP but the storage gain with ROLAP is negligible compared to the performance gain with MOLAP - hence MOLAP is the way to go.

    I'm having trouble understanding what you mean by row count in your cube - see rockmoose's post in this thread earlier for some helpful info.

    Hopefully the following will answer your question:

    A fact table with 7 measures will take up more space than a fact table with 1 measure that is sure - that is true both in your data warehouse and your cube (unless you were using ROLAP which as we have discussed I wouldn't recommend).

    Do you mean that you have a seperate measure in your fact table for each of the following:

    members aged 1-9

    members aged 10-15

    etc...

    If so this is not the way to go (unless there is something about your requirement that I don't understand). I would have another dimension called "Age Group" which would result in only 1 extra key field in your fact table but alot less measures.

     

    In answer to your question "Do row counts go up exponentially with the more measures I create AND the more dimensions I create?" it is impossible to answer yes or no - it depends on many factors, principally the make-up of your data. If you create more measures the number of rows in your fact table will NOT increase but the table width will. If you create more dimensions then the number of rows WILL increase, exponentially or not.

     

    I think you are trying to limit the number of rows in your fact table. This is a data warehousing issue, not Analysis Services and therefore is not a question of MOLAP/ROLAP/HOLAP which is what I think this thread started out as.

    Feel free to send me a message to my inbox if you want to take this offline.

    Regards

    Jamie Thomson

    Conchango PLC

     

    P.S. A cube is not stored on the "client side" as you stated at the top of your second post. It is stored within the Analysis Services server component. Your choice of ROLAP or MOLAP has no effect on the amount of space used to store your fact table.

  • Hi Jamie, you've got a lot of great info and I really appreciate you helping me out.  You've answered my questions and I'm sure of which route I want to go now, however I'm not sure I would complete this AGE dimension.

    The answer to your question is yes, I was going to create a measure for each demographic, heres is the list.

    total_mbr_cnt - just a count of all mbrs

    f_mbr_preteen_cnt - count of females less than teen

    m_mbr_preteen_cnt - count of males less than teen

    tot_preteen_cnt - count of all preteens

    f_mbr_teen_cnt - count of female teens

    m_mbr_teen_cnt - count of male teens

    tot_teen_cnt - count of all teens

    f_mbr_adult_cnt - count of females over 19

    m_mbr_adult_cnt - count of males over 19

    tot_adult_cnt - count of all adults

    So these are my measures, I so would I create a dimension with a key indicating an age i.e. (i.e. key 1 = 0 - 12).  If so, wouldn't I still need to have a column in the fact table for each one of the measures listed above.

    I hope I'm not confusing you.

    Thanks Jamie.

  • See the reply that I have sent to you via email. It contains a script that will create a cut-down version of what I think your star schema will look like.

    Basically yes, you DO need to create a dimension with a key indicating age but NO, you wouldn't need a column in your fact table for each of those measures listed above.

    Regards, Jamie

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply