Dimensions vs. Member Properties

  • Hello...

    I'm having a tough time understanding how/when to use Member Properties versus a Dimension. For example:

    A basic hierarchy in which a CLIENT owns one or more ACCOUNTS. I've created a CLIENT Dimension with two levels: CLIENT_NAME and ACCOUNT_NAME. From there, my ACCOUNTS have properties of their own, such as a CREATED_DATE.

    Main question: if I make CREATED_DATE a MEMBER PROPERTY (which is really what it is), then I cannot view data in aggregate by CREATED_DATE. With me?

    Another property-vs.-dimension example would be the Account's "Type" attribute. To me it's a MEMBER PROPERTY of an Account, but if I model it that way, I cannot view Dollars per ACCOUNT_TYPE.

    It appears to me that I need to make CREATED_DATE a DIMENSION of its own in order to view aggregates or to filter/group by CREATED_DATE.

    Do I need to do things in duplicate?

    Include all ACCOUNT attributes as MEMBER PROPERTIES in one main dimension, and then also include each useful ACCOUNT attribute as a DIMENSION of its own?
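
    Here's a rough sketch of the structure I'm describing, just to make the question concrete (table and column names are illustrative, not my real schema):

    ```sql
    -- Hypothetical star schema for the CLIENT/ACCOUNT example above.
    CREATE TABLE dimAccount (
        AccountKey  int IDENTITY(1,1) PRIMARY KEY, -- surrogate key
        ClientName  varchar(50) NOT NULL,          -- level 1: CLIENT_NAME
        AccountName varchar(50) NOT NULL,          -- level 2: ACCOUNT_NAME
        AccountType varchar(20) NOT NULL,          -- member property, or its own dimension?
        CreatedDate datetime    NOT NULL           -- member property, or its own dimension?
    );

    CREATE TABLE factDollars (
        AccountKey int   NOT NULL REFERENCES dimAccount (AccountKey),
        Dollars    money NOT NULL
    );
    ```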

    The vocabulary of this OLAP stuff is quite difficult to grasp.

    Any suggestions, other than sticking a pencil deep into my neck?

    - B

  • Yes...make sure the pencil is sharp! Seriously, this is a subject I just finished spending an entire weekend studying, and I am not sure I am any closer to understanding it fully.

    In my application, I have several columns in dimension tables that do not, in my opinion, qualify as levels in a dimension. I am looking at making member properties out of those columns, then making virtual dimensions out of the member properties. However, AS won't store aggregations for virtual dimensions, so I am back to wondering which way to go.

    Check into making virtual dimensions out of member properties; perhaps that will help you decide what to do. In the meantime, perhaps someone with more experience designing OLAP databases and data cubes will weigh in and help clarify when to make a data column a dimension (or a level in a dimension) and when to make it a member property.
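
    One quick sanity check I use when deciding (just my own rule of thumb, and the query assumes a dimension table like the one you sketched): look at the cardinality of the candidate columns. Low distinct counts usually point toward member properties or virtual dimensions, while high ones suggest a real dimension or level:

    ```sql
    -- Rough cardinality check (table/column names assumed from the example above).
    SELECT COUNT(DISTINCT AccountType) AS TypeValues, -- low count: member property candidate
           COUNT(DISTINCT CreatedDate) AS DateValues, -- high count: maybe its own dimension
           COUNT(*)                    AS TotalRows
    FROM dimAccount;
    ```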

    Michael

    Michael Weiss



  • Thanks M-

    The "Virtual" Dimension seems to be a step in the right direction. However, why create a "Virtual Dimension" rather than a "Real" dimension containing the same Member properties as the pre-existing Dimension ???

    Is there a performance increase, or a decrease in data volume?

    On the surface, these cubes have a quick WOW factor, but there's a lot under the hood and perhaps many design and performance paths to be taken.

    - B

  • Because aggregations are not stored for virtual dimensions, there is a performance gain in the form of reduced cube processing time, and likewise a reduction in the storage space required. However, it appears to me that if a user wants to analyze data by one or more virtual dimensions, query speed might be rather dismal, depending on how much data has to be pulled, counted, summed, etc. from the relational database...

    Michael Weiss



  • So perhaps "VD's"(dare I use that acronym O: ) exist ultimately as a space saver with the trade-off on aggregate result speed???

    Anyway, my first attempt was the best so far. I kept my Dollar measures AND ALL OTHER Account properties in the ACCOUNT table and made each property its own dimension. That, in its basic form, behaves as advertised.

    Now, my biggest question is the design of the Dimensions vs. the Facts, and how (and how often) to get data into this magical "Staging" area. Mainly the "Serial IDs": do I link them back to the Operational DB, or create new keys, etc.?

    Seems like a potential disaster waiting to happen 🙁 gulp...

  • The literature I have studied all seems to suggest that surrogate keys are the way to go (an identity integer column in SQL Server, for example). This negates any conflicts that could occur down the road if you used primary key identifiers from the base data itself. For instance, I know of an insurance company that re-uses claim numbers...the claim numbers repeat each year. Obviously, your ETL (Extract, Transform, and Load) processes will have to check for duplicate records and eliminate them during the extraction and load steps. In my opinion, you are correct that you could run into problems down the road with surrogate keys IF your ETL processes are not well thought out.

    How often to load data into the data warehouse and process the cubes would be a function of business requirements, I would think. A well-designed data warehouse and set of OLAP cubes should allow you to extract, transform, and load the data and process the cubes as frequently as every few minutes if need be.

    If you are in the design stage of your data mart or warehouse, you should study the details of what happens when a cube is processed and the design features that can make cube processing as much as ten times faster than it would otherwise be (these mostly have to do with a backend design that lets the AS database rely on primary and foreign keys and eliminate unnecessary table joins during processing). Just like SQL Server itself, or any complex RDBMS, there are a plethora of details that can trip you up and send you back to the drawing board.
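
    As a sketch of what I mean (the staging and dimension table names here are assumptions, not anything from your system), the surrogate key is just an identity column, and the load skips rows already in the warehouse:

    ```sql
    -- dimAccount.AccountKey is an IDENTITY surrogate key, so the warehouse never
    -- depends on the source system's (possibly re-used) identifiers.
    -- The NOT EXISTS check is a simple duplicate-elimination step; the matching
    -- columns would have to be chosen to suit your source data.
    INSERT INTO dimAccount (ClientName, AccountName, AccountType, CreatedDate)
    SELECT s.ClientName, s.AccountName, s.AccountType, s.CreatedDate
    FROM stgAccount AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dimAccount AS d
                      WHERE d.ClientName  = s.ClientName
                        AND d.AccountName = s.AccountName);
    ```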

    Michael

    Michael Weiss



  • Thanks again Mike.

    Yeah, I've been reading as well, and "Surrogate Keys" are mentioned many times. However, what if I wish to link from my Cube back to the source Transaction Database for very, very granular information? Once I go into the staging area, should I break ALL links with the Transaction DB? (Meaning, do not store the Transaction DB identity columns in the Fact/Dimension tables.) This will require much tinkering, I feel.

    Also, how do you feel about the "Incremental Update" ability? Does it work as advertised?

    - B

  • If you want to link your data in the warehouse back to the source, you could certainly include the original source identifying key...maybe as a member property? Also, just because a column is in a table in your data warehouse doesn't mean it has to be included in a dimension, so including the original source key wouldn't be a problem.

    More problematic would be not loading your data into the warehouse at the most granular level possible. AS is very good at aggregating, so unless business requirements dictate that the data be loaded at something other than the most granular level, I would certainly load at the most detailed level. That way your users can drill through to the finest grain and AS can handle all the aggregations. You also won't run into problems later if you get into calculated measures such as averages, % of totals, etc...
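
    For instance (hypothetical names again), you could carry the source system's key alongside the surrogate key, and drilling back to the operational DB becomes a simple join:

    ```sql
    -- Keep the operational key as an attribute of the dimension row.
    ALTER TABLE dimAccount ADD SourceAccountID int NULL;

    -- Later, very granular detail is one join away (names are assumptions):
    SELECT d.AccountName, src.*
    FROM dimAccount AS d
    JOIN OperationalDB.dbo.Accounts AS src
      ON src.AccountID = d.SourceAccountID;
    ```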

    As for incremental updates, in my experience the feature works very well...I have had no problems to date. You do need to ensure that you do not 'double count' data, though...but if you use surrogate keys, this is easy enough to prevent.
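
    One common way to avoid double counting (a sketch, assuming your source rows carry a creation date and you keep a small load-log table; all names here are made up) is to extract only rows newer than a watermark:

    ```sql
    -- Pull only rows the warehouse hasn't seen yet.
    DECLARE @LastLoad datetime;
    SELECT @LastLoad = ISNULL(MAX(LoadedThrough), '19000101') FROM etlLoadLog;

    INSERT INTO stgAccount (SourceAccountID, ClientName, AccountName, AccountType, CreatedDate)
    SELECT a.AccountID, a.ClientName, a.AccountName, a.AccountType, a.CreatedDate
    FROM OperationalDB.dbo.Accounts AS a
    WHERE a.CreatedDate > @LastLoad;
    ```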

    Michael

    Michael Weiss


