Data warehouse design issues

  • I am in the process of implementing a new data mart design. I will be partitioning the db (SQL Server 2000) fact table(s) by month. My monthly fact tables will each have approximately 63 foreign key fields and each monthly fact table will hold between ten and twenty million rows. I have two questions:

    1) Would it be better to have one cube (with one partition) that references a view which covers via 'union all' the underlying monthly fact tables or would should I partition my cube each month and tie the new partition every month to the new month's fact table? I am thinking in terms of query performance and cube processing times.

    2) With so many foreign key fields in each fact table, should I also look into partitioning the fact tables vertically as well? Say by creating a partition that holds the record id field, appropriate date columns and constraints, and the foreign keys that would not be referenced too often in queries against the underlying cubes? I would like to avoid partitioning the fact tables both horizontally and vertically due to maintenance needs of both the tables themselves, any views created over the tables, and any cubes which reference the tables.

    Your thoughts and insights would be most welcome. Thank you!

    Michael

    Michael Weiss


    Michael Weiss

  • This was removed by the editor as SPAM

  • Michael -

    I don't think you'd gain much from a vertical partitioning scheme but your first question is an interesting one.

    Partitioning the cube into monthly data stores gives you better processing and query performance.

    Processing performance benefits:

    * potential parallel processing (requires the SQL 2000 resource kit)

    * ability to process only newest data (assumes prior months are historical)

    * can easily (and automatically) delete oldest partition if you want to maintain a set collection of historical data (IE: 24 months + current month)

    * each partition can have its own aggregation design. So, infrequently queried partitions can have a very low (or no) aggregation design (or ROLAP storage) which reduces greatly minimizes processing

    Query Performance benefits:

    * each partition can have its own aggregation design so most frequenly queried partitions can have a relatively higher percentage of aggregation

    * defining a slice in the partition design allows Analysis Services to determine which partitions are in play when evaluating user queries

    Scot J Reagin

    sreagin@aspirity.com

    Scot J Reagin
    sreagin@hitachiconsulting.com

  • I guess what you're saying is that you shouldn't use the 'partitioned view' for your olap cube? This would mean you do your partitioning in analysis services and not in the relation database. I've been reading datawarehousing recommendations which say to set up tables with constraints defined (partitions) and then use views with UNION ALL to access the data. I've been wondering if this applies to the analysis services world and it would appear not.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply