Designing a fact table

  • Hi All: I am beginning to design a DW for our sales department. I have experience with normalized relational db's but no experience with data warehouses, other than one course with the Kimball folks.

    So, I think I have a rudimentary grasp of defining dimension and fact tables. At the moment I have a dimDate, dimProduct and dimDistributor (for dimensions) and two fact tables: I'll call them fctMonthly and fctMonthlyRunning. I would like some opinions as to whether I am going about building the fact tables correctly.

    We are not using analysis services to build cubes, yet. The fact tables are manually loaded (in a matter of speaking), and accessed via t-sql with the result sets presented in reporting services.

    fctMonthly looks like this:

    ID

    PRODID

    DATEID

    DISTID

    CurrentMonth

    CurrentMonthPrevYear

    3MonthsToDate

    3MonthsToDatePrevYear

    6MonthsToDate

    6MonthsToDatePrevYear

    9MonthsToDate

    9MonthsToDatePrevYear

    12MonthsToDate

    12MonthsToDatePrevYear

    fctMonthRunning looks like:

    ID

    PRODID

    DATEID

    DISTID

    CurrentMonth

    CurrentMonth-1

    CurrentMonth-2

    CurrentMonth-3

    CurrentMonth-4

    CurrentMonth-5

    CurrentMonth-6

    CurrentMonth-7

    CurrentMonth-8

    CurrentMonth-9

    CurrentMonth-10

    CurrentMonth-11

    So, for each distributor/month/product combination there is a line item entry in each table, then I go and find the relevant data for the year previous (for example) and do all the updates.....

    To be honest I have no idea if this is the correct way to go about building a fact table, but it seemed to me to be the only way to get the queries and reports that were being requested.

    I would like some pros to weigh in and vote up or down on whatever aspect you feel needs comment. Is this approach sustainable? Was there any need to separate the data into two fact tables? What happens when we move to analysis services (nothing? these tables would not be used by it?), etc. Any and all comments are welcome. Many thanks. D. Lewis

  • fctMonthRunning looks like:

    ID

    PRODID

    DATEID

    DISTID

    CurrentMonth

    CurrentMonth-1

    CurrentMonth-2

    CurrentMonth-3

    CurrentMonth-4

    CurrentMonth-5

    CurrentMonth-6

    CurrentMonth-7

    CurrentMonth-8

    CurrentMonth-9

    CurrentMonth-10

    CurrentMonth-11

    What does CureentMonth1.....CurrentMonth-11 mean ?

    What is the DateID in the fact table doing? What does it represent?

  • Sorry I wasn't clear on that. DATEID is a foreign key to the dimDate table, and represents a month/year combo. So, there is a new row for each month of the year.

    The columns CurrentMonth, CurrentMonth-1, etc. contain the following data:

    CurrentMonth is the total sales in that month for that distributor for a given product. CurrentMonth-1 is the total sales for one month previous, CurrentMonth-2 for two months previous (these are not running totals), etc. So, for the row for distributor "Distributor1" for Product "Product1" for Dateid " January 2008" there are sales totals for 1/08, 12/07, 11/07, 10/07, etc.

    This enables reports that show sales for the current month, the past 3 months, current month of the previous year, past 3 months of the previous year, etc. with some pretty simple and fast queries. The users needed reports that enabled them to select any month-year starting point.

    I came up with this approach because if I had a fact table that had only the current month's sales for a given product/distributor/month-year, trying to return values like the past 3 months, or same period from a previous year, etc. required many subqueries, and were very slow.

    The tables, queries, reports, etc. are not in analysis services, or any kind of cube. They are in regular old sql server. As I understand it building cubes would make this kind of thing much faster, but also needs experience with mdx queries, which I don't yet have. I hope this is all clearer now. Thanks. David

  • Since you already have a DateID in the fact table, you don't put last month's data in the current month fact table.

    So the fact table should look liked

    ID

    PRODID

    DATEID

    DISTID

    TotalSales

    If you want last month's sales, then use DateID to get last month's data.

    You try to build the table to look liked the report you are going to give to the users. That is not the point of data warehouse.

    Once you have the fact table, then you use it to build the report by joining the Date dimension table to get each month's sales, and then calculate 3 month rolling total and 6 month rolling total.

  • Hi Loner: I understand what you say... Unfortunately I found that storing the data as you say, and then relying on queries (in this case subqueries) slowed things down very much. I am not using analysis services or cubes or anything like that -- regular old T-sql. For that reason I built a sort of 'reporting table' that had the data pre-aggregated.

    I am assuming that the fact that nobody other than you responded, and that your response was to point out that I built a reporting table, means that I am not doing the accepted practices thing. So, I'll re-evaluate. Thanks. D

  • Data warehouse is different concept for regular transaction database, unfortunately not many companies or people realized that. In most cases, many companies just put a sql developer to build a data warehouse without realizing it is totally different from building a regular database.

    Most companies use data warehouse tool to build data warehouse and reports for example analysis service, business objects, cognos and many more.

    I did built a data warehouse in SQL server and Oracle in star schema and used it to report. You can still use the data warehouse concept to build the fact table. But it does not mean it will be slow in reporting. I aggregated the fact table myself so when the customer query the fact table, it will run faster.

    What I meant was I built a multi-dimension database using a RDBMS. I did that in 1998.

    In the fact table, I built a measure for each month/year, then I also built a measure for a particular year, quarter.

    Then Date dimension must include month, year, quarter... everything you can think of about the date.

    I hope it helps.

  • Hi David,

    Not sure if you are still doing by this method. SQL Server Analysis services is bundled with sql server so why don't you use that? You can forget about calculating all the aggregations manually and let AS do it for you. It is much faster and powerful.

    The users can use Excel for reporting. It's as simple.

    Thanks,

    Vijay.

    Warm Regards,
    Neel aka Vijay.

  • David, it seems like you are falling into the same trap that a lot of people do by creating a one-off reporting application. By creating the data model for specific set of reports in mind, you create something that will not be flexible or reusable. I would suggest using the suggested fact table earlier and build the analysis services cube. You'll thank yourself in the long run when the business wants to update their reports or add additional measurements.

    By the way, you don't need an ID field on the fact table. The clustered index should be: DateID, DistID, ProdID

    PRODID

    DATEID

    DISTID

    TotalSales

    Also, get a copy of the Data Warehouse Toolkit and read it.

    -Mike

  • Hi all: Thanks for the comments. AS seems to be the way to go -- I agree with the observations re the traps I was falling into. The forums are always a good way to get some feedback! thx.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply