Historical data in datawarehouse

  • Hi,

    I designed a datawarehouse that contains only current patient data (i.e. if some data is modified for a patient, the "new" data overwrites the "old" data) in the datawarehouse. This works fine, but the customer wants to have any historical data in addition to the current data in the warehouse.

    Do I need to build a separate warehouse that will contain historical data? What are the design considerations in such a case?

    I would be glad if someone can you point me some articles or books that can help me.

    Thx.

    Yanick

  •  

    Hi Yanick,

    You can achieve this task by taking the snapshot of the data. You can create a new table called history like your fact table and dump all the history records in it with a proper version name like(1,2,3,4, soon) and date(20th,21,22, soon) and append this records to your existing fact table. And make sure that when ever your populating your fact table with fresh data you should generate a new version so that it should not effect the existing version records. It is bit complex and length task to do but at this stage I can suggestion you only this. if you questions do most them and will try to reply them.

    Thanks

    Pawan

  • You need something new or you need to modify what you have already. Many people's definition of a data warehouse differs. I like to think of a data warehouse as giving a single version of the truth about your business for any point in time. Using that definition the system that you have already built is NOT a data warehouse, even though you refered to it as one.

    Before getting into design phase for a DW you need to consider:

    1. What transactional events do you want to capture

    2. The granularity (i.e. the level of detail) at which the data is required. For example, do you want to know about every single transaction or are you happy summarising that up to count and sum of all transactions per hour/day/week/month.

    3. What information do you want to store about the transactions (e.g. When/Who/What). This information will drive your dimensional model.

    Your customer will provide the answers to these questions. To me that is your starting point! Good luck.

     

  • What you have, from the description, is what is called a slowly changing dimension.  In other words, the information can change, but not all that often.

    You have to make a decision on the design.  For example, if the patient is a woman and she gets married.  Do you want her old name as well as her new name?  In the reporting of information for a specific time period, do you want to use the name she was using at the time or her current one?

    If you are going to use both, then you will have to put effective dates in the dimension table that holds the patient information and use them to do the join for information in your fact tables.

    If you are going to use the current but want to have the other available only as historial data, you would create a "snowflake" table to store the historical data with the date it was changed.  Your dimension table would only contain the current information

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply