Healthcare data warehouse design resources

  • I'm kind of an accidental DBA working for a community health center and it's become apparent to me that we need to build a data warehouse. I've picked up a few books to help me out, including Kimball's famous Data Warehouse Toolkit, read a lot of blogs and articles and tutorials, etc, and taken a class on Analysis Services. I think I have a pretty good foundation but there are some complex requirements specific to healthcare and our type of center in particular that I need some help conceptualizing. We are a non-profit so paying a high priced consultant is pretty much out of the question. I hesitate to sign up for any more training classes because the last thing I need is another example or boot camp using AdventureWorks or any other retail model because it just doesn't relate very well to my situation. Are there any other resources/websites out there dedicated to modeling healthcare data?

  • Find a business sponsor (C-level or high-level manager, if possible) who you can sell your idea to. Explain the advantages and benefits to be gained from your DW. Find a small problem that interests your business sponsor -- don't try to tackle anything too ambitious at first. Take an incremental approach. Once you let your business users link MS Excel to your cube and see how easy it is to do ad-hoc querying and reporting they'll easily think of more ways to expand your cube.

    Since you're in healthcare, try something like: Create a date, diagnosis (ICD9) and procedure code dimension. Create a medical claim fact table. Not terribly complex, but a start that should illustrate the power to your users. Slowly add on to your base. Experiment as you go.

    Just some ideas,

    Rob

  • AJ,

    What particular concepts are you feeling you're having difficulty with? An end to end discussion on warehousing, even particularly for healthcare, is incredibly broad. A few particular components you're having the most difficulty with will help.

    One thing you'll really want to make sure to keep in mind when warehousing for healthcare. DESTROY the identifying information. SSN, names, addresses, perscription card numbers, cc#s... anything that can locate a particular user must be anonymous. GUIDs make excellent replacements here. You can review all relevant data to a particular person without being able to directly identify said particular person. This is a HIPPA requirement.

    Beware that may not be enough for publically exposed information. Enough history and you can piece anyone back together eventually. It will depend on what-all you've got stored.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Oh, I think the idea has been sold. After my SSAS class I built a small demo with an encounters fact table and Location and Provider dimensions. It was a huge hit so of course it has kind of exploded from there.

    So here is something I'm struggling with:

    Special populations or registries: e.g. for diabetics I want to track things like their most recent HgbA1C and then slice them by their primary care provider (PCP) and the ranges of the A1c value (none, < 7, 7-9, or > 9), and also maybe show an average A1c for each PCP. And then roll everything up for the PCP's primary location, and then of course for the whole organization. Other special populations have different measures to be analyzed, like for hypertensive patients we'd be looking at their last blood pressure instead, or HIV patients their last CD4 count.

    So would I put the A1C, BP, CD4 all into one fact table, with the key to the Patient Dimension? And create a measure type Dimension? Then is it okay for me to link the Patient Dimension back to the Provider Dimension in my DSV? Or is it better to keep track of who the patient's PCP was at the time the fact occurred (have a Provider Key on the Measures fact table)? Keeping in mind the patient's PCP could potentially change from month to month, BUT if we're looking at a whole year we just want to see who their PCP was at the end of the year 🙂

    And what's the best way to keep track of who is in each registry (assuming each patient could potentially be in more than one)?

    I could go on and on... but I think some guidance on the above might answer some of my other questions as well.

  • There is a much better way to do a health data

    warehouse. You no longer need to spend a lot of

    time and money trying to program this kind of thing

    from scratch. You can buy this service for a lower

    cost, and much better results.

    http://www.ehdp.com/vitalnet/ explains about the

    Vitalnet health data warehouse.

    If the organization is very small, you might want to

    make something home-grown. Otherwise, it's totally

    not cost-effective.

    Thanks,

    Daniel Goldman

  • From what you have described, it seems one item you are struggling with the level of granularity. Make sure you define upfront, the lowest level of detail you want to see. Second, it sound like with some of the values you want to describe the conditions using some discretization buckets. Since each of these conditions is not mutually exclusive, it would seem they would have their own attribute (and maybe dimension).

    Of course you mentioned doing averages.... you should be able to do those at the reporting layer (Excel pivot commonly) unless there is some additional complexity, then you may have to do a calculated measure.

    As to the PCP and Location, they should be additional dimensions surrounding your fact table. I would assume your fact table would have a location_ID and PCP_ID of some sort, that would be joined to a Location dimension table and a Physician dimension table. I am not sure how organization fits, but that could be another attribute of the location dimension (if I understood correctly) and then a hierarchy could be created with organization > location.

    Furthermore, you talk about the PCP changing; this situation forces you to decide which type of Slowly Changing Dimension (SCD) you want to use. That decision is really a business decision that you must make... ie tracking the history or just changing them outright (it is more complex than that), but there are plenty of tools and websites describing that process.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply