Select Distinct information from millions of rows.

  • I am building a DW, and part of it involves normalizing some transactional data. Keep in mind that the two input files contain roughly 130 million and 90+ million rows, respectively.

    From the 130 million rows, I need to build a common table by selecting the distinct set of about 20 different fields, then link these back to the remaining data from each row. The same process will be needed for the second input file, with the same 20 or so fields.

    First I have to load this data; after that I will need to process updates on a monthly basis.

    Any good ideas? I have the data in flat files, organized by year.


    KlK

  • I assume you have read up on OLAP, building cubes, dimensions, members, snowflake schemas, etc.?

    (No, I'm not the guy to ask about those things.)

    I might have some ideas, though I'm not quite getting the actual question here...

    /Kenneth
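
As one possible reading of the first post (build a common table from the distinct combinations of the shared fields, link every row back to it, then repeat for each monthly batch), here is a minimal sketch. Everything in it is an assumption made for illustration: the thread does not name a database engine, so SQLite is used through Python's `sqlite3` module only because it is self-contained, and the tables `staging`, `common_dim`, and `fact`, plus the two columns `region` and `product` standing in for the 20 or so shared fields, are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical staging, common (dimension), and fact tables."""
    conn.executescript("""
        -- Raw rows land here first, one batch (e.g. one yearly file) at a time.
        CREATE TABLE staging (
            region  TEXT NOT NULL,
            product TEXT NOT NULL,
            amount  REAL
        );
        -- One row per distinct combination of the shared fields.
        CREATE TABLE common_dim (
            common_id INTEGER PRIMARY KEY,
            region    TEXT NOT NULL,
            product   TEXT NOT NULL,
            UNIQUE (region, product)
        );
        -- Remaining per-row data, linked back by surrogate key.
        CREATE TABLE fact (
            fact_id   INTEGER PRIMARY KEY,
            common_id INTEGER NOT NULL REFERENCES common_dim(common_id),
            amount    REAL
        );
    """)


def load_batch(conn, rows):
    """Load one batch: stage it, add unseen combinations, link the rows."""
    with conn:  # one transaction per batch
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
        # The UNIQUE constraint makes already-known combinations a no-op,
        # so the same statement serves the initial and the monthly loads.
        conn.execute("""
            INSERT OR IGNORE INTO common_dim (region, product)
            SELECT DISTINCT region, product FROM staging
        """)
        conn.execute("""
            INSERT INTO fact (common_id, amount)
            SELECT d.common_id, s.amount
            FROM staging AS s
            JOIN common_dim AS d
              ON d.region = s.region AND d.product = s.product
        """)
        conn.execute("DELETE FROM staging")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Initial load, then a later "monthly" batch with one new combination.
    load_batch(conn, [("EU", "widget", 10.0), ("EU", "widget", 5.0),
                      ("US", "gadget", 7.5)])
    load_batch(conn, [("EU", "widget", 1.0), ("APAC", "widget", 2.0)])
    dims = conn.execute("SELECT COUNT(*) FROM common_dim").fetchone()[0]
    facts = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
    print(dims, facts)  # prints "3 5"
```

Loading through a staging table keeps each batch small relative to the full 130 million rows, and the unique constraint on the shared fields means the distinct step only ever adds combinations that are not already in the common table. `INSERT OR IGNORE` is SQLite syntax; on another engine the same effect would need a `NOT EXISTS` filter or that engine's merge/upsert statement. The sketch also assumes the shared fields are never NULL, since NULLs would neither match in the join nor be deduplicated by the unique constraint.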
