Multiple Lookups while Loading fact table

  • Hi,

    I have a staging table of around 100 mil.i have the dimension loading .Now we are trying to load the data into the fact using SSIS.to get the dimension keys we have around 6 lookups .the package is pretty straight forward.OLE db source , a conditional split ,7 lookups ,and the destination.

    However when i start executing the package i dont see anything happening it stays on like that for long time.Max i waited for it was 15 mins.I dont see the data going thorugh the pipeline.

    Is it something with the cache propety of the lookup.

    I really need help in this.

    Anybody please help.

    How can i load fact

    thanks

  • Hi Rayash,

    Before you begin to load your fact table into data warehouse, I guess you must load the dimension table first. Do you used a surrogate key in your dimension table (in data warehouse, not staging)? What data type you used for surrogate key? Try to use integer rather than other data type

  • In the progress tab when the package is running are you seeing Validation Complete and other messages, this tab should give you more informatio about what it is doing..

    CEWII

  • If you are using full caching check how many rows there are in the lookup table/view.

    If the table you are using for lookups is huge then consider using partial caching instead.

    Hope this helps/makes sense

  • It sounds like a lookup cache issue to me as well. Be aware, the lookup will load all rows from the table or SQL statement priot to the package. Check the progress tab and it will show how many rows it has cached.

  • I dont see any progress any where.Cache is set to use limited memory.

  • to verify i ran the package with one lookup it works.however when i have 6 lookups it doesnt move at all

  • I don't think it's the size of the cache that's your issue but the amount of time it takes to load the cache for the lookup. For example, if I do a lookup transform on a customer dimension with 2 million rows, the lookup will load all those rows. When you start the package and click on the progress bar, you can watch as the rows load.

    For large dimensions, we use a staging table and join into it for the loads as the lookup kills performance on these large tables.

  • Could you elaborate on it.How you join staging to dimension.

    I have the same process .I get the data into the staging and from staging i load the dimension and the usuing the surrogate keys from dimension load the fact.I belive that pretty much what eveybody use.

    In my case we have a relatively bigger dataset from usual times,So doing using proc is not good enough.

    So if you can explain how you handle big dimension and then correspoding fact.

  • Can you tell me the cache setting you have set.I didnt enable the memory restriction so i believe i m not usuing cache and dats why i cannt see the progress..

  • Based on my experiances, when we start a package, SQL Server start populating memory with the data needed by the package. So it's really worth to reduce the size of the data.

    I frequently lookup for customer dimension to get a surrogate key. Sometimes I have to create a snapshot fact table, for instance to see the balance for each customer. In banking case, customer data could possibly a really huge table. That's why I suggested to use a smaller key for surrogate key (for example an integer would be 8KB)

    Another trick is to reduce the size of dimension. You should create a view for active customer, and lookup the key to this view, rather than directly to the table

  • [font="Arial Black"][/font]

  • The delay is caused by the lookups caching your dimension table data. You'll be able to see this by running Profiler, PerfMon, etc...

    To optimize your lookups, make sure you're NOT selecting a "table", but are using a SQL statement to select only the reference columns and key column you're retrieving. Also make sure you're not including dimension rows that are "expired" for the data period you're loading (if you can do that).

    Further management/balancing of memory vs. pre-execute time vs. execute time can be accomplished by using partial caching, and setting the cache size.

    Todd McDermid - SQL Server MVP, MCTS (SQL 08 BI), MCSD.Net
    My Blog - Dimension Merge SCD Component for SSIS - SSIS Community Tasks and Components

  • Hi all,

    sorry before if OOT

    need help, i have sql statement like this :

    select col0, col1, col2

    from table1

    where substring(col0,1,3) IN (select colref from tableRef where active=1)

    if want to convert this statement with SSIS component (bold mark). which component should i use?? does lookup component can handle it ?

    tq

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply