Alternatives to foreach loop task

  • Hi All,

    I read in several discussions that the foreach task is not intended to run over large datasets.

    Now I'm trying to update customer data over more than 1 million customers. My design is as follows:

    - I'm using an Execute SQL Task to read the customer_Id

    - I use the resultset as the input to the ForEach Task

    - In the Foreach Task I'm using a couple of Data Flow task to update data (one to update SUM columns and one to update COUNT DISTINCT columns)

    This works but is extreamly slow. I have to perform millions of updates (around 30MM).

    Any aleternative design?

    Your commnents will be appreciated

    Kind Regards,

    Paul

    Paul Hernández
  • Ok,

    This question is very stupid, sorry for this.

    The solution is very basic, I don´t need to select all the customers and then iterate one by one to perform the data flow tasks. I can select all of the customers inside the data flow and perform all the operations that I need to the whole dataset, that´s one of the greatest features of SSIS, isn´t it?

    Kind Regards,

    Paul

    Paul Hernández
  • Hi Paul,

    Can you explain that a little more or provide a link to your solution idea? I am attempting something similar in that I am running a Foreach Loop on a table of about 400 rows and it is taking a long time. I am liking SSIS but am unsure of all its capabilities.

    Regards:
    Mordred
    Keep on Coding in the Free World

  • Ok, I will try 🙂

    First I load a table "A" with transactions from a particular week. Then I want to update a table "B" which cotains aggregated data from "A". I don't want to recreate the table, only updates records for the customers in the new transactions. For instance, I keep track of the purchases amount in the table B, so I would like to add the new amount for each customer. I also want to update the number of different departments (from a particular store) where every customer has purchased.

    Table A contains one record for every item a customer purchase. Example

    Table A

    CustomerId - item --- quantity --- department -- amount --- date

    ---- 002 ------ Book ------- 2 ----------- Library -------- 20$ --------- 01.08.2012

    ---- 002 ------- CD -------- 1 ------------ Music ---------10$ --------- 01.08.2012

    ---- 002 ------- CD -------- 1 ------------ Music ---------12$ --------- 04.08.2012

    .... ..... ... ...... .... ...............

    Table B

    CustumerId Count(distinct department) SUM (amount) Max(date)

    ---- 002 ------------------ 2 ---------------------- 42$ --------- 04.08.2012

    My first approach was to query the distinct Customers from table A. Then use the resulset as the input for a Foreach container. Then for every row (Customer_Id) perform two data flows:

    1. DTS: Select the data from the Table A and current customer. Aggregate the data, lookup the customer in Table B. If the customer is in Table B update the SUM (like total amount) fields, if not create a new entry in the Table B for the customer.

    2. DTS: Select the data from the Table A and current customer performing the Count Distinct and Max operations. Then update the Table B.

    That works good but the performance is terrible. Then I realized that I can query the whole data in Table A (for a particular date range, I mean, the new weekly records), make a lookup with the Table B (In fact is a Left Joint). From this leftjoin you should obtain something like this:

    Aggregations from Table A Left Join Table B on Customer_Id

    Aggr.CustumerId Aggr.SUM (amount) B.CustomerId B.Total_Amount

    ------- 002 ---------------- 42$ ------------------ 002 ------------- 3250$ --------

    ------- 125 ---------------- 58$ ------------------ NULL ----------- NULL ----------

    .............. ..................... ........... .......................

    Then you are almost done! Just use a conditional split evaluating the B.CustomerId:

    - If null is a new customer, then insert the data from Aggr. in Table B

    - If exist, add Aggr.SUM(amount) + B.Total_Amount and update this row with the new values.

    Hope you can understand my explanation 😉

    Kind Regards

    Paul Hernández
  • Thanks Paul, I'm going to look into querying the whole table and see what I can do. Thanks for your explanation.

    Regards:
    Mordred
    Keep on Coding in the Free World

  • Based on your description update-join seems close to what you want to accomplish. Lookup update-join. Ben Nadel has few good articles on it.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply