SSIS Data Flow vs SQL Stored Procedure Loop

  • Hi
    i have a stage table with lots of records and need to move to the final table.  When moving need to perform insert/update based on primary key value.  Have to move record one by one for tracking error in case of any issues.  i am trying to determine which method is faster in terms of performance , WHILE loop in SQL or SSIS Data flow task? Can someone provide the input?

  • When moving need to perform insert/update based on primary key value.


    This sounds like a job for MERGE to me; there's no need for a loop or SSIS for that. 

     have to move record one by one for tracking error in case of any issues.


    In other words, you need to analyze your table row by agonizing row. Why? Why not analyze them all at once? That's something SQL Server is very good at doing. 

    i am trying to determine which method is faster in terms of performance , WHILE loop in SQL or SSIS Data flow task? Can someone provide the input? 


    Again, you don't need a loop or SSIS for this. The ideal solution will likely be a MERGE statement that performs the INSERTS & UPDATES based on the key column. I would make sure that there's a clustered index on that column.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • SSIS is better handling errors as you can send the problematic rows to a different destination.
    T-SQL can be faster and easier to write. But, as mentioned by Alan, it should be done in sets instead of going row by row.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I wont be able to perform as set because i have to keep track errors for each row (expected errors likely be Foreign Key errors).  Will SSIS perform better through Data Flow task or Stored Procedure loop method?

  • ShuaibV - Tuesday, May 9, 2017 1:08 PM

    I wont be able to perform as set because i have to keep track errors for each row (expected errors likely be Foreign Key errors).  Will SSIS perform better through Data Flow task or Stored Procedure loop method?

    Then create another table to hold the "errors".
    process the input data once to flag all records that have errors and output those onto the errors table

    After validating all data process the valid ones onto their final destination, and those on the errors table you ignore and report
    No need for any loop and you still report each row with errors.

    One way I did this previously was to have a table with the same layout as the input one, plus 1 column for each possible error

  • The source and destination tables are same structure.  Source does not have refrential integrity key defined.  If i understand correctly, you are suggesting to use Data Flow task to process the data and re-direct the error rows to Errors table, correct?

  • Lets assume the following example
    Standard process would be
    - dataflow -
      Input file to Final table

    Approach I am suggesting is
    - dataflow -
      Input file to Staging table
    - execute sql -
      validates data, inserts bad records onto error table - set based operation
      insert into errortable
      select ...
        , case when fk_table1.key is null then 'Y' else null end as fkey1error
        , case when fk_table2.key is null then 'Y' else null end as fkey2error
      from stagingtable
      left outer join fk_table1
       on ...
      left outer join fk_table2
       on ...
      where fk_table1.key is null
       or fk_table2.key is null
       ...
    - execute Sql or dataflow - if Sql then it may well be on the previous step
      staging table to final table - excluding bad records
      select
      from stagingtable
      left outer join errortable
       on ...
       where errortable.entry is null

    if the input data does not have a unique key value you can define the staging table with an extra identity field so it can be used to join to the errortable

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply