SQL to DELETE and INSERT using Fast Load

  • I'm loading data from various sources, and want to delete the old data from a source before loading the new. At the moment, I'm using OLE DB Destination using Table - fast load, but of course that adds the new data without removing the old. I want to write SQL to fast load, with a DELETE ... first, but I can't find an example, (even with Mr. Google's help 🙂 ). What is the SQL to do an INSERT from a data stream using fast load please? [I'm using SQL Server 2008 with SSIS on Windows XP]

  • I would follow these steps in the scenario you explained:

    1. TRUNCATE the table before the data flow task

    2. Load the table with Table - fast load option.

    But it depends on:

    1. How many old records are there in the table that you want to delete?

    2. How much data do you want to insert?

    3. How many indexes are there on the table? (In some cases I drop the indexes before the load and then recreate them after the load - see the sketch below.)
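
    For example, a minimal sketch of the pre-load steps (the table and index names are made up for illustration; run this in an Execute SQL task before the Data Flow Task):

    ```sql
    -- Empty the staging table before the load (hypothetical names).
    TRUNCATE TABLE dbo.Staging;

    -- Optionally drop a non-clustered index before a large load...
    IF EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = 'IX_Staging_SourceID'
                 AND object_id = OBJECT_ID('dbo.Staging'))
        DROP INDEX IX_Staging_SourceID ON dbo.Staging;

    -- ...and recreate it in another Execute SQL task after the load:
    -- CREATE INDEX IX_Staging_SourceID ON dbo.Staging (SourceID);
    ```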

    -Vikas Bindra

  • It'll be DELETE ... WHERE rather than DROP or TRUNCATE, as I don't want to lose other data in the table, from other sources. At the moment there are 6 tables from 8 files from one source, with data being combined/cleaned/normalised on the way in. Eventually I'll be combining data from several sources, each originally in different formats of course 😀

    As to size - a few hundred thousand records per source, and (so far) only a primary key per table.

    Can I DELETE and then fast load in one OLE DB box, or do I have to use separate data flow boxes for each step? The diagram is getting complex and I'd like to avoid too many more boxes!

  • You can delete inside the DFT using the "OLE DB Command" data flow transformation. But I would suggest deleting before you enter the Data Flow Task, using a DELETE ... WHERE statement in an Execute SQL task in the Control Flow - for example:
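
    (A sketch only - dbo.MyTable and SourceID are placeholder names; the ? is the OLE DB parameter marker, mapped to an SSIS variable on the task's Parameter Mapping page.)

    ```sql
    -- Execute SQL task in the Control Flow, before the Data Flow Task.
    -- Map '?' to a variable such as User::SourceID (hypothetical name).
    DELETE FROM dbo.MyTable
    WHERE SourceID = ?;
    ```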

    -Vikas Bindra

  • Yes, I see - that would be the best way to do it. I was focusing on the data flow and didn't think about deleting the about-to-be-replaced data in the Control Flow, BEFORE the whole data flow starts. Thanks!

  • Just to warn people about a "Gotcha" when using Fast Load.

    I'd split Name and Address off into separate tables, but one record failed validation at an earlier stage. Once I'd corrected that, I expected it to load - but it didn't. And I could not see why for ages.

    My address table has a primary key, so it won't let you add a given record a second time; I was using that fact when loading new data, to add only new addresses**, ignoring errors from duplicates. But that missing record didn't load - even though its key was not present in the Address table.

    What I think was happening was that Fast Load was sending the mostly-existing addresses to the table in batches of many records, and if ANY record failed to load due to a primary key clash, the whole batch failed to load. When I changed it to a normal (one record at a time) load, the missing record loaded OK.
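
    The effect is like a single multi-row INSERT in T-SQL, which is all-or-nothing (an illustration with a made-up temp table, not my actual schema):

    ```sql
    -- One bad row fails the whole statement, just as one primary key
    -- clash can fail a whole fast-load batch.
    CREATE TABLE #Address (AddressID int PRIMARY KEY);
    INSERT INTO #Address (AddressID) VALUES (1), (2);

    -- Fails entirely: row 3 is new, but row 1 clashes with the key.
    INSERT INTO #Address (AddressID) VALUES (1), (3);

    SELECT * FROM #Address;   -- still only rows 1 and 2
    DROP TABLE #Address;
    ```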

    So don't trust Fast Load unless you're sure there have been NO load errors. And don't tell it to ignore errors!

    ** Yes, I know this is a "quick and dirty" technique - and now that I have other things working, I've gone back and used a cached Lookup to load only the new address records that are not there already (using the No Match output). Now I can set it to Fail on error, as there won't be any errors unless something has really gone wrong.
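
    A set-based equivalent of that Lookup + No Match pattern, if you'd rather do it in SQL (placeholder table and column names):

    ```sql
    -- Insert only the addresses that aren't already in the destination
    -- (hypothetical staging and destination tables).
    INSERT INTO dbo.Address (AddressKey, AddressLine1, City)
    SELECT s.AddressKey, s.AddressLine1, s.City
    FROM dbo.StagingAddress AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Address AS a
                      WHERE a.AddressKey = s.AddressKey);
    ```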

  • You may find this article interesting.

    It describes how you can have the best of both worlds: fast loading, but with selective redirection of the records that fail to insert. It requires some extra work in configuring your packages, though.

    Regards,

    Willem
    http://wschampheleer.wordpress.com
