SSIS and Transaction Log Growth

  • TomThomson (3/11/2014)


    Koen Verbeeck (3/5/2014)


    Interesting question, but I do not agree entirely with the answer.

    Setting the commit size to a lower value will commit batches before the entire load is done. What if something fails and I want to roll everything back?

    Splitting the load up into smaller chunks in manageable transactions seems to make more sense.

    That was my reaction too on discovering that splitting the stream into small chunks was not the "correct" answer. When I split the thing, I know where it got to before any failure; when I use a magic number under the covers, I have some working out to do to find out how to recover. I believe the latter is more complicated than the former, so the "correct" answer is not by any means the simplest way to achieve the objective, except in that extremely rare case (for most of us it never happens) where coping with failure is not required for some reason.
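    To make that "working out" concrete: after a partial commit you end up writing recovery queries like the sketch below. It is purely illustrative (dbo.SourceOrders, dbo.StagingOrders and OrderID are invented names), and it only helps if the rows were loaded in key order, which a plain extract query does not guarantee.

    -- Hypothetical recovery check after a load that committed some batches
    -- (e.g. an OLE DB Destination with a small Maximum Insert Commit Size)
    -- and then failed part-way through.
    DECLARE @LastLoaded int;

    -- How far did the load get before it failed?
    SELECT @LastLoaded = ISNULL(MAX(OrderID), 0)
    FROM dbo.StagingOrders;

    -- Resume by pulling only the rows that never arrived; valid only
    -- if the failed load inserted rows in OrderID order.
    SELECT OrderID, CustomerID, Amount
    FROM dbo.SourceOrders
    WHERE OrderID > @LastLoaded
    ORDER BY OrderID;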

    Okay, after Tom's explanation here, I realize the problem. I didn't word the incorrect answer as well as I could have.

    What I had in my head was: Use conditional splits to split the stream into multiple streams to deliver via multiple OleDB targets as separate transactions.

    There's the confusion, Koen, that you mentioned a while back. 🙂
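    For illustration, here is a rough T-SQL analogue of that design: chunk the load by key range so each chunk commits on its own. The package would run the streams in parallel; the loop below is sequential for brevity, and all object names are invented.

    -- Hypothetical chunked load: each key range commits as its own
    -- transaction, so a failure loses at most one chunk.
    DECLARE @ChunkSize int = 100000,
            @FromId    int = 0,
            @MaxId     int;

    SELECT @MaxId = ISNULL(MAX(OrderID), 0) FROM dbo.SourceOrders;

    WHILE @FromId < @MaxId
    BEGIN
        BEGIN TRANSACTION;  -- each chunk succeeds or fails independently

        INSERT INTO dbo.StagingOrders (OrderID, CustomerID, Amount)
        SELECT OrderID, CustomerID, Amount
        FROM dbo.SourceOrders
        WHERE OrderID >  @FromId
          AND OrderID <= @FromId + @ChunkSize;

        COMMIT TRANSACTION;

        SET @FromId += @ChunkSize;
    END;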

    Edit:

    In my experience, however, when a load fails you reload it from scratch, even something this large. The reason being you can't be sure what's missing and what's not. Take the multi-stream example: how do you know which chunks succeeded and which didn't? Do you rely on the order in the source file? If the source is a query, you can't rely on order at all, so how do you control which rows end up in which chunk? Do you turn off certain streams, redeploy the package, and rerun for the missing pieces after hours of painstaking double-checking?

    In most places, this is not something that would have been done. If the error repeated a few times, a new design would be put into play, sure. But for a one-off load to a staging table? No, you'd rerun the process after fixing the bad data row (or after making sure the network wasn't going to burp again, the usual culprit in my life) and keep on trucking.
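    The rerun-from-scratch approach stays simple precisely because there is nothing to reconcile. A minimal sketch of the pattern, assuming a dedicated staging table (names invented):

    -- Hypothetical restart-from-scratch staging load.
    -- Step 1 (Execute SQL Task): empty the target so a rerun is always safe.
    TRUNCATE TABLE dbo.StagingOrders;

    -- Step 2 (the data flow equivalent): reload everything. With
    -- Maximum Insert Commit Size = 0 the destination commits once at the
    -- end, so a mid-load failure leaves the table empty and the whole
    -- package can simply be executed again.
    INSERT INTO dbo.StagingOrders (OrderID, CustomerID, Amount)
    SELECT OrderID, CustomerID, Amount
    FROM dbo.SourceOrders;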


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (3/11/2014)


    In my experience, however, when a load fails you reload it from scratch, even something this large. The reason being you can't be sure what's missing and what's not.

    This may be a slight bit off topic, but I like to design for retry and restartability within a DATA FLOW TASK, as opposed to the control flow. The SSIS Operational and Tuning Guide has a nice example, but the caveat to doing this is that you need a solid data extract query (and range metadata, if you decide to split the load into multiple streams; a sketch follows below). In some cases that is not easy to do.

    Read from page 16 onward:

    http://msdn.microsoft.com/en-us/library/jj873729.aspx
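    The core of that approach is a small metadata table recording which extract ranges have completed, so a rerun only picks up the pending ones. A sketch of the idea (table and column names are invented here, not taken from the guide):

    -- Hypothetical range metadata driving a restartable load.
    CREATE TABLE dbo.LoadRanges
    (
        RangeId int         NOT NULL PRIMARY KEY,
        FromId  int         NOT NULL,
        ToId    int         NOT NULL,
        Status  varchar(10) NOT NULL DEFAULT 'Pending'  -- 'Pending' or 'Done'
    );

    -- The data flow's source query is parameterised on one pending range,
    -- so rerunning the package resumes exactly where it left off.
    DECLARE @RangeId int = 1;  -- supplied by a package variable in practice

    SELECT s.OrderID, s.CustomerID, s.Amount
    FROM dbo.SourceOrders AS s
    JOIN dbo.LoadRanges AS r
      ON s.OrderID > r.FromId AND s.OrderID <= r.ToId
    WHERE r.RangeId = @RangeId
      AND r.Status = 'Pending';

    -- After a range's rows commit, mark it complete.
    UPDATE dbo.LoadRanges
    SET Status = 'Done'
    WHERE RangeId = @RangeId;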

  • Nice question. 🙂
