Updating 20M rows takes 5 hours

  • Hi Stefan

    i followed our steps.

    The actual update is only taking 2 minutes. But creating the indexes takes 12 minutes. So in total around 14 minutes. Not a lot improvement if i just do the update with indexes in place. It takes 15 minutes with the indexes.

    This is with 2.6 million rows. if i do it on a 20 million row it will still take long time..Won't it?

    is my assumption correct?

  • ss-457805 (5/19/2010)


    Hi Stefan

    i followed our steps.

    The actual update is only taking 2 minutes. But creating the indexes takes 12 minutes. So in total around 14 minutes. Not a lot improvement if i just do the update with indexes in place. It takes 15 minutes with the indexes.

    This is with 2.6 million rows. if i do it on a 20 million row it will still take long time..Won't it?

    is my assumption correct?

    Not sure here what steps you are talking about. Are you talking about disabling indexes, updating and then enabling indexes again, or are you talking about creating a new table with select into and then creating all indexes again ?

    If you are talking about the update method - you could speed up the index creation by only disabling indexes where the key is actually affected by the update. All other indexes are unaffected by the update and can remain enabled.

  • I followed the below method:

    1) Use SELECT INTO to create a new table with the correct content by joining the two involved tables.

    2) Drop the old table

    3) rename the new table to the old name

    4) Recreate all indexes and constraints

    4th step is taking about 12 minutes.

  • ss-457805 (5/19/2010)


    I followed the below method:

    1) Use SELECT INTO to create a new table with the correct content by joining the two involved tables.

    2) Drop the old table

    3) rename the new table to the old name

    4) Recreate all indexes and constraints

    4th step is taking about 12 minutes.

    In that case it sounds like this would be the best approach:

    1) Disable indexes where the key is directly affected by the update

    2) Perform the update in several batches of about 2 million rows each

    3) rebuild the disabled indexes

  • Disable indexes where the key is directly affected by the update

    Looking at the execution plan The update is hitting the following indexes:

    1. pk_productionaudit

    2. ind_productionaudit_assettime

    3.ind_productionaudit_assetname

    i can disable the nonclustered indexes..What about the clustered index.I can't disable that cos then the update will fail. I'll disable the nonclustered indexes and see how long that takes?

  • Dont worry about the clustered index. It should not be disabled.

  • Stefan_G (5/19/2010)


    Dont worry about the clustered index. It should not be disabled.

    Just a thought... I haven't looked all through this thread for column names and the like but if the clustered index columns are being updated, there could be massive page splits involved in the update. In such a case, disabling the clustered index would keep that huge amount of disk overhead from occuring.

    I still like your other plan of using SELECT INTO (with, perhaps, an on the fly update included) and a rename at the end. The 4th step should take any longer on one table or the other.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff

    In such a case, disabling the clustered index would keep that huge amount of disk overhead from occuring.

    By disabling the clustered index, won't the update statement error. If i disable the clustered index i won't be able to access underlying table data.

  • ss-457805 (5/19/2010)


    Hi Jeff

    In such a case, disabling the clustered index would keep that huge amount of disk overhead from occuring.

    By disabling the clustered index, won't the update statement error. If i disable the clustered index i won't be able to access underlying table data.

    This update does not modify the clustered key. In this case you do not need to worry about disabling the clustered index.

    If you had been updating the clustered key, Jeff is absolutely correct that it might be a good idea to also drop the clustered index. Note that I said drop rather than disable. If you disable a clustered index, the table cannot be accessed at all, as you have already seen.

    But in this case just dont worry about about the clustered index.

  • After testing more i found that the method Select Into.. is quicker than disabling index and running update.

    But i am following Select into for another update like below it takes 2 hours on 2 million rows.

    SELECT p.auditId ,

    p.vFrom ,

    p.vTo ,

    p.assetId ,

    p.availability ,

    p.status ,

    p.opMode ,

    p.qtyIn ,

    p.qtyOut ,

    p.qtyProcessed ,

    p.qtyRejected ,

    p.countUnitId ,

    p.rate ,

    p.shiftId ,

    p.runId ,

    p.productId ,

    p.crewId ,

    p.crewSize ,

    p.stopEventRefId ,

    p.rejectEventRefId ,

    p.xnCode ,

    p.version ,

    p.shiftAuditId ,

    p.cellAssetId ,

    --p.assetname ,

    --p.assetdesc ,

    --p.assetRunCostPerHour ,

    --p.assettype ,

    p.assetname ,

    p.assetdesc ,

    p.assetRunCostPerHour,

    p.assettype ,

    p.countUnitDesc ,

    p.shiftName ,

    p.shiftDesc ,

    p.runname ,

    p.productName ,

    p.productDesc ,

    p.productCountUnitId ,

    p.productCountUnitDesc ,

    p.materialCost ,

    p.crewName ,

    p.crewCostPerHourPerHead ,

    p.cellAssetName ,

    p.cellAssetDesc ,

    sh.auditid as lastStatusChangeAuditId

    INTO dbo.New2

    FROM dbo.productionAudit p

    Left JOIN statushistory sh on p.assetid = sh.assetid

    AND p.vFrom >= sh.vFrom

    AND p.vTo <= sh.vTo

    Why is this? Please help.

  • If you post the execution plan, table definitions, and index definitions it becomes much easier to help... 😉

  • hi Stefan

    Please find attached the table and index definations and executionplan.

  • I thought I'd "bump" this one for the OP since he provided everything requested but hasn't gotten a reply yet. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • i am struggling with this one.. I can't understand y the estimated number of rows is more than 7 millions.. When the total number of rows is 7 million for productionaudit..

    Any ideas?

  • ss-457805 (5/23/2010)


    i am struggling with this one.. I can't understand y the estimated number of rows is more than 7 millions.. When the total number of rows is 7 million for productionaudit..

    Any ideas?

    Yes... like I said... "accidental cross join" in the form of a many-to-many join. Any chance of you saving the execution plan as a "real" execution plan so I can load it up is SSMS to have a peek?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply