Updating 20M rows takes 5 hours

Viewing 15 posts - 16 through 30 (of 42 total)

ss-457805 SSCertifiable Points: 5872 · Answer 1

Hi Stefan

I followed your steps.

The actual update takes only 2 minutes, but creating the indexes takes 12 minutes, so about 14 minutes in total. That is not much of an improvement over just doing the update with the indexes in place, which takes 15 minutes.

This is with 2.6 million rows. If I do it on a 20 million row table it will still take a long time, won't it?

Is my assumption correct?

blog: http://sarveshsingh.com Twitter: @sarveshsing

Stefan_G SSCertifiable Points: 6609 · Answer 2

ss-457805 (5/19/2010)

Hi Stefan

I followed your steps.

The actual update takes only 2 minutes, but creating the indexes takes 12 minutes, so about 14 minutes in total. That is not much of an improvement over just doing the update with the indexes in place, which takes 15 minutes.

This is with 2.6 million rows. If I do it on a 20 million row table it will still take a long time, won't it?

Is my assumption correct?

I am not sure which steps you mean. Are you talking about disabling the indexes, updating, and then re-enabling the indexes, or about creating a new table with SELECT INTO and then recreating all the indexes?

If you mean the update method, you could speed up the index creation by disabling only the indexes whose keys are actually affected by the update. All other indexes are unaffected by the update and can remain enabled.
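
One way to see which indexes contain a given column is to query the catalog views. This is only a sketch: the column name `assetname` is an assumed example of an updated column, and the thread does not say which columns the update actually changes.

```sql
-- List every index on dbo.productionAudit that contains a given column.
-- Assumption: 'assetname' stands in for whichever column the update modifies.
SELECT i.name                AS indexName,
       i.type_desc           AS indexType,
       ic.is_included_column AS isIncludedColumn
FROM   sys.indexes       AS i
JOIN   sys.index_columns AS ic
       ON  ic.object_id = i.object_id
       AND ic.index_id  = i.index_id
JOIN   sys.columns       AS c
       ON  c.object_id = ic.object_id
       AND c.column_id = ic.column_id
WHERE  i.object_id = OBJECT_ID(N'dbo.productionAudit')
  AND  c.name      = N'assetname';
```

Indexes that carry the column only as an included column also have to be maintained during the update, so they are candidates for disabling as well.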

ss-457805 SSCertifiable Points: 5872 · Answer 3

I followed the method below:

1) Use SELECT INTO to create a new table with the correct content by joining the two involved tables.

2) Drop the old table

3) Rename the new table to the old name

4) Recreate all indexes and constraints

The 4th step takes about 12 minutes.
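
A minimal T-SQL sketch of these four steps, under stated assumptions: `dbo.asset` and its columns are hypothetical stand-ins for the second table, `dbo.productionAudit_new` is an invented name, and the key columns given for the indexes are guesses, since the thread does not show the real definitions.

```sql
-- 1) Build the new table with the corrected content.
--    Assumption: dbo.asset(assetId, assetName) is the second table.
SELECT p.auditId,
       p.vFrom,
       p.vTo,
       p.assetId,
       a.assetName AS assetname   -- the "updated" value comes from the join
       -- ... remaining columns of dbo.productionAudit ...
INTO   dbo.productionAudit_new
FROM   dbo.productionAudit AS p
LEFT JOIN dbo.asset AS a
       ON a.assetId = p.assetId;
GO

-- 2) Drop the old table (fails if foreign keys still reference it).
DROP TABLE dbo.productionAudit;
GO

-- 3) Rename the new table to the old name.
EXEC sp_rename 'dbo.productionAudit_new', 'productionAudit';
GO

-- 4) Recreate indexes and constraints (key columns are assumptions).
ALTER TABLE dbo.productionAudit
    ADD CONSTRAINT pk_productionaudit PRIMARY KEY CLUSTERED (auditId);

CREATE NONCLUSTERED INDEX ind_productionaudit_assetname
    ON dbo.productionAudit (assetname);
GO
```

`GO` is the batch separator used by SSMS and sqlcmd, not a T-SQL statement. Creating the clustered index first means the nonclustered indexes are built only once.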

Stefan_G SSCertifiable Points: 6609 · Answer 4

ss-457805 (5/19/2010)

I followed the method below:

1) Use SELECT INTO to create a new table with the correct content by joining the two involved tables.

2) Drop the old table

3) Rename the new table to the old name

4) Recreate all indexes and constraints

The 4th step takes about 12 minutes.

In that case it sounds like this would be the best approach:

1) Disable indexes where the key is directly affected by the update

2) Perform the update in several batches of about 2 million rows each

3) Rebuild the disabled indexes
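
These three steps might look roughly like the following in T-SQL. It is a sketch, not the actual update from this thread: `dbo.asset` is a hypothetical lookup table, `assetname` is assumed to be the only column changed, and `auditId` is assumed to be an integer clustered key that can be used to cut the table into ranges.

```sql
-- 1) Disable only the nonclustered index whose key is being updated.
ALTER INDEX ind_productionaudit_assetname ON dbo.productionAudit DISABLE;

-- 2) Update in key-range batches of about 2 million rows.
DECLARE @batchSize bigint, @fromId bigint, @maxId bigint;
SET @batchSize = 2000000;

SELECT @fromId = MIN(auditId),
       @maxId  = MAX(auditId)
FROM   dbo.productionAudit;

WHILE @fromId <= @maxId
BEGIN
    UPDATE p
    SET    p.assetname = a.assetName          -- assumed column mapping
    FROM   dbo.productionAudit AS p
    JOIN   dbo.asset AS a                     -- hypothetical lookup table
           ON a.assetId = p.assetId
    WHERE  p.auditId >= @fromId
      AND  p.auditId <  @fromId + @batchSize;

    SET @fromId = @fromId + @batchSize;
END;

-- 3) Rebuilding a disabled index re-enables it.
ALTER INDEX ind_productionaudit_assetname ON dbo.productionAudit REBUILD;
```

Each batch covers a range of 2 million key values, so it holds 2 million rows only if the keys have no gaps. Run outside an explicit transaction, each `UPDATE` commits on its own, which keeps the individual transactions small.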

ss-457805 SSCertifiable Points: 5872 · Answer 5

Disable indexes where the key is directly affected by the update

Looking at the execution plan, the update is hitting the following indexes:

1. pk_productionaudit

2. ind_productionaudit_assettime

3. ind_productionaudit_assetname

I can disable the nonclustered indexes, but what about the clustered index? I can't disable that because then the update will fail. I'll disable the nonclustered indexes and see how long that takes.

Stefan_G SSCertifiable Points: 6609 · Answer 6

May 19, 2010 at 7:40 am · #1169427

Don't worry about the clustered index. It should not be disabled.

Jeff Moden SSC Guru Points: 1000539 · Answer 7

Stefan_G (5/19/2010)

Don't worry about the clustered index. It should not be disabled.

Just a thought... I haven't looked all through this thread for column names and the like, but if the clustered index columns are being updated, there could be massive page splits involved in the update. In such a case, disabling the clustered index would keep that huge amount of disk overhead from occurring.

I still like your other plan of using SELECT INTO (with, perhaps, an on-the-fly update included) and a rename at the end. The 4th step shouldn't take any longer on one table than on the other.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

ss-457805 SSCertifiable Points: 5872 · Answer 8

May 19, 2010 at 8:47 am · #1169483

Hi Jeff

In such a case, disabling the clustered index would keep that huge amount of disk overhead from occurring.

If I disable the clustered index, won't the update statement fail? With the clustered index disabled I won't be able to access the underlying table data.

Stefan_G SSCertifiable Points: 6609 · Answer 9

ss-457805 (5/19/2010)

Hi Jeff

In such a case, disabling the clustered index would keep that huge amount of disk overhead from occurring.

If I disable the clustered index, won't the update statement fail? With the clustered index disabled I won't be able to access the underlying table data.

This update does not modify the clustered key. In this case you do not need to worry about disabling the clustered index.

If you had been updating the clustered key, Jeff is absolutely correct that it might be a good idea to also drop the clustered index. Note that I said drop rather than disable. If you disable a clustered index, the table cannot be accessed at all, as you have already seen.

But in this case, just don't worry about the clustered index.
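
For the other case, where the clustered key itself is updated, the difference between disabling and dropping could be sketched as follows. This assumes `pk_productionaudit` is a clustered PRIMARY KEY constraint on `auditId`, which the thread does not confirm.

```sql
-- Disabling the clustered index makes the table data inaccessible until
-- the index is rebuilt, so an UPDATE against the table would fail:
-- ALTER INDEX pk_productionaudit ON dbo.productionAudit DISABLE;

-- Dropping it instead leaves the data accessible as a heap.
-- Assumption: pk_productionaudit is a clustered PRIMARY KEY on auditId.
ALTER TABLE dbo.productionAudit
    DROP CONSTRAINT pk_productionaudit;

-- ... run the update that modifies the clustered key columns here ...

-- Recreate the clustered primary key afterwards.
ALTER TABLE dbo.productionAudit
    ADD CONSTRAINT pk_productionaudit PRIMARY KEY CLUSTERED (auditId);
```

Dropping or creating a clustered index rebuilds every enabled nonclustered index on the table, so it is usually cheaper to disable the nonclustered indexes first and rebuild them once at the end.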

ss-457805 SSCertifiable Points: 5872 · Answer 10

After more testing I found that the SELECT INTO method is quicker than disabling the indexes and running the update.

But when I use SELECT INTO for another update, as shown below, it takes 2 hours on 2 million rows.

SELECT p.auditId,
       p.vFrom,
       p.vTo,
       p.assetId,
       p.availability,
       p.status,
       p.opMode,
       p.qtyIn,
       p.qtyOut,
       p.qtyProcessed,
       p.qtyRejected,
       p.countUnitId,
       p.rate,
       p.shiftId,
       p.runId,
       p.productId,
       p.crewId,
       p.crewSize,
       p.stopEventRefId,
       p.rejectEventRefId,
       p.xnCode,
       p.version,
       p.shiftAuditId,
       p.cellAssetId,
       --p.assetname,
       --p.assetdesc,
       --p.assetRunCostPerHour,
       --p.assettype,
       p.assetname,
       p.assetdesc,
       p.assetRunCostPerHour,
       p.assettype,
       p.countUnitDesc,
       p.shiftName,
       p.shiftDesc,
       p.runname,
       p.productName,
       p.productDesc,
       p.productCountUnitId,
       p.productCountUnitDesc,
       p.materialCost,
       p.crewName,
       p.crewCostPerHourPerHead,
       p.cellAssetName,
       p.cellAssetDesc,
       sh.auditid AS lastStatusChangeAuditId
INTO   dbo.New2
FROM   dbo.productionAudit p
LEFT JOIN statushistory sh
       ON  p.assetid = sh.assetid
       AND p.vFrom >= sh.vFrom
       AND p.vTo <= sh.vTo

Why is this? Please help.

Stefan_G SSCertifiable Points: 6609 · Answer 11

May 21, 2010 at 5:22 am · #1170721

If you post the execution plan, table definitions, and index definitions it becomes much easier to help... 😉

ss-457805 SSCertifiable Points: 5872 · Answer 12

May 21, 2010 at 10:31 am · #1170954

Hi Stefan

Please find attached the table and index definitions and the execution plan.

Jeff Moden SSC Guru Points: 1000539 · Answer 13

I thought I'd "bump" this one for the OP since he provided everything requested but hasn't gotten a reply yet. 🙂

--Jeff Moden

ss-457805 SSCertifiable Points: 5872 · Answer 14

May 23, 2010 at 12:18 pm · #1171318

I am struggling with this one. I can't understand why the estimated number of rows is more than 7 million when productionaudit has only 7 million rows in total.

Any ideas?

Jeff Moden SSC Guru Points: 1000539 · Answer 15

ss-457805 (5/23/2010)

I am struggling with this one. I can't understand why the estimated number of rows is more than 7 million when productionaudit has only 7 million rows in total.

Any ideas?

Yes... like I said... "accidental cross join" in the form of a many-to-many join. Any chance of you saving the execution plan as a "real" execution plan so I can load it up in SSMS to have a peek?
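
One way to check for that kind of row multiplication is to count how many statushistory rows match each productionAudit row under the same join conditions. This is a sketch only: the `dbo` schema for `statushistory` is an assumption, and the query performs the same expensive join, so it may need to be limited to a sample of assets.

```sql
-- Rows returned here are productionAudit rows that match more than one
-- statushistory row, i.e. rows the LEFT JOIN would duplicate.
SELECT TOP (20)
       p.auditId,
       COUNT(*) AS matchingStatusRows
FROM   dbo.productionAudit AS p
JOIN   dbo.statushistory   AS sh      -- schema name is an assumption
       ON  p.assetId = sh.assetId
       AND p.vFrom  >= sh.vFrom
       AND p.vTo    <= sh.vTo
GROUP BY p.auditId
HAVING COUNT(*) > 1
ORDER BY matchingStatusRows DESC;
```

If this returns rows, the status intervals overlap for those assets and the join produces more output rows than productionAudit contains, which would be consistent with an estimate above 7 million.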

--Jeff Moden
