Help with VERY slow update

  • Thanks in advance to those with knowledge and willingness to help.

    This problem has been baffling me for days.

    I have a simple update like

    UPDATE t

    SET

    col1 = t2.col,

    col2 = t3.col,

    col3 = t4.col,

    col4 = t5.col,

    col5 = t6.col,

    col6 = t7.col

    FROM

    <table 1> t

    INNER JOIN <table 2> t2 ON

    ...

    LEFT JOIN <table 3> t3 ON

    ...

    LEFT JOIN <table 4> t4 ON

    ...

    LEFT JOIN <table 5> t5 ON

    ...

    LEFT JOIN <table 6> t6 ON

    ...

    LEFT JOIN <table 7> t7 ON

    ...

    The table <table 1> has 5M rows.

    The tables <table 2> thru <table 7> have between 1M and 30M rows. Indexes are all created and tables had "UPDATE STATISTICS" run.

    The above update IS VERY SLOW (It takes hours).

    Just to compare when I run the following update

    UPDATE t

    SET

    col1 = 1,

    col2 = 'xxx',

    col3 = 1,

    col4 = 'xxx'

    col5 = NULL,

    col6 = NULL

    FROM

    <table 1> t

    INNER JOIN <table 2> t2 ON

    ...

    LEFT JOIN <table 3> t3 ON

    ...

    LEFT JOIN <table 4> t4 ON

    ...

    LEFT JOIN <table 5> t5 ON

    ...

    LEFT JOIN <table 6> t6 ON

    ...

    LEFT JOIN <table 7> t7 ON

    ...

    thus replacing column references with scalar values the update takes 30 seconds.

    Notice: the only difference between the 2 updates are the SET statements, all the rest including FROM and JOINS are the same.

    I observed that the execution plan for the 1st update uses many levels of parallelism while the 2nd update is one level.

    All great ideas would be highly appreciated.

    Alexander.

  • It seems like the indexes available can't be used in the first statement since the indexes are not covering indexes (e.g. an index for table2 should not only include the columns used as join predicates but also -preferred as included column- the column t2.col. The same applies to the rest of the tables.

    But without seeing the actual update statement including table def and index def for all tables involved it's hard to tell.

    The parallelism you noticed most probably are caused by index scans. We would need the actual execution plan to analyze any further.

    There are several options to improve the update (including the divide'n'conquer approach) but as mentioned we'd need more details.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • check your join criteria as well... any chance something could be invalid, like JOIN Table2 T2 t1.ID = t1.ID (when it should be t1.id = t2.id?)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hi,

    I think you should check your joining condition.

    The columns which you are using in joining condition are all indexed or you are joining on non indexed column????

    One more thing, Are you going from parent table to child table while joining?????

    Thanks,

    Roshan

  • How fast does it return results when you replace the update with a select ?

    Are there Nulls in the columns on which your joining the tables.

    Jayanth Kurup[/url]

  • Hi,

    I think you should check your joining condition.

    The columns which you are using in joining condition are all indexed or you are joining on non indexed column????

    One more thing, Are you going from parent table to child table while joining?????

    Thanks,

    Roshan

  • How fast does it return results when you replace the update with a select ?

    Are there Nulls in the columns on which your joining the tables.

    When I run COUNT (*) using the FROM condition it returns in 30 sec.

  • Obviously, COUNT(*) will return u the result faster as it just counts number of rows.

    When using count(*) you are just joining the tables & counting number of rows.

    While in update you are selecting values from each table & depending on your condition you are updating those values.

    Of-course this action will take more time than just counting number of rows.

  • roshan.zanwar (7/7/2011)


    Obviously, COUNT(*) will return u the result faster as it just counts number of rows.

    When using count(*) you are just joining the tables & counting number of rows.

    While in update you are selecting values from each table & depending on your condition you are updating those values.

    Of-course this action will take more time than just counting number of rows.

    Please note: the 2 updates I am referring to in my original post have the same FROM and JOIN conditions. The only difference are the SET columns: the 1st one uses columns from the tables used in FROM clause, the other one (for time comparison) uses scalar values.

    The 1st one runs in hours, the 2nd in minutes.

    Any great ideas?

    Alexander.

  • With LEFT JOINs if they play no part in the query (i.e. no columns are used in the query) the QP with remove them from the query plan which would explain the runtime difference between Q1 and Q2.

    I would try doing the updates a column/table at a time to see if one of the joins between the tables is causing the problem or the overall complexcity of the number joins in the update.

    Martin Cairns

  • Hi,

    It seems to be Missing index.

    Solutions:Along with your query

    Use Database Engine Tuning Advisor to get more clarity of recommendations .If U feel if it is really worth, Then test in Dev server first then implement in Production.

    Cheers...

  • The fact that the select is returning data faster , the chances are your facing blocking issues during the update.

    Jayanth Kurup[/url]

  • To repeat what I stated earlier:

    It seems like the indexes available can't be used in the first statement since the indexes are not covering indexes (e.g. an index for table2 should not only include the columns used as join predicates but also -preferred as included column- the column t2.col. The same applies to the rest of the tables.

    But without seeing the actual update statement including table def and index def for all tables involved it's hard to tell.

    The parallelism you noticed most probably are caused by index scans. We would need the actual execution plan to analyze any further.

    There are several options to improve the update (including the divide'n'conquer approach) but as mentioned we'd need more details.

    @Shurkadze

    Did you try to apply the index concept I mentioned? Or can you at least provide the actual exectuion plan as an .sqlplan file?

    @G.Lakshmanan

    I strongly vote against using DTA. Most of the "suggestions" I got until I refused to use it where -friendly spoken- "misleading".



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (7/8/2011)


    To repeat what I stated earlier:

    It seems like the indexes available can't be used in the first statement since the indexes are not covering indexes (e.g. an index for table2 should not only include the columns used as join predicates but also -preferred as included column- the column t2.col. The same applies to the rest of the tables.

    But without seeing the actual update statement including table def and index def for all tables involved it's hard to tell.

    The parallelism you noticed most probably are caused by index scans. We would need the actual execution plan to analyze any further.

    There are several options to improve the update (including the divide'n'conquer approach) but as mentioned we'd need more details.

    @Shurkadze

    Did you try to apply the index concept I mentioned? Or can you at least provide the actual exectuion plan as an .sqlplan file?

    @G.Lakshmanan

    I strongly vote against using DTA. Most of the "suggestions" I got until I refused to use it where -friendly spoken- "misleading".

    Please explain: what are covering indexes? How does setting of values to indexed columns (or not) affect the execution time?

    Thanks, Alexander.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply