Can this procedure be optimized?

  • Hi all,

    I'm rewriting legacy code (DataFlex) as stored procedures for SQL Server. I just wrote the following procedure in a straight forward fashion, which requires several updates to the same table. My primary goal is speed of execution, not elegance. Can this be optimized? If so, can you offer some tips on how to rewrite it? I'm still new to T-SQL, and haven't yet figured out how to control a 'SET' statement with an IF or CASE.

    Any suggestions are welcome.

    ==============================================

    CREATE PROCEDURE rp_SOD_Update_Media

    AS

    -- ~~~~ Calculate & Store avg_sale ~~~~~~~~~~~~~

    UPDATE MASMEDIA

    SET    avg_sale = (ytd_sales / num_orders)      

    WHERE  inactive = 'N'

    AND    num_orders > 0

    AND    (ytd_sales / num_orders) < 10000 ;

    --  --  --  --  --  --  -- --  --  --  --  --  -

    UPDATE MASMEDIA

    SET    avg_sale = 9999.99

    WHERE  inactive = 'N'

    AND    num_orders > 0

    AND    (ytd_sales / num_orders) >= 10000 ;

    -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    -- ~~~~ Calculate & Store avg_profit ~~~~~~~~~~~

    UPDATE MASMEDIA

    SET    avg_profit = (ytd_profit / num_orders)

    WHERE  inactive = 'N'

    AND    num_orders > 0

    AND    (ytd_profit / num_orders) < 10000 ;

    --  --  --  --  --  --  -- --  --  --  --  --  -

    UPDATE MASMEDIA

    SET    avg_profit = 9999.99

    WHERE  inactive = 'N'

    AND    num_orders > 0

    AND    (ytd_profit / num_orders) >= 10000 ;

    -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    -- ~~~~ Calculate & Store percent_sales ~~~~~~~~

    UPDATE MASMEDIA

    SET percent_sales = ((num_orders / qty_mailed) * 100)

    WHERE inactive = 'N'

    AND media_type = 'P'

    AND qty_mailed != 0 --QUESTION: Possible qty_mailed != 0 & still need calc percent_sales?

    AND num_orders != qty_mailed

    AND ((num_orders / qty_mailed) * 100) < 10000 ;

    --  --  --  --  --  --  -- --  --  --  --  --  -

    UPDATE MASMEDIA

    SET percent_sales = 9999.99

    WHERE inactive = 'N'

    AND media_type = 'P'

    AND qty_mailed != 0

    AND num_orders != qty_mailed

    AND (num_orders / qty_mailed) >= 10000 ;

    -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    -- ~~~~ Calculate & Store percent_inqu ~~~~~~~~

    UPDATE MASMEDIA

    SET percent_inqu = ((num_inqu / qty_mailed) * 100)

    WHERE inactive = 'N'

    AND media_type = 'P'

    AND qty_mailed != 0

    AND num_inqu != qty_mailed

    AND ((num_orders / qty_mailed) * 100) < 10000 ;

    --  --  --  --  --  --  -- --  --  --  --  --  -

    UPDATE MASMEDIA

    SET percent_inqu = 9999.99

    WHERE inactive = 'N'

    AND media_type = 'P'

    AND qty_mailed != 0

    AND percent_inqu != qty_mailed

    AND (num_orders / qty_mailed) >= 10000 ;

    -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    --EOF

     

    Any suggestions are welcome,

    Greg

     

     

  • You can easily join updates that have similar where clauses except for a single condition such as the four update statements for avg_sale and avg_profit:

    UPDATE MASMEDIA

    SET    avg_sale = CASE

                                WHEN (ytd_sales / num_orders) < 10000 THEN (ytd_sales / num_orders) 

                                ELSE 9999.99

                             END

            ,avg_profile = CASE

                                   WHEN (ytd_profit / num_orders) < 10000 THEN (ytd_profit / num_orders)

                                ELSE 9999.99

                             END

    WHERE  inactive = 'N'

    AND    num_orders > 0

    It looks like the percent_sales and percent_inqu updates can similarly be combined into a single UPDATE.

    The CASE statement is very useful for combining multiple queries in this way.  Check out Books Online for the full syntax.

    Hope this helps,

    Scott Thornburg

     

  • The CASE statement is very useful, indeed, although I don't think it will increase the performance in this particular case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    Given the information available, it's not possible to say whether or not performance will improve.  This depends on the method of access of the table MASMEDIA.  If the UPDATEs are doing table scans, then the performance should scale by the number of update statements.  Cutting these by a factor of 4 should improve the performance by approximately that same factor.  I've seen this type of situation many times.

    However, without more knowledge of the table counts and indexes, it's difficult to say so for certain.  I pose my suggestion as something for Greg to investigate rather than a guaranteed enhancement.

    On a more basic note, there may be benefit in indexing num_orders, if non-zero orders are rare (less than ~10%).  Also on qty_mailed if non-zero is rare, but you may need to change the != 0 to a >0 as well (unless qty_mailed can be negative).

    Scott Thornburg

     

  • Strange, I posted a reply, where did it go? Here goes again...

    **********************************************************

    Wow, that's exactly the kind of information I'm looking for. I didn't know a CASE statament could be used that way. I tried a SELECT, but went about it backwards, trying to embed SET inside of IF/ELSE clauses.

    I'm learning as I go, of course, and many of the examples I'm fnding in books & online tend to be extremely simplistic. BooksOnline has lots of good info, but their examples would never have led me to the kind of solution that Scott presents here. Can anyone recommend a source for more realistic/advanced examples than the typical 1+1=2 kind?

    Also, are there any rules of thumb about what operations tend to be costly and which tend not to be?

    I really appreciate all the help,

    Greg Norris

  • Scott,

    No offense was meant and I agree except that I've seen the optimizer give the non-Case method the win 2 out of 3 times when testing with and without proper indexes on 3 similar updates.  Use of the CASE statement in this instance may or may not help depending on a lot of conditions that neither of us know.  I guess what I really meant to say in my previous post was that there is no substitute for load testing when performance is a must. 

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • And no offense taken.  It looks like we both agree that when it comes to performance improvements, empirical testing is a must.  There is far more that can impact performance that is outside the simple statement of a query.

    My hope is to provide some options for Greg so he can test them in his environment.

    Thanks much for the clarification.  I do like this community of SQL Server professionals.

     

  • We've seen a lot of performance gain by using a temporary tables

    You could use one in place of MASMEDIA, select the data you want to update into it at the start, update the temporary table and then do a single update to MASMEDIA.

    However, (and it is a big however!), it depends on the size of the dataset that you will be updating compared to the number of rows in the whole table.  This method probably only works where the number of rows being updated is small.

    In our application, we typically need to update a few thousand rows where the table contains millions, so it may not work for you if you are updating a large number of rows.

     

  • I'm surprised to see no mention of indexes, I don't know how an equation in a WHERE clause would affect things, but I would think that making sure the inactive and num_orders fields were indexed would help.

    I like the concept of the temporary tables if your record sets are large enough to benefit from such a technique.  I'll have to remember that one.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • And my sympathy for having been stuck in DataFlex, I've been there.  But I'm a weirdo, I liked Wang PACE and Cobol.  🙂

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • I mentioned indexes (rather, the lack of) in my last reply to Scott.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply