Statistics IO - Physical Reads in Prod vs Dev

  • Attaching the text file to this post to make accessing it easier than going via rapidshare.

    When you attach the graphical plan, be sure to upload it as an attachment to your post. The file you upload should be named *.sqlplan - you can obtain it by right-clicking on the graphical plan, and choosing Save As. Just being clear that an image of the plan would not be helpful!

  • I dont think the physical reads are the problem. The query in production runs for over 4 hours. During this time only about 30000 physical reads are performed. There is no way this amount of physical reads can keep the disk system busy for 4 hours. Even a very basic system could perform 30000 random reads in about 5 minutes.

    The query is definitely not IO-bound - it must be CPU bound.

    I dont know why the query runs so much slower in production than in dev, but you should focus on understanding what the CPU is doing. Are you sure that there are no other CPU-intensive tasks running at the same time ?

    It is also possible to speed up the query a lot just by rewriting it.

    User-defined functions cause lots of performance problems in SQL server. It is possible to rewrite the query without functions - like this:

    UPDATE CustomerAdditionalFields

    SET Store_Loyalty = (select top 1 StoreNumber

    FROM dbo._df_proc_active_stores

    WHERE StoreNumber <> '931' --exclude the internet

    ORDER BY

    POWER(SIN(( customer_latitude*0.017453 - GEO_LATITUDE*0.017453 ) / 2), 2) +

    COS(customer_latitude*0.017453) * COS(GEO_LATITUDE*0.017453) *

    POWER(SIN(( customer_longitude*0.017453 - GEO_LONGITUDE*0.017453 ) / 2), 2)

    )

    FROM CustomerAdditionalFields CAF

    INNER JOIN CustomerLoyaltyStore CLS ON CAF.customer_key = CLS.customer_key

    WHERE RollingMonth = 36

    AND CLS.RollingMonthRank = 1

    AND ISNULL(customer_latitude, 0) <> 0

    AND ISNULL(customer_longitude, 0) <> 0

    This rewrite speeds up the query about 5 times in my tests !

    It is also possible to increase speed even more by using a CLR procedure, or an SSIS package with a script task.

    I have actually tried rewriting it as a CLR procedure. It increased the speed another 5 times in my tests. A total improvement of 25 times !!

    When performing the query as a CLR procedure it is difficult to make it multi-threaded. To exploit the many CPU:s in the production machine you might want to use an SSIS-package since it is very simple to make SSIS packages multithreaded.

    Good luck

    /SG

  • Attached are the graphical execution plans for both prod and dev

  • Check fragmentation on the tables. Read somewhere that fragmentation can affect the ability to read ahead.

  • Kevin,

    For brevity, I am going to assume that you know all about the evils of T-SQL scalar functions, those that do data access in particular.

    Nevertheless, I would join Stefan in encouraging you to write a set-based solution that can take advantage of SQL Server parallelism (no scalar functions!)

    For the time being, I am also going to assume that you cannot make any immediate code changes whatsoever, and simply want production performance to match development performance, as far as possible.

    To that end, my strong hunch is that you (or someone) either have a trace running on production, or the query is executed with SET STATISTICS IO/TIME ON. One particularly evil thing about T-SQL scalar functions is that they generate trace output for every row processed.

    Turn off all traces, and ensure those SET options are OFF. Rejoice in the improved performance, then set about developing and testing a set-based solution, like Stefan's, perhaps taking advantage of the built-in RADIANS function too. I sometimes think I am the only one that uses it.

    Remember, all T-SQL functions (except in-line) are inherently evil, and must be avoided at all costs.

    edit: spelling, again

  • Kevin Bullen (3/23/2010)


    Attached are the graphical execution plans for both prod and dev

    Just curious: How many different customers do you have, and how many stores do you have ?

  • Here's another vote for turning this into a set-based query, eliminating the scalar function, and turning off those options.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Just for fun I created an SSIS package that performs the processing using a C# script component. The package runs two separate threads to get some parallellism. It is easy to extend to more parallell threads if desired.

    I have used SSIS 2008 so it is probably not directly usable on 2005, but it might give you some ideas.

    The performance is very good. This package is 11 times faster than the set-oriented SQL code I posted before.

    /SG

  • Thanks all for the input. I ripped out the UDF and called the logic inline, mostly hacking from Stefan_G's sql example. That took the runtime from 10 hours to 2.

    I have sworn off scalar functions at this point.

    Stefan_G - pretty cool SSIS package.

    So with help from you all, I shaved 8 hours from a sql job.

    I'm still going to poke around a bit and understand why it runs differently in dev and prod.

    Thanks again.

    Kevin

  • You're welcome.

    If you are still interested in help, please post the new execution plans. There might be something more that we can see to help out.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Kevin Bullen (3/25/2010)


    I'm still going to poke around a bit and understand why it runs differently in dev and prod.

    Trace.

  • Kevin Bullen (3/25/2010)


    Thanks all for the input. I ripped out the UDF and called the logic inline, mostly hacking from Stefan_G's sql example. That took the runtime from 10 hours to 2.

    I have sworn off scalar functions at this point.

    Stefan_G - pretty cool SSIS package.

    So with help from you all, I shaved 8 hours from a sql job.

    And if you decide to use the SSIS package you will probably gain another factor 10, so the time should go down from 2 hours to 12 minutes.

    🙂

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply