Statistics IO - Physical Reads in Prod vs Dev

Question

Post reply

Statistics IO - Physical Reads in Prod vs Dev

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150457 More actions · Answer 1

Attaching the text file to this post to make accessing it easier than going via rapidshare.

When you attach the graphical plan, be sure to upload it as an attachment to your post. The file you upload should be named *.sqlplan - you can obtain it by right-clicking on the graphical plan, and choosing Save As. Just being clear that an image of the plan would not be helpful!

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Stefan_G SSCertifiable Points: 6609 More actions · Answer 2

I dont think the physical reads are the problem. The query in production runs for over 4 hours. During this time only about 30000 physical reads are performed. There is no way this amount of physical reads can keep the disk system busy for 4 hours. Even a very basic system could perform 30000 random reads in about 5 minutes.

The query is definitely not IO-bound - it must be CPU bound.

I dont know why the query runs so much slower in production than in dev, but you should focus on understanding what the CPU is doing. Are you sure that there are no other CPU-intensive tasks running at the same time ?

It is also possible to speed up the query a lot just by rewriting it.

User-defined functions cause lots of performance problems in SQL server. It is possible to rewrite the query without functions - like this:



UPDATE  CustomerAdditionalFields

SET  Store_Loyalty = (select top 1 StoreNumber

            FROM    dbo._df_proc_active_stores

            WHERE   StoreNumber <> '931' --exclude the internet

            ORDER BY

                POWER(SIN(( customer_latitude*0.017453 - GEO_LATITUDE*0.017453 ) / 2), 2) +

                COS(customer_latitude*0.017453) * COS(GEO_LATITUDE*0.017453) *

                POWER(SIN(( customer_longitude*0.017453 - GEO_LONGITUDE*0.017453 ) / 2), 2)

            )

FROM    CustomerAdditionalFields CAF

        INNER JOIN CustomerLoyaltyStore CLS ON CAF.customer_key = CLS.customer_key

WHERE   RollingMonth = 36

        AND CLS.RollingMonthRank = 1

        AND ISNULL(customer_latitude, 0) <> 0

        AND ISNULL(customer_longitude, 0) <> 0

This rewrite speeds up the query about 5 times in my tests !

It is also possible to increase speed even more by using a CLR procedure, or an SSIS package with a script task.

I have actually tried rewriting it as a CLR procedure. It increased the speed another 5 times in my tests. A total improvement of 25 times !!

When performing the query as a CLR procedure it is difficult to make it multi-threaded. To exploit the many CPU:s in the production machine you might want to use an SSIS-package since it is very simple to make SSIS packages multithreaded.

Good luck

/SG

k_bullen Old Hand Points: 383 More actions · Answer 3

k_bullen

Old Hand

Points: 383

March 23, 2010 at 6:31 am

#1138142

Attached are the graphical execution plans for both prod and dev

roger.price-1150775 Default port Points: 1458 More actions · Answer 4

roger.price-1150775

Default port

Points: 1458

March 23, 2010 at 7:22 am

#1138201

Check fragmentation on the tables. Read somewhere that fragmentation can affect the ability to read ahead.

Paul White SSC Guru Points: 150457 More actions · Answer 5

Kevin,

For brevity, I am going to assume that you know all about the evils of T-SQL scalar functions, those that do data access in particular.

Nevertheless, I would join Stefan in encouraging you to write a set-based solution that can take advantage of SQL Server parallelism (no scalar functions!)

For the time being, I am also going to assume that you cannot make any immediate code changes whatsoever, and simply want production performance to match development performance, as far as possible.

To that end, my strong hunch is that you (or someone) either have a trace running on production, or the query is executed with SET STATISTICS IO/TIME ON. One particularly evil thing about T-SQL scalar functions is that they generate trace output for every row processed.

Turn off all traces, and ensure those SET options are OFF. Rejoice in the improved performance, then set about developing and testing a set-based solution, like Stefan's, perhaps taking advantage of the built-in RADIANS function too. I sometimes think I am the only one that uses it.

Remember, all T-SQL functions (except in-line) are inherently evil, and must be avoided at all costs.

edit: spelling, again

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Stefan_G SSCertifiable Points: 6609 More actions · Answer 6

Stefan_G

SSCertifiable

Points: 6609

March 23, 2010 at 9:47 am

#1138343

Kevin Bullen (3/23/2010)

Attached are the graphical execution plans for both prod and dev

Just curious: How many different customers do you have, and how many stores do you have ?

SQLRNNR SSC Guru Points: 281324 More actions · Answer 7

Here's another vote for turning this into a set-based query, eliminating the scalar function, and turning off those options.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Stefan_G SSCertifiable Points: 6609 More actions · Answer 8

Just for fun I created an SSIS package that performs the processing using a C# script component. The package runs two separate threads to get some parallellism. It is easy to extend to more parallell threads if desired.

I have used SSIS 2008 so it is probably not directly usable on 2005, but it might give you some ideas.

The performance is very good. This package is 11 times faster than the set-oriented SQL code I posted before.

/SG

k_bullen Old Hand Points: 383 More actions · Answer 9

Thanks all for the input. I ripped out the UDF and called the logic inline, mostly hacking from Stefan_G's sql example. That took the runtime from 10 hours to 2.

I have sworn off scalar functions at this point.

Stefan_G - pretty cool SSIS package.

So with help from you all, I shaved 8 hours from a sql job.

I'm still going to poke around a bit and understand why it runs differently in dev and prod.

Thanks again.

Kevin

SQLRNNR SSC Guru Points: 281324 More actions · Answer 10

You're welcome.

If you are still interested in help, please post the new execution plans. There might be something more that we can see to help out.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Paul White SSC Guru Points: 150457 More actions · Answer 11

Paul White

SSC Guru

Points: 150457

March 25, 2010 at 9:54 am

#1139755

Kevin Bullen (3/25/2010)

I'm still going to poke around a bit and understand why it runs differently in dev and prod.

Trace.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Stefan_G SSCertifiable Points: 6609 More actions · Answer 12

Kevin Bullen (3/25/2010)

Thanks all for the input. I ripped out the UDF and called the logic inline, mostly hacking from Stefan_G's sql example. That took the runtime from 10 hours to 2.

I have sworn off scalar functions at this point.

Stefan_G - pretty cool SSIS package.

So with help from you all, I shaved 8 hours from a sql job.

And if you decide to use the SSIS package you will probably gain another factor 10, so the time should go down from 2 hours to 12 minutes.

🙂