Eratic behaviour

Question

Post reply

Eratic behaviour

timwilliams

Ten Centuries

Points: 1007
More actions
October 3, 2002 at 2:30 pm

#158199

Can anyone help with this one?

I have spent the past two days looking into performance issues on a major e-commerce site - it got passed to me once it had been identified that the issue appears to be with SQL Server. I have created several traces using SQL Profiler (TSQL_Duration template and extra columns), which revealed that one query in particular was taking time to complete - so far so good.

We then used NetIQ AppManager to run this query against the database every minute. The time for this query to complete has varied between 1.49 secs to nearly 10 minutes. This would appear to be completely random.

Exhaustive analysis of the trace has not revealed any other significant activity at the times of poor performance.

This problem appears to be server wide and not related to one table, index, or even database.

There is no correlation between this performance and CPU, or memory usage, though there may be high disk I/O.

The number of page splits per sec is low

Any thoughts appreciated.

This is a SQL 2000 Cluster on Win2k - all patched up to date.

Thanks

Tim

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Antares686 SSC Guru Points: 125444 More actions · Answer 1

I would suspect that you don't have enough memory and a lot of large queries that need different data. If this is the case then the cache hit ratio will remain low (90% or less consistently), the cache is where SQL gets the major portion of it's speed. Could also be the query plans are flushing to quick and thus having to recompile (look at recompiles). Also, check the execution plan a few times and see if it changes for some reason, could be an index needs to be refreshed or stats need to be.

"Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

Andy Warren SSC Guru Points: 119747 More actions · Answer 2

I'd recommend looking at the query plans. Possibly the optimizer is making bad calls (which could just be bad stats as Antares suggested). You might try adding with recompile to force it to get a new plan each time. Not the ideal solution, but worth exploring. Have you checked for blocking?

Andy

http://qa.sqlservercentral.com/columnists/awarren/

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

timwilliams Ten Centuries Points: 1007 More actions · Answer 3

Antares and Andy

Thanks for the replies.

I too thought that SQL memory could be the issue here, but an average of 99.85 % for the Buffer Cache Hit Ratio speaks for itself.

All the databases have the auto create statistics and update statistics on - and I have created a new Maintenance Plan for all the databases that updates the statistics used by query optimiser every morning. This ran last night - no change this morning.

I am currently looking at the execution plans - nothing so far

I am also looking into index fragmentation

I'll update this if I find anything further. In the meantime any other guidance appreciated.

Thanks again

Tim

timwilliams Ten Centuries Points: 1007 More actions · Answer 4

timwilliams

Ten Centuries

Points: 1007

October 4, 2002 at 7:29 am

#437827

Quick update:

Execution plan for the query is identical every time - and no variation in the subquery costs or the cost of individual operators within each query

T

timwilliams Ten Centuries Points: 1007 More actions · Answer 5

timwilliams

Ten Centuries

Points: 1007

October 4, 2002 at 10:31 am

#437839

Another update

DBCC SHOWCONTIG on the indexes for operations with greatest cost showed Scan Density of nearly 100%. I have found one or two indexes that are fragmented - but do not believe this to be the issue.

Tim

timwilliams Ten Centuries Points: 1007 More actions · Answer 6

timwilliams

Ten Centuries

Points: 1007

October 7, 2002 at 4:32 am

#437934

Solart,

Thanks - I hadn't considered the caveats of Autoshrink.

However just checked and, in this case, autoshrink is is set to false on all the user databases

Performance problem is still there..and I'm still working on it, any other suggestions welcome

Tim

Klaas-Jan SSCrazy Points: 2297 More actions · Answer 7

Klaas-Jan

SSCrazy

Points: 2297

October 7, 2002 at 5:43 am

#437939

Hi Tim,

Did you check the network. A while ago we had performance problems that where caused by a corrupt network card on the server.

Klaas-Jan

Andy Warren SSC Guru Points: 119747 More actions · Answer 8

Two thoughts. One is that identical query plans may not be a good thing? The other is whether or not you've confirmed there is no blocking? Make that three - are you monitoring the server to see if there is something else going on at the same time that is hindering performance? Backup, DTS package, some exe that runs on the server?

Andy

http://qa.sqlservercentral.com/columnists/awarren/

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

timwilliams Ten Centuries Points: 1007 More actions · Answer 9

timwilliams

Ten Centuries

Points: 1007

October 7, 2002 at 10:13 am

#437958

Andy

I am curious - why do you say

"identical query plans may not be a good thing?"

Tx

Tim

timwilliams Ten Centuries Points: 1007 More actions · Answer 10

timwilliams

Ten Centuries

Points: 1007

October 7, 2002 at 10:16 am

#437959

Klaas-Jan

We are seeing this variaation in query response time when the query is run locally on the server - this rules out any network issue.

Thanks for the reply

Tim

Andy Warren SSC Guru Points: 119747 More actions · Answer 11

Same query plan may not be appropriate depending on the data distribution. You could add the recompile switch to see what happens. You'll take a small perf hit each time but if this is it, your slow query would now (hopefully) run better. Not a common scenario.

Andy

http://qa.sqlservercentral.com/columnists/awarren/

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

timwilliams Ten Centuries Points: 1007 More actions · Answer 12

Looks like we may have found the cause.

Initially we could not find any blocking - however this morning we have used OSQL to run the stored procedure in question 10 times simultaneously - each in a new window.

We noticed that there appeared to be a delay before the command started in all but the first window. We managed (eventually) to capture the current activity whilst the query was running.

This revealed blocking by another process that had an UPDATE with a Wait Type of LCK_M_IX and this in turn was blocked by a process running the stored procedure.

Net result - the stored procedure can only complete one at a time.

We are now looking into why the update is called.

Can any one think of a better way to capture blocking other than the Current Activity window in Enterprise Manager

Thanks to all

Tim

Andy Warren SSC Guru Points: 119747 More actions · Answer 13

Andy Warren

SSC Guru

Points: 119747

October 8, 2002 at 7:26 am

#438046

Glad you figured it out. For blocking, look at a cursor that queries against sysprocesses to see what is blocked/blocking. Might have something in the script library already.

Andy

http://qa.sqlservercentral.com/columnists/awarren/

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

timwilliams Ten Centuries Points: 1007 More actions · Answer 14

In answer to my own question above, I recieved a mail form MS support yesterday. The two Q articles are very informative - and I am now running the stored proceedure sp_blocker_pss80 against the live server. If it does prove to be useful I'll add it to the scripts library

Tim

From MS support:

1) Run the following in Query Analyser:

EXEC master..sp_who2

Check the results for blocking, if you do see a lot of blocking then please read through the Technet Articles below and capture more information

INF: Understanding and Resolving SQL Server 7.0 or 2000 Blocking

Problems(Q224453)

(http://support.microsoft.com/default.aspx?scid=kb;en-us;Q224453)

INF: How to Monitor SQL Server 2000 Blocking(Q271509)

(http://support.microsoft.com/default.aspx?scid=kb;en-us;Q271509)

2) Please run both a Performance Monitor Trace and a Profiler Trace for a couple of hours during the time when performance is bad. Please capture all the Profiler events and data columns as listed in Q224587 and at least all of the instances and all counters for the following Performance Monitor objects:

Memory, Physical Disk, Process, Processor, Thread, Server, System and ALL the SQL Server Objects - (SQLServer: Access Methods, SQLServer: Buffer Manager, SQLServer: Cache Manager, SQLServer: Databases,

SQLServer: General Statistics, SQLServer: Locks, SQLServer: Memory Manager and SQLServer: SQL Statistics)

INF: Troubleshooting Application Performance with SQL Server(Q224587)

(http://support.microsoft.com/default.aspx?scid=kb;en-us;Q224587)

3) Instead of restarting the server (which would effect several possible

causes) to resolve the problem please could you run the following command in Query Analyser:

DBCC FREEPROCCACHE

This will purge the procedure cache and force recompilation of all the stored procedure execution plans. If this resolves the problem then it will indicate that the problem is caused by out of date execution plans which can then be resolved by regular maintenance of indexes and statistics.