Eratic behaviour

  • Can anyone help with this one?

    I have spent the past two days looking into performance issues on a major e-commerce site - it got passed to me once it had been identified that the issue appears to be with SQL Server. I have created several traces using SQL Profiler (TSQL_Duration template and extra columns), which revealed that one query in particular was taking time to complete - so far so good.

    We then used NetIQ AppManager to run this query against the database every minute. The time for this query to complete has varied between 1.49 secs to nearly 10 minutes. This would appear to be completely random.

    Exhaustive analysis of the trace has not revealed any other significant activity at the times of poor performance.

    This problem appears to be server wide and not related to one table, index, or even database.

    There is no correlation between this performance and CPU, or memory usage, though there may be high disk I/O.

    The number of page splits per sec is low

    Any thoughts appreciated.

    This is a SQL 2000 Cluster on Win2k - all patched up to date.

    Thanks

    Tim

  • I would suspect that you don't have enough memory and a lot of large queries that need different data. If this is the case then the cache hit ratio will remain low (90% or less consistently), the cache is where SQL gets the major portion of it's speed. Could also be the query plans are flushing to quick and thus having to recompile (look at recompiles). Also, check the execution plan a few times and see if it changes for some reason, could be an index needs to be refreshed or stats need to be.

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • I'd recommend looking at the query plans. Possibly the optimizer is making bad calls (which could just be bad stats as Antares suggested). You might try adding with recompile to force it to get a new plan each time. Not the ideal solution, but worth exploring. Have you checked for blocking?

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • Antares and Andy

    Thanks for the replies.

    I too thought that SQL memory could be the issue here, but an average of 99.85 % for the Buffer Cache Hit Ratio speaks for itself.

    All the databases have the auto create statistics and update statistics on - and I have created a new Maintenance Plan for all the databases that updates the statistics used by query optimiser every morning. This ran last night - no change this morning.

    I am currently looking at the execution plans - nothing so far

    I am also looking into index fragmentation

    I'll update this if I find anything further. In the meantime any other guidance appreciated.

    Thanks again

    Tim

  • Quick update:

    Execution plan for the query is identical every time - and no variation in the subquery costs or the cost of individual operators within each query

    T

  • Another update

    DBCC SHOWCONTIG on the indexes for operations with greatest cost showed Scan Density of nearly 100%. I have found one or two indexes that are fragmented - but do not believe this to be the issue.

    Tim

  • Solart,

    Thanks - I hadn't considered the caveats of Autoshrink.

    However just checked and, in this case, autoshrink is is set to false on all the user databases

    Performance problem is still there..and I'm still working on it, any other suggestions welcome

    Tim

  • Hi Tim,

    Did you check the network. A while ago we had performance problems that where caused by a corrupt network card on the server.

    Klaas-Jan

  • Two thoughts. One is that identical query plans may not be a good thing? The other is whether or not you've confirmed there is no blocking? Make that three - are you monitoring the server to see if there is something else going on at the same time that is hindering performance? Backup, DTS package, some exe that runs on the server?

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • Andy

    I am curious - why do you say

    "identical query plans may not be a good thing?"

    Tx

    Tim

  • Klaas-Jan

    We are seeing this variaation in query response time when the query is run locally on the server - this rules out any network issue.

    Thanks for the reply

    Tim

  • Same query plan may not be appropriate depending on the data distribution. You could add the recompile switch to see what happens. You'll take a small perf hit each time but if this is it, your slow query would now (hopefully) run better. Not a common scenario.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • Looks like we may have found the cause.

    Initially we could not find any blocking - however this morning we have used OSQL to run the stored procedure in question 10 times simultaneously - each in a new window.

    We noticed that there appeared to be a delay before the command started in all but the first window. We managed (eventually) to capture the current activity whilst the query was running.

    This revealed blocking by another process that had an UPDATE with a Wait Type of LCK_M_IX and this in turn was blocked by a process running the stored procedure.

    Net result - the stored procedure can only complete one at a time.

    We are now looking into why the update is called.

    Can any one think of a better way to capture blocking other than the Current Activity window in Enterprise Manager

    Thanks to all

    Tim

  • Glad you figured it out. For blocking, look at a cursor that queries against sysprocesses to see what is blocked/blocking. Might have something in the script library already.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • In answer to my own question above, I recieved a mail form MS support yesterday. The two Q articles are very informative - and I am now running the stored proceedure sp_blocker_pss80 against the live server. If it does prove to be useful I'll add it to the scripts library

    Tim

    From MS support:

    1) Run the following in Query Analyser:

    EXEC master..sp_who2

    Check the results for blocking, if you do see a lot of blocking then please read through the Technet Articles below and capture more information

    INF: Understanding and Resolving SQL Server 7.0 or 2000 Blocking

    Problems(Q224453)

    (http://support.microsoft.com/default.aspx?scid=kb;en-us;Q224453)

    INF: How to Monitor SQL Server 2000 Blocking(Q271509)

    (http://support.microsoft.com/default.aspx?scid=kb;en-us;Q271509)

    2) Please run both a Performance Monitor Trace and a Profiler Trace for a couple of hours during the time when performance is bad. Please capture all the Profiler events and data columns as listed in Q224587 and at least all of the instances and all counters for the following Performance Monitor objects:

    Memory, Physical Disk, Process, Processor, Thread, Server, System and ALL the SQL Server Objects - (SQLServer: Access Methods, SQLServer: Buffer Manager, SQLServer: Cache Manager, SQLServer: Databases,

    SQLServer: General Statistics, SQLServer: Locks, SQLServer: Memory Manager and SQLServer: SQL Statistics)

    INF: Troubleshooting Application Performance with SQL Server(Q224587)

    (http://support.microsoft.com/default.aspx?scid=kb;en-us;Q224587)

    3) Instead of restarting the server (which would effect several possible

    causes) to resolve the problem please could you run the following command in Query Analyser:

    DBCC FREEPROCCACHE

    This will purge the procedure cache and force recompilation of all the stored procedure execution plans. If this resolves the problem then it will indicate that the problem is caused by out of date execution plans which can then be resolved by regular maintenance of indexes and statistics.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply