Erratic SP Performance

  • One of our Sql Servers is experiencing sluggish performance. I've been monitoring performance primarily by running sql traces. It seems the slowdowns are occurring during peak usage during business hours. One of the traces I run monitors for stored procs and other sql batches that take 100 milliseconds or more to run. (Most of the interaction on the server is via stored procs.)

    The numbers below are from an SP that takes a single input parameter and joins three tables. I chose it because it was one of the "simpler" SP's and is executed frequently throughout the day. Since it showed up often in my 100+ ms trace, I started another trace, filtering on just the one SP by name and nothing else.

    I'm wondering what's going on within the server when it takes 388 ms to run, versus the other times when it runs in 2 ms or less. Also, sometimes CPU cycles are zero, while other times it is in the teens.

    Duration

    SPID milliseconds Reads Writes CPU

    71 2 110 0 0

    71 2 110 0 16

    65 2 110 0 15

    71 3 110 0 16

    71 1 107 0 0

    71 1 107 0 0

    71 1 107 0 0

    71 388 107 0 0

    71 2 107 0 0

    58 388 110 0 0

    58 0 113 0 0

    58 0 113 0 0

    51 0 107 0 0

    72 0 110 0 0

    72 0 113 0 0

  • You might be running into either blocking/locking issues, or into parallelism issues.

    The easiest way to narrow down what to target is to track wait states on the server. You can do that through DMVs (search "sql server wait states" in Bing/Google), or you can get a third party tool that will help narrow it down for you. I use Confio Ignite for that, but Foglight and a number of other products work equally well.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I've been looking at sys.dm_os_wait_stats and cross-referencing http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx. I see quite a few actions that include "You should be able to safely ignore".

    Here are the wait stats from the server. Is there an easy/better way to post tabular data in these forums?

    wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms AvgWait

    FT_IFTS_SCHEDULER_IDLE_WAIT 3244 262706362 600389 16179 80982

    CLR_MANUAL_EVENT 11966 178184613 30517 885 14890

    LAZYWRITER_SLEEP 137316 175445643 17475797 409273 1277

    CLR_AUTO_EVENT 375 174022378 85032427 10 464059

    ONDEMAND_TASK_QUEUE 7 158888628 150295552 0 22698375

    FT_IFTSHC_MUTEX 8 157860912 148950452 1 19732614

    SLEEP_TASK 6847127 131406228 9919 86491802 19

    DISPATCHER_QUEUE_SEMAPHORE 6372 107133014 34798331 3067 16813

    CHECKPOINT_QUEUE 585 103819560 65125210 48 177469

    LOGMGR_QUEUE 16905 89012915 5394088 99396 5265

    XE_TIMER_EVENT 2961 88951075 30442 88951075 30040

    SQLTRACE_INCREMENTAL_FLUSH_SLEEP 21907 88630190 12783 36 4045

    REQUEST_FOR_DEADLOCK_SEARCH 17528 88425670 5563 88425670 5044

    BROKER_TO_FLUSH 40391 44161086 2113 128569 1093

    BROKER_EVENTHANDLER 4 39029346 35369049 1 9757336

    CXPACKET 5392734 24985058 708513 2324853 4

    OLEDB 62530 9428626 91568 0 150

    TRACEWRITE 7222 9269526 2535 3869 1283

    PAGEIOLATCH_SH 928477 5022670 1185 21853 5

    LATCH_EX 489693 4050073 1163 485634 8

    BROKER_TASK_STOP 722 3670249 10265 278 5083

    SOS_SCHEDULER_YIELD 1220551 2455068 1210 1155217 2

    BROKER_RECEIVE_WAITFOR 2 1198840 598839 0 599420

    ASYNC_NETWORK_IO 10415 268161 896 3341 25

    IO_COMPLETION 36058 193070 572 524 5

    WRITELOG 20644 165025 572 4607 7

    BACKUPIO 257590 63504 919 2458 0

    SLEEP_BPOOL_FLUSH 6188 29369 417 733 4

    BACKUPTHREAD 3167 27184 598 51 8

    LATCH_SH 4499 25546 532 167 5

    LCK_M_SCH_M 13 22242 5397 0 1710

    PREEMPTIVE_OS_FILEOPS 1849 20956 533 0 11

    PAGEIOLATCH_EX 2190 20799 1939 9 9

    BACKUPBUFFER 5653 17070 291 271 3

    LCK_M_X 2 13160 11546 0 6580

    PREEMPTIVE_OS_AUTHENTICATIONOPS 10704 12897 561 0 1

    MSQL_DQ 4 12019 10218 0 3004

    ASYNC_IO_COMPLETION 92 10422 2289 2 113

    MSQL_XP 1254 6283 396 0 5

    PAGELATCH_UP 9899 6244 537 255 0

    PREEMPTIVE_OS_GETPROCADDRESS 1254 5968 396 0 4

    PREEMPTIVE_OS_GETFILEATTRIBUTES 1611 5676 397 0 3

    SQLTRACE_FILE_READ_IO_COMPLETION 172 4406 397 0 25

    PREEMPTIVE_OS_DECRYPTMESSAGE 2060 3822 385 0 1

    PAGELATCH_EX 70510 3285 421 529 0

    LOGBUFFER 100 3100 884 0 31

    PREEMPTIVE_OS_LOOKUPACCOUNTSID 1369 2563 387 0 1

    PAGEIOLATCH_UP 834 2436 601 1 2

    LCK_M_IX 41 2416 367 2 58

    PREEMPTIVE_OS_DELETESECURITYCONTEXT 2132 2064 538 0 0

    PREEMPTIVE_OS_CREATEFILE 1147 2041 390 0 1

    PREEMPTIVE_OS_CRYPTACQUIRECONTEXT 1928 1709 389 0 0

    PAGELATCH_SH 10517 1313 359 490 0

    SQLTRACE_FILE_BUFFER 16 1291 406 2 80

    THREADPOOL 41 1205 386 0 29

    PREEMPTIVE_OS_QUERYREGISTRY 1466 1193 3 0 0

    CMEMTHREAD 13340 1054 387 584 0

    PREEMPTIVE_OS_DTCOPS 1 1043 1043 0 1043

    PREEMPTIVE_OS_WAITFORSINGLEOBJECT 1423 901 300 0 0

    PREEMPTIVE_OS_SETFILEVALIDDATA 357 842 517 0 2

    DTC 7 804 392 2 114

    EXECSYNC 2146 780 297 144 0

    PREEMPTIVE_OS_GETVOLUMEPATHNAME 644 734 290 0 1

    PREEMPTIVE_OS_CRYPTIMPORTKEY 1706 498 366 0 0

    PREEMPTIVE_OS_REPORTEVENT 213 477 389 0 2

    PREEMPTIVE_OS_GENERICOPS 48 418 395 0 8

    PREEMPTIVE_OS_AUTHORIZATIONOPS 2412 277 9 0 0

    SQLTRACE_FILE_WRITE_IO_COMPLETION 1068 143 30 1 0

    CXROWSET_SYNC 2334 132 2 38 0

    PREEMPTIVE_OS_DISCONNECTNAMEDPIPE 1238 119 8 0 0

    IMPPROV_IOWAIT 26 115 23 0 4

    LCK_M_SCH_S 1 77 77 0 77

    SOS_RESERVEDMEMBLOCKLIST 245 68 4 33 0

    PREEMPTIVE_OS_REVERTTOSELF 1663 45 5 0 0

    QUERY_EXECUTION_INDEX_SORT_EVENT_OPEN 28 37 3 6 1

    PREEMPTIVE_OS_GETDISKFREESPACE 357 31 3 0 0

    CLR_CRST 81 11 1 6 0

    PREEMPTIVE_COM_QUERYINTERFACE 2 10 10 0 5

    PREEMPTIVE_DTC_ENLIST 7 8 2 0 1

    FT_IFTS_RWLOCK 101 6 1 3 0

    PREEMPTIVE_TRANSIMPORT 7 5 1 0 0

    SNI_CRITICAL_SECTION 55 5 1 1 0

    RESOURCE_SEMAPHORE_MUTEX 18 3 1 1 0

    SOS_DISPATCHER_MUTEX 16 2 1 2 0

    PREEMPTIVE_DTC_BEGINTRANSACTION 7 2 1 0 0

    SOS_LOCALALLOCATORLIST 2 2 1 1 1

    PREEMPTIVE_CREATEPARAM 1 2 2 0 2

    PREEMPTIVE_OS_MOVEFILE 6 2 1 0 0

    PREEMPTIVE_OLEDBOPS 9 2 1 0 0

    WRITE_COMPLETION 6 1 1 0 0

    PREEMPTIVE_DTC_COMMITREQUESTDONE 7 1 1 0 0

    PREEMPTIVE_DTC_PREPAREREQUESTDONE 7 1 1 0 0

    PREEMPTIVE_OS_FLUSHFILEBUFFERS 1 1 1 0 1

    TRANSACTION_MUTEX 6 1 1 0 0

  • Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included.

    See if you can track, when that query runs slower than usual, what wait type that query is experiencing.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Have you traced the SP:Recompile event? If so, then is there a correlation between an increase in recompiles to some activity in the database, like following the reload of tables?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • GilaMonster (2/27/2012)


    Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included.

    See if you can track, when that query runs slower than usual, what wait type that query is experiencing.

    Thanks for the advice, Gail. I had noticed CXPACKET waits previously when I was looking at the output of dm_os_wait_stats. But when I did the math, the average wait was really low.

    I discovered some occasional CXPACKET waits when I looked at sys.dm_os_waiting_tasks. The wait durations varied. some were insignificant, some were significant, others were ridiculously high.

    In the past I had tried varying levels of MAXDOP, but impact wasn't discernible. I decided to change MAXDOP again and I set it to 2, based on this from MS (and the fact that hyper-threading is enabled on the server): http://support.microsoft.com/kb/329204

    I don't see any improvement in performance, though.

  • This smells like somethings we recently ran into, we actually had 3 things going on and solving both fixed it for us. Three questions:

    1. Is this being run by a .NET application?

    2. What types of arguments does the stored proc take?

    3. How close are you to being at 80% or more memory utilization on the SQL box?

  • Brian.cs (2/27/2012)


    This smells like somethings we recently ran into, we actually had 3 things going on and solving both fixed it for us. Three questions:

    1. Is this being run by a .NET application?

    2. What types of arguments does the stored proc take?

    3. How close are you to being at 80% or more memory utilization on the SQL box?

    1. Often, but not always.

    2. The problem is happening with many SP's. The one I've referenced takes a single input param--an INT.

    3. The physical server currently shows roughly 57% memory utilization.

  • Since the reads / writes and Cpu are inline , i dont think this is a parameter sniffing or recompile issue.

    Could be blocking ??

    Could also be a slow down on the client consuming the data. The duration includes time taken to send the data to the client , if it doesnt move fast enough you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot , but hard to tell if that is natural for your system or not.

    Also you mention traces, server side traces right ? You are not using profiler to monitor a live system are you.



    Clear Sky SQL
    My Blog[/url]

  • I suspect this is blocking or latch wait-related. CXPacket would knock the CPU up as well as the duration.

    Personally I'd leave MAXDOP alone without some significant analysis first.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Dave Ballantyne (2/28/2012)


    Could also be a slow down on the client consuming the data. The duration includes time taken to send the data to the client , if it doesnt move fast enough you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot , but hard to tell if that is natural for your system or not.

    I considered this too. I didn't see many occurences of ASYNC_NETWORK_IO when I repeatedly queried sys.dm_os_waiting_tasks. If I did, the wait duration was so low I disregarded it.

    Also you mention traces, server side traces right ? You are not using profiler to monitor a live system are you.

    Correct, server side traces only.

  • The reason I was asking about ASP.NET has to do with a bug in transaction scope, and if you don't know how to look at it then it's really hard to realize is happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx

    That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.

  • Brian.cs (2/28/2012)


    The reason I was asking about ASP.NET has to do with a bug in transaction scope, and if you don't know how to look at it then it's really hard to realize is happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx

    That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.

    Hmmm. Very interesting. I know that the performance issues are prevalent in the apps (as opposed to executing an offending SP in an SSMS window). I also know that the apps do use MSDTC, which requires a specific configuration on the server. From your experience, maybe you can help me with a couple of questions related to the article on blogs.msdn.com:

    1. If our apps aren't using ideal settings for transaction isolation level and/or transaction timeout, shouldn't I be noticing frequent timeouts and/or deadlocks? That's not happening on our system.

    2. Does the TransactionScope issue apply to all versions of the .Net Framework? (Our apps are compiled for v 2.0)

    I've never run a trace to check a connection for its trx isolation level. It should be easy to figure out, though. I'll give it a try. Thx for the tip! 🙂

    UPDATE: According to sys.dm_exec_requests, the transaction_isolation_level is 2 (ReadCommitted). The scenario from the article does not appear to apply to our situation.

  • One final update (for now)...

    I rebooted the server during a maintenance window Monday night. Performance has resumed to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.

    Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁

  • Dave Mason (2/28/2012)


    One final update (for now)...

    I rebooted the server during a maintenance window Monday night. Performance has resumed to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.

    Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁

    With that thought in mind and taking a shot in the dark, what are the initial size and growth settings of TempDB?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply