Erratic SP Performance

Question

Post reply

Erratic SP Performance

Dave Mason

SSCrazy

Points: 2741
More actions
February 27, 2012 at 9:04 am

#257493

One of our Sql Servers is experiencing sluggish performance. I've been monitoring performance primarily by running sql traces. It seems the slowdowns are occurring during peak usage during business hours. One of the traces I run monitors for stored procs and other sql batches that take 100 milliseconds or more to run. (Most of the interaction on the server is via stored procs.)

The numbers below are from an SP that takes a single input parameter and joins three tables. I chose it because it was one of the "simpler" SP's and is executed frequently throughout the day. Since it showed up often in my 100+ ms trace, I started another trace, filtering on just the one SP by name and nothing else.

I'm wondering what's going on within the server when it takes 388 ms to run, versus the other times when it runs in 2 ms or less. Also, sometimes CPU cycles are zero, while other times it is in the teens.

Duration
SPID milliseconds Reads Writes CPU 71 2 110 0 0 71 2 110 0 16 65 2 110 0 15 71 3 110 0 16 71 1 107 0 0 71 1 107 0 0 71 1 107 0 0 71 388 107 0 0 71 2 107 0 0 58 388 110 0 0 58 0 113 0 0 58 0 113 0 0 51 0 107 0 0 72 0 110 0 0 72 0 113 0 0

Dave MasonSeminole County, FL

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

You might be running into either blocking/locking issues, or into parallelism issues.

The easiest way to narrow down what to target is to track wait states on the server. You can do that through DMVs (search "sql server wait states" in Bing/Google), or you can get a third party tool that will help narrow it down for you. I use Confio Ignite for that, but Foglight and a number of other products work equally well.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Dave Mason SSCrazy Points: 2741 More actions · Answer 2

I've been looking at sys.dm_os_wait_stats and cross-referencing http://blogs.msdn.com/b/psssql/archive/2009/11/03/the-sql-server-wait-type-repository.aspx. I see quite a few actions that include "You should be able to safely ignore".

Here are the wait stats from the server. Is there an easy/better way to post tabular data in these forums?



wait_type	waiting_tasks_count	wait_time_ms	max_wait_time_ms	signal_wait_time_ms	AvgWait

FT_IFTS_SCHEDULER_IDLE_WAIT	3244	262706362	600389	16179	80982

CLR_MANUAL_EVENT	11966	178184613	30517	885	14890

LAZYWRITER_SLEEP	137316	175445643	17475797	409273	1277

CLR_AUTO_EVENT	375	174022378	85032427	10	464059

ONDEMAND_TASK_QUEUE	7	158888628	150295552	0	22698375

FT_IFTSHC_MUTEX	8	157860912	148950452	1	19732614

SLEEP_TASK	6847127	131406228	9919	86491802	19

DISPATCHER_QUEUE_SEMAPHORE	6372	107133014	34798331	3067	16813

CHECKPOINT_QUEUE	585	103819560	65125210	48	177469

LOGMGR_QUEUE	16905	89012915	5394088	99396	5265

XE_TIMER_EVENT	2961	88951075	30442	88951075	30040

SQLTRACE_INCREMENTAL_FLUSH_SLEEP	21907	88630190	12783	36	4045

REQUEST_FOR_DEADLOCK_SEARCH	17528	88425670	5563	88425670	5044

BROKER_TO_FLUSH	40391	44161086	2113	128569	1093

BROKER_EVENTHANDLER	4	39029346	35369049	1	9757336

CXPACKET	5392734	24985058	708513	2324853	4

OLEDB	62530	9428626	91568	0	150

TRACEWRITE	7222	9269526	2535	3869	1283

PAGEIOLATCH_SH	928477	5022670	1185	21853	5

LATCH_EX	489693	4050073	1163	485634	8

BROKER_TASK_STOP	722	3670249	10265	278	5083

SOS_SCHEDULER_YIELD	1220551	2455068	1210	1155217	2

BROKER_RECEIVE_WAITFOR	2	1198840	598839	0	599420

ASYNC_NETWORK_IO	10415	268161	896	3341	25

IO_COMPLETION	36058	193070	572	524	5

WRITELOG	20644	165025	572	4607	7

BACKUPIO	257590	63504	919	2458	0

SLEEP_BPOOL_FLUSH	6188	29369	417	733	4

BACKUPTHREAD	3167	27184	598	51	8

LATCH_SH	4499	25546	532	167	5

LCK_M_SCH_M	13	22242	5397	0	1710

PREEMPTIVE_OS_FILEOPS	1849	20956	533	0	11

PAGEIOLATCH_EX	2190	20799	1939	9	9

BACKUPBUFFER	5653	17070	291	271	3

LCK_M_X	2	13160	11546	0	6580

PREEMPTIVE_OS_AUTHENTICATIONOPS	10704	12897	561	0	1

MSQL_DQ	4	12019	10218	0	3004

ASYNC_IO_COMPLETION	92	10422	2289	2	113

MSQL_XP	1254	6283	396	0	5

PAGELATCH_UP	9899	6244	537	255	0

PREEMPTIVE_OS_GETPROCADDRESS	1254	5968	396	0	4

PREEMPTIVE_OS_GETFILEATTRIBUTES	1611	5676	397	0	3

SQLTRACE_FILE_READ_IO_COMPLETION	172	4406	397	0	25

PREEMPTIVE_OS_DECRYPTMESSAGE	2060	3822	385	0	1

PAGELATCH_EX	70510	3285	421	529	0

LOGBUFFER	100	3100	884	0	31

PREEMPTIVE_OS_LOOKUPACCOUNTSID	1369	2563	387	0	1

PAGEIOLATCH_UP	834	2436	601	1	2

LCK_M_IX	41	2416	367	2	58

PREEMPTIVE_OS_DELETESECURITYCONTEXT	2132	2064	538	0	0

PREEMPTIVE_OS_CREATEFILE	1147	2041	390	0	1

PREEMPTIVE_OS_CRYPTACQUIRECONTEXT	1928	1709	389	0	0

PAGELATCH_SH	10517	1313	359	490	0

SQLTRACE_FILE_BUFFER	16	1291	406	2	80

THREADPOOL	41	1205	386	0	29

PREEMPTIVE_OS_QUERYREGISTRY	1466	1193	3	0	0

CMEMTHREAD	13340	1054	387	584	0

PREEMPTIVE_OS_DTCOPS	1	1043	1043	0	1043

PREEMPTIVE_OS_WAITFORSINGLEOBJECT	1423	901	300	0	0

PREEMPTIVE_OS_SETFILEVALIDDATA	357	842	517	0	2

DTC	7	804	392	2	114

EXECSYNC	2146	780	297	144	0

PREEMPTIVE_OS_GETVOLUMEPATHNAME	644	734	290	0	1

PREEMPTIVE_OS_CRYPTIMPORTKEY	1706	498	366	0	0

PREEMPTIVE_OS_REPORTEVENT	213	477	389	0	2

PREEMPTIVE_OS_GENERICOPS	48	418	395	0	8

PREEMPTIVE_OS_AUTHORIZATIONOPS	2412	277	9	0	0

SQLTRACE_FILE_WRITE_IO_COMPLETION	1068	143	30	1	0

CXROWSET_SYNC	2334	132	2	38	0

PREEMPTIVE_OS_DISCONNECTNAMEDPIPE	1238	119	8	0	0

IMPPROV_IOWAIT	26	115	23	0	4

LCK_M_SCH_S	1	77	77	0	77

SOS_RESERVEDMEMBLOCKLIST	245	68	4	33	0

PREEMPTIVE_OS_REVERTTOSELF	1663	45	5	0	0

QUERY_EXECUTION_INDEX_SORT_EVENT_OPEN	28	37	3	6	1

PREEMPTIVE_OS_GETDISKFREESPACE	357	31	3	0	0

CLR_CRST	81	11	1	6	0

PREEMPTIVE_COM_QUERYINTERFACE	2	10	10	0	5

PREEMPTIVE_DTC_ENLIST	7	8	2	0	1

FT_IFTS_RWLOCK	101	6	1	3	0

PREEMPTIVE_TRANSIMPORT	7	5	1	0	0

SNI_CRITICAL_SECTION	55	5	1	1	0

RESOURCE_SEMAPHORE_MUTEX	18	3	1	1	0

SOS_DISPATCHER_MUTEX	16	2	1	2	0

PREEMPTIVE_DTC_BEGINTRANSACTION	7	2	1	0	0

SOS_LOCALALLOCATORLIST	2	2	1	1	1

PREEMPTIVE_CREATEPARAM	1	2	2	0	2

PREEMPTIVE_OS_MOVEFILE	6	2	1	0	0

PREEMPTIVE_OLEDBOPS	9	2	1	0	0

WRITE_COMPLETION	6	1	1	0	0

PREEMPTIVE_DTC_COMMITREQUESTDONE	7	1	1	0	0

PREEMPTIVE_DTC_PREPAREREQUESTDONE	7	1	1	0	0

PREEMPTIVE_OS_FLUSHFILEBUFFERS	1	1	1	0	1

TRANSACTION_MUTEX	6	1	1	0	0

Dave MasonSeminole County, FL

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 3

Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included.

See if you can track, when that query runs slower than usual, what wait type that query is experiencing.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Eric M Russell SSC Guru Points: 125255 More actions · Answer 4

Eric M Russell

SSC Guru

Points: 125255

February 27, 2012 at 12:30 pm

#1452249

Have you traced the SP:Recompile event? If so, then is there a correlation between an increase in recompiles to some activity in the database, like following the reload of tables?

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Dave Mason SSCrazy Points: 2741 More actions · Answer 5

GilaMonster (2/27/2012)

Track the per-process waits (sys.dm_os_waiting_tasks), the cumulative is fairly useless as a single snapshot, especially with all the ignorable waits included.

See if you can track, when that query runs slower than usual, what wait type that query is experiencing.

Thanks for the advice, Gail. I had noticed CXPACKET waits previously when I was looking at the output of dm_os_wait_stats. But when I did the math, the average wait was really low.

I discovered some occasional CXPACKET waits when I looked at sys.dm_os_waiting_tasks. The wait durations varied. some were insignificant, some were significant, others were ridiculously high.

In the past I had tried varying levels of MAXDOP, but impact wasn't discernible. I decided to change MAXDOP again and I set it to 2, based on this from MS (and the fact that hyper-threading is enabled on the server): http://support.microsoft.com/kb/329204

I don't see any improvement in performance, though.

Dave MasonSeminole County, FL

Brian.cs Default port Points: 1407 More actions · Answer 6

Brian.cs

Default port

Points: 1407

February 27, 2012 at 6:52 pm

#1452348

This smells like somethings we recently ran into, we actually had 3 things going on and solving both fixed it for us. Three questions:

1. Is this being run by a .NET application?

2. What types of arguments does the stored proc take?

3. How close are you to being at 80% or more memory utilization on the SQL box?

Dave Mason SSCrazy Points: 2741 More actions · Answer 7

Brian.cs (2/27/2012)

This smells like somethings we recently ran into, we actually had 3 things going on and solving both fixed it for us. Three questions:

1. Is this being run by a .NET application?

2. What types of arguments does the stored proc take?

3. How close are you to being at 80% or more memory utilization on the SQL box?

1. Often, but not always.

2. The problem is happening with many SP's. The one I've referenced takes a single input param--an INT.

3. The physical server currently shows roughly 57% memory utilization.

Dave MasonSeminole County, FL

Dave Ballantyne SSC-Dedicated Points: 33556 More actions · Answer 8

Since the reads / writes and Cpu are inline , i dont think this is a parameter sniffing or recompile issue.

Could be blocking ??

Could also be a slow down on the client consuming the data. The duration includes time taken to send the data to the client , if it doesnt move fast enough you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot , but hard to tell if that is natural for your system or not.

Also you mention traces, server side traces right ? You are not using profiler to monitor a live system are you.

Clear Sky SQL
My Blog[/url]

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 9

I suspect this is blocking or latch wait-related. CXPacket would knock the CPU up as well as the duration.

Personally I'd leave MAXDOP alone without some significant analysis first.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Dave Mason SSCrazy Points: 2741 More actions · Answer 10

Dave Ballantyne (2/28/2012)

Could also be a slow down on the client consuming the data. The duration includes time taken to send the data to the client , if it doesnt move fast enough you get an ASYNC_NETWORK_IO wait. These are shown in your snapshot , but hard to tell if that is natural for your system or not.

I considered this too. I didn't see many occurences of ASYNC_NETWORK_IO when I repeatedly queried sys.dm_os_waiting_tasks. If I did, the wait duration was so low I disregarded it.

Also you mention traces, server side traces right ? You are not using profiler to monitor a live system are you.

Correct, server side traces only.

Dave MasonSeminole County, FL

Brian.cs Default port Points: 1407 More actions · Answer 11

The reason I was asking about ASP.NET has to do with a bug in transaction scope, and if you don't know how to look at it then it's really hard to realize is happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx

That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.

Dave Mason SSCrazy Points: 2741 More actions · Answer 12

Brian.cs (2/28/2012)

The reason I was asking about ASP.NET has to do with a bug in transaction scope, and if you don't know how to look at it then it's really hard to realize is happening: http://blogs.msdn.com/b/dbrowne/archive/2010/06/03/using-new-transactionscope-considered-harmful.aspx

That was a big part of our issue. Can you run SQL Profiler, filter based on the application that's running the procs, and see if any of the connections are being set to transaction isolation Serializable? It's a shot in the dark, but in our apps it caused behavior similar to what you're describing.

Hmmm. Very interesting. I know that the performance issues are prevalent in the apps (as opposed to executing an offending SP in an SSMS window). I also know that the apps do use MSDTC, which requires a specific configuration on the server. From your experience, maybe you can help me with a couple of questions related to the article on blogs.msdn.com:

1. If our apps aren't using ideal settings for transaction isolation level and/or transaction timeout, shouldn't I be noticing frequent timeouts and/or deadlocks? That's not happening on our system.

2. Does the TransactionScope issue apply to all versions of the .Net Framework? (Our apps are compiled for v 2.0)

I've never run a trace to check a connection for its trx isolation level. It should be easy to figure out, though. I'll give it a try. Thx for the tip! 🙂

UPDATE: According to sys.dm_exec_requests, the transaction_isolation_level is 2 (ReadCommitted). The scenario from the article does not appear to apply to our situation.

Dave MasonSeminole County, FL

Dave Mason SSCrazy Points: 2741 More actions · Answer 13

One final update (for now)...

I rebooted the server during a maintenance window Monday night. Performance has resumed to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.

Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁

Dave MasonSeminole County, FL

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 14

Dave Mason (2/28/2012)

One final update (for now)...

I rebooted the server during a maintenance window Monday night. Performance has resumed to normal. Users are no longer complaining, I can no longer duplicate the previous performance issues manually, and I can no longer see evidence of poor performance via traces.

Good news/bad news, right? Performance is great :-), but I haven't found a permanent/long term solution to the problem. Eventually it will rear its ugly head again. 🙁

With that thought in mind and taking a shot in the dark, what are the initial size and growth settings of TempDB?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions