CPU Usage Reported By Profiler

Question

Post reply

CPU Usage Reported By Profiler

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

Chris Howarth-536003 (10/7/2009)

In our case the problem turned out to be a memory leak associated with the sp_OAMethod and the sp_OAGetProperty extended procedures which were being called every few minutes by, ironically enough, our 3rd-party monitoring software. After raising a call with MS, they suggested a fix of applying SP3 CU1, which resolved the problem. Here's a link to the KB article describing this particular issue:

http://support.microsoft.com/kb/937277

There are other causes of VAS fragmentation, such as the use of Linked Servers, and the use of sp_xml_preparedocument without a corresponding sp_xml_removedocument, to name a couple. Also, do you have CLR enabled?

Cheers

Chris

Chris,

How did you determine that sp_OA* was actually producing memory leaks? This isn't a challenge... I'd really like to know because I've said such a thing to many people and haven't been able to prove it. Being able to prove such a thing would make my life a whole lot easier.

Also, you mentioned that the use of Linked Servers is a cause of VAS fragmentation. How do you identify that such a thing is happening and what is the fix for that? It's quite common to have Linked Servers on systems and knowing such a fix would be huge for me.

Thank you for your insights.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Eoin The Forge SSCrazy Points: 2872 More actions · Answer 2

rhunt (10/8/2009)

Eoin (10/8/2009)

Try running the following

select name, SUM(single_pages_kb + multi_pages_kb)/1024.0 MBUsed

from sys.dm_os_memory_clerks

group by name

order by 2 desc

with particular empasis on figures for Tokenandpermuserstore, sql plans and object plans

There were 300 items. Here are the top 10:

NAME MB Used

Default 55.578125

SOS_Node 48.312500

SchemaMgr Store 12.093750

SNIPacket 5.664062

Bound Trees 2.554687

TokenAndPermUserStore 2.304687

SystemRowsetStore 1.093750

Lock Manager : Node 0 1.093750

Broker dormant rowsets 0.859375

Object Plans 0.601562

Not the TokenAndPermUserStore issue so. Should have realised it wasn't an issue on 32bit system.

Chris Howarth-536003 SSCommitted Points: 1775 More actions · Answer 3

Jeff Moden (10/8/2009)

Chris,

How did you determine that sp_OA* was actually producing memory leaks? This isn't a challenge... I'd really like to know because I've said such a thing to many people and haven't been able to prove it. Being able to prove such a thing would make my life a whole lot easier.

Also, you mentioned that the use of Linked Servers is a cause of VAS fragmentation. How do you identify that such a thing is happening and what is the fix for that? It's quite common to have Linked Servers on systems and knowing such a fix would be huge for me.

Thank you for your insights.

Hi Jeff

Our 3rd-party monitoring software executes the sp_OA extended procs regularly (approx once every 5-10 seconds), so over the course of a week they are executed many thousands of times. Each execution caused such a small leak that it was difficult to spot on a busy server, and it was only when we started to monitor the daily mean VAS fragmentation and availability over a period of a couple of weeks (using the above query and a Reporting Services report) that we spotted a steady decline. We eventually pinned the problem down to the monitoring app's queries as we had to rebuild the machine that it resides on, during which time no drops in memory levels were witnessed.

To verify that the problem was being caused by the sp_OA extended procs I obtained the relevant code that was being executed by the monitoring app, using SQL Profiler, then executed it thousands of times in a loop (on a test server) whilst logging the memory stats after each iteration, again using the query that I provided earlier in this thread. During execution of the loop there was a very rapid decline in the 'total' and 'maximum' 'free' values, to the point that the instance complained of 'Insufficient Resources' and then generated a dump file before stopping. Fairly conclusive evidence I thought.

As for Linked Servers, we use them extensively but have never experienced a problem with them. However this is one of the things that Microsoft was interested in when examining our case, suggesting that we modify the Registry to force Linked Server queries to run out of process, we didn't get that far though. I'm not sure whether you've come across it before but there's a CSS article here that goes into detail:

http://blogs.msdn.com/psssql/archive/2009/09/22/if-you-use-linked-server-queries-you-need-to-read-this.aspx

Cheers

Chris

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

Outstanding. Thanks for taking the time to write all of that up, Chris. This is definitely a "keeper" for me.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Chris Howarth-536003 SSCommitted Points: 1775 More actions · Answer 5

rhunt (10/8/2009)

Thanks. I'm actually not at all familiar with VAS fragmentation so I'll have to research the topic in general. Thanks for the tip. Answers to some of your questions:

1) We're not using any extended stored procedures in our custom code. However, not sure if SharePoint makes use of them...I'd have to research

2) We're on SP3 (original...no pose CUME's)

3) We do make use of linked servers. The linked server with the most activity is to our AS400. SSIS does a lot of work across that link each night.

4) Turns out the /3Gb switch was never enabled on this server...just PAE and AWE.

5) Here are the results of your query...although I'm not entirely sure how I should use these results:

TYPE MIN MAX AVG Blk Count Total

Commit 4096 84672512 48258 10609 511979520

Reserve 4096 4157440 183412 8179 1500131328

Free 4096 3145728 224018 604 135307264

Private 4096 84672512 105801 18208 1926438912

Mapped 4096 1060864 107751 62 6680576

Image 4096 24948736 141561 558 78991360

Looking at these stats the VAS region appears to be heavily fragmented. The total Free space is 135MB, the maximum contiguous chunk is 3MB and there are 604 individual contiguous chunks.

I don't know the reason why the CPU becomes heavily utilised under these conditions, all I know is that we had a similar problem with similar fragmentation levels and that the problem has gone away now that we have reduced VAS fragmentation.

I would suggest that you first restart SQL Server then automate the query I provided previously using a SQL Agent job set to run every two minutes, or every minute if your environment will allow. Log the returned data and the time to a table for a period of a couple of weeks then see if you can spot any patterns in behaviour - for instance you may be able to tie a steep drop in value to the execution of a specific SQL Agent job or some other scheduled process. You may want to consider running a server-side trace so that you can at least attempt to correlate any drops to specific code.

It's never an easy issue to pin down, but the first step is to try and determine when the drops are occurring.

Chris

rhunt SSCertifiable Points: 5868 More actions · Answer 6

rhunt

SSCertifiable

Points: 5868

October 8, 2009 at 9:49 am

#1063555

That is great info Chris, thanks. I will do that and report results back to this thread.

Thanks. RH

Chris Howarth-536003 SSCommitted Points: 1775 More actions · Answer 7

No problem.

Just to show you the kind of behaviour that you might see, I've uploaded an image of a report showing the behaviour of our SQL Server instance when we had the problem last year:

You can see that there's a definite downward trend - the daily drops in 'maximum' value in the early hours of the morning are due to the full database backups. In this example the CPU usage was high following the drop in the 'maximum' value on the morning of 03/11/2008 - we decided to restart the instance on the morning of 04/11/2008 due to 'Insufficient Resources' errors when performing log backups.

Good luck with your investigation, it will be interesting to hear how you get on. 🙂

Cheers

Chris

rhunt SSCertifiable Points: 5868 More actions · Answer 8

Just following back up. I had let the server go for a while because service wasn't really being impacted in the environment and I was trying to get a good baseline. Anyway, things did start to get sluggish over the weekend so I restarted SQLServer.exe. Here's my latest free info:

Type Min Max Avg block count total time

Free 4096 69074944 1086158 228 247644160 Oct 12 2009 1:54PM

SQL is humming along nicely. It's averaging about 8% CPU instead of 60%...workload is about the same as usual. While capturing the baseline a couple things jumped out (obviously one is the VAS issue about you mentioned). The other issue is that this server had a very low percentage for SQL Server Plan Cache: Cache Hit Ratio (Total). It was averaging 47% when I restarted SQL on Friday...the number dips ever lower as the issue escalates. Right now this counter is up at 94.5% with the "SQL Plans" sub category bringing down the average since it is sitting at around 68%. So, this is making we wonder if my "SQL Plans" portion of the Total Plan Cache hit ratio is really in the tank when the total is at 47% (I wasn't monitoring the sub categories last week.)

After some reading, I am pretty sure that the sql executed via sp_executesql gets optimized and stored in the procedure cache - NOT the SQL cache. However, I would love to get confirmation.

The only weird thing about this server's workload is that our SharePoint app and CRM app both use sp_executesql A LOT. I'm considering changing the paramaterization for our CRM db and our 60 SharePoint DB's from simple to forced. However, if sp_executesql is getting cache hits in the procedure cache (which is currently pegging 99% hit ratio) then this probably won't help me much.

Anyway...keeping an eye on things...

Eoin The Forge SSCrazy Points: 2872 More actions · Answer 9

rhunt (10/12/2009)

Just following back up. I had let the server go for a while because service wasn't really being impacted in the environment and I was trying to get a good baseline. Anyway, things did start to get sluggish over the weekend so I restarted SQLServer.exe. Here's my latest free info:

Type Min Max Avg block count total time

Free 4096 69074944 1086158 228 247644160 Oct 12 2009 1:54PM

SQL is humming along nicely. It's averaging about 8% CPU instead of 60%...workload is about the same as usual. While capturing the baseline a couple things jumped out (obviously one is the VAS issue about you mentioned). The other issue is that this server had a very low percentage for SQL Server Plan Cache: Cache Hit Ratio (Total). It was averaging 47% when I restarted SQL on Friday...the number dips ever lower as the issue escalates. Right now this counter is up at 94.5% with the "SQL Plans" sub category bringing down the average since it is sitting at around 68%. So, this is making we wonder if my "SQL Plans" portion of the Total Plan Cache hit ratio is really in the tank when the total is at 47% (I wasn't monitoring the sub categories last week.)

After some reading, I am pretty sure that the sql executed via sp_executesql gets optimized and stored in the procedure cache - NOT the SQL cache. However, I would love to get confirmation.

The only weird thing about this server's workload is that our SharePoint app and CRM app both use sp_executesql A LOT. I'm considering changing the paramaterization for our CRM db and our 60 SharePoint DB's from simple to forced. However, if sp_executesql is getting cache hits in the procedure cache (which is currently pegging 99% hit ratio) then this probably won't help me much.

Anyway...keeping an eye on things...

DBCC FREESYSTEMCACHE ('SQL PLANS') - Will free up that particular cache.

If you run the query I posed earlier in the thread then you can check the actual physical size. 'SQL PLANS' are cached SQL statements or batches that aren't in stored procedures, functions and triggers.

This includes any dynamic SQL or raw SELECT statements sent to the server.

As far as I'm aware the 'Objects Plans' cache will hold the compiled plans for stored procedures, functions and triggers.

Eoin

Chris Howarth-536003 SSCommitted Points: 1775 More actions · Answer 10

Chris Howarth-536003

SSCommitted

Points: 1775

October 13, 2009 at 3:17 am

#1065097

There's an excellent blog post by Jonathan Kehayias that explains VAS reservation in SQL Server:

http://sqlblog.com/blogs/jonathan_kehayias/archive/2009/07/07/understanding-the-vas-reservation-aka-memtoleave-in-sql-server.aspx

The detail contained in the post may well assist with your troubleshooting.

Chris

rhunt SSCertifiable Points: 5868 More actions · Answer 11

Well, I can at least predict the behavior now. When my largest free memory block dropped below 4mb today, my server activity went wild. I checked the SQL Cache hit ratio (total) and it is at 45%. Individually, the sql cache hit ratio and procedure cache hit ratio are at 10% and 20% respectively. Checking the size of my sql/procedure cache it becomes evident why; my sql cache is currently only 500KB! So, because I've got no space to store plans, 80% of all procedures and 90% of all adhoc SQL are being compiled on the fly. Aha! That is why my processors go crazy even though the workload has not changed.

Apparently checkpointing each DB and using DBCC DROPCLEANBUFFERS is no help either. I just can't get SQL to allocated more of the 9.85GB of used memory to the procedure cache. Another restart of sqlserve.exe coming soon...

Thanks. RH

Chris Howarth-536003 SSCommitted Points: 1775 More actions · Answer 12

At least you can predict the behaviour now, you're one step closer. Now to try and work out why the memory is becoming so fragmented. 🙂

Have you managed to correlate the drops with specific server activity?

Do you have CLR enabled? If so you may benefit from migrating to a 64-bit OS and 64-bit SQL Server where the VAS limitation is removed. But before migrating, an alternative that may give you sufficient breathing space would be to increase the amount of reserved memory by using the '-g' startup switch, I'd suggest trying -g512 as a starting point if you're running only SQL Server on the machine. You'll need to restart SQL Server when you've added the startup switch. Of course the only way you'll know if the situation has improved is to monitor the fragmentation and free space.

I'd keep investigating the cause of the fragmentation though.

Chris

Carlton Leach SSCarpal Tunnel Points: 4202 More actions · Answer 13

Carlton Leach

SSCarpal Tunnel

Points: 4202

October 14, 2009 at 4:00 pm

#1066029

just a side thought (in relation to the buffer cache, working sets, paging etc) I am presuming the SQL run time account have the "lock pages in memory" privledge under the local policy?

Cheers,

Carlton..

rhunt SSCertifiable Points: 5868 More actions · Answer 14

rhunt

SSCertifiable

Points: 5868

October 15, 2009 at 7:40 am

#1066255

Yes, lock pages all set. If memory serves that was part of enabling AWE.

Thanks. RH

rhunt SSCertifiable Points: 5868 More actions · Answer 15

Chris Howarth-536003 (10/14/2009)

At least you can predict the behaviour now, you're one step closer. Now to try and work out why the memory is becoming so fragmented. 🙂

Have you managed to correlate the drops with specific server activity?

Do you have CLR enabled? If so you may benefit from migrating to a 64-bit OS and 64-bit SQL Server where the VAS limitation is removed. But before migrating, an alternative that may give you sufficient breathing space would be to increase the amount of reserved memory by using the '-g' startup switch, I'd suggest trying -g512 as a starting point if you're running only SQL Server on the machine. You'll need to restart SQL Server when you've added the startup switch. Of course the only way you'll know if the situation has improved is to monitor the fragmentation and free space.

I'd keep investigating the cause of the fragmentation though.

Chris

CLR is enabled...but we don't have a single thing using. Funny thing happened in the middle of the night. My largest contiguous free block moved above the 4mb threshold and the server is running smoothly again. I read through the VAS article you sent. There was a query included for reviewing VAS allocations. Not sure I'm getting much from it though...

type, virtual_memory_committed_kb, multi_pages_kb

MEMORYCLERK_SQLGENERAL 0 4376

MEMORYCLERK_SQLBUFFERPOOL 92280 2440

MEMORYCLERK_SQLOPTIMIZER 0 72

MEMORYCLERK_SQLUTILITIES 120 0

MEMORYCLERK_SQLSTORENG 20928 9328

MEMORYCLERK_SQLSERVICEBROKER 0 192

MEMORYCLERK_SNI 0 16

CACHESTORE_STACKFRAMES 0 8

OBJECTSTORE_SNI_PACKET 0 8

OBJECTSTORE_LOCK_MANAGER 4096 0

MEMORYCLERK_SOSNODE 0 19616

CACHESTORE_OBJCP 0 1896

CACHESTORE_SQLCP 0 1808

CACHESTORE_PHDR 0 736

MEMORYCLERK_HOST 0 16

MEMORYCLERK_SNI 0 16

CACHESTORE_STACKFRAMES 0 8

OBJECTSTORE_SNI_PACKET 0 8

MEMORYCLERK_SOSNODE 0 2800