CPU Usage Reported By Profiler

  • Chris Howarth-536003 (10/7/2009)


    In our case the problem turned out to be a memory leak associated with the sp_OAMethod and the sp_OAGetProperty extended procedures which were being called every few minutes by, ironically enough, our 3rd-party monitoring software. After raising a call with MS, they suggested a fix of applying SP3 CU1, which resolved the problem. Here's a link to the KB article describing this particular issue:

    http://support.microsoft.com/kb/937277

    There are other causes of VAS fragmentation, such as the use of Linked Servers, and the use of sp_xml_preparedocument without a corresponding sp_xml_removedocument, to name a couple. Also, do you have CLR enabled?

    Cheers

    Chris

    Chris,

    How did you determine that sp_OA* was actually producing memory leaks? This isn't a challenge... I'd really like to know because I've said such a thing to many people and haven't been able to prove it. Being able to prove such a thing would make my life a whole lot easier.

    Also, you mentioned that the use of Linked Servers is a cause of VAS fragmentation. How do you identify that such a thing is happening and what is the fix for that? It's quite common to have Linked Servers on systems and knowing such a fix would be huge for me.

    Thank you for your insights.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • rhunt (10/8/2009)


    Eoin (10/8/2009)


    Try running the following

    select name, SUM(single_pages_kb + multi_pages_kb)/1024.0 MBUsed

    from sys.dm_os_memory_clerks

    group by name

    order by 2 desc

    with particular empasis on figures for Tokenandpermuserstore, sql plans and object plans

    There were 300 items. Here are the top 10:

    NAME MB Used

    Default 55.578125

    SOS_Node 48.312500

    SchemaMgr Store 12.093750

    SNIPacket 5.664062

    Bound Trees 2.554687

    TokenAndPermUserStore 2.304687

    SystemRowsetStore 1.093750

    Lock Manager : Node 0 1.093750

    Broker dormant rowsets 0.859375

    Object Plans 0.601562

    Not the TokenAndPermUserStore issue so. Should have realised it wasn't an issue on 32bit system.

  • Jeff Moden (10/8/2009)


    Chris,

    How did you determine that sp_OA* was actually producing memory leaks? This isn't a challenge... I'd really like to know because I've said such a thing to many people and haven't been able to prove it. Being able to prove such a thing would make my life a whole lot easier.

    Also, you mentioned that the use of Linked Servers is a cause of VAS fragmentation. How do you identify that such a thing is happening and what is the fix for that? It's quite common to have Linked Servers on systems and knowing such a fix would be huge for me.

    Thank you for your insights.

    Hi Jeff

    Our 3rd-party monitoring software executes the sp_OA extended procs regularly (approx once every 5-10 seconds), so over the course of a week they are executed many thousands of times. Each execution caused such a small leak that it was difficult to spot on a busy server, and it was only when we started to monitor the daily mean VAS fragmentation and availability over a period of a couple of weeks (using the above query and a Reporting Services report) that we spotted a steady decline. We eventually pinned the problem down to the monitoring app's queries as we had to rebuild the machine that it resides on, during which time no drops in memory levels were witnessed.

    To verify that the problem was being caused by the sp_OA extended procs I obtained the relevant code that was being executed by the monitoring app, using SQL Profiler, then executed it thousands of times in a loop (on a test server) whilst logging the memory stats after each iteration, again using the query that I provided earlier in this thread. During execution of the loop there was a very rapid decline in the 'total' and 'maximum' 'free' values, to the point that the instance complained of 'Insufficient Resources' and then generated a dump file before stopping. Fairly conclusive evidence I thought.

    As for Linked Servers, we use them extensively but have never experienced a problem with them. However this is one of the things that Microsoft was interested in when examining our case, suggesting that we modify the Registry to force Linked Server queries to run out of process, we didn't get that far though. I'm not sure whether you've come across it before but there's a CSS article here that goes into detail:

    http://blogs.msdn.com/psssql/archive/2009/09/22/if-you-use-linked-server-queries-you-need-to-read-this.aspx

    Cheers

    Chris

  • Outstanding. Thanks for taking the time to write all of that up, Chris. This is definitely a "keeper" for me.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • rhunt (10/8/2009)


    Thanks. I'm actually not at all familiar with VAS fragmentation so I'll have to research the topic in general. Thanks for the tip. Answers to some of your questions:

    1) We're not using any extended stored procedures in our custom code. However, not sure if SharePoint makes use of them...I'd have to research

    2) We're on SP3 (original...no pose CUME's)

    3) We do make use of linked servers. The linked server with the most activity is to our AS400. SSIS does a lot of work across that link each night.

    4) Turns out the /3Gb switch was never enabled on this server...just PAE and AWE.

    5) Here are the results of your query...although I'm not entirely sure how I should use these results:

    TYPE MIN MAX AVG Blk Count Total

    Commit 4096 84672512 48258 10609 511979520

    Reserve 4096 4157440 183412 8179 1500131328

    Free 4096 3145728 224018 604 135307264

    Private 4096 84672512 105801 18208 1926438912

    Mapped 4096 1060864 107751 62 6680576

    Image 4096 24948736 141561 558 78991360

    Looking at these stats the VAS region appears to be heavily fragmented. The total Free space is 135MB, the maximum contiguous chunk is 3MB and there are 604 individual contiguous chunks.

    I don't know the reason why the CPU becomes heavily utilised under these conditions, all I know is that we had a similar problem with similar fragmentation levels and that the problem has gone away now that we have reduced VAS fragmentation.

    I would suggest that you first restart SQL Server then automate the query I provided previously using a SQL Agent job set to run every two minutes, or every minute if your environment will allow. Log the returned data and the time to a table for a period of a couple of weeks then see if you can spot any patterns in behaviour - for instance you may be able to tie a steep drop in value to the execution of a specific SQL Agent job or some other scheduled process. You may want to consider running a server-side trace so that you can at least attempt to correlate any drops to specific code.

    It's never an easy issue to pin down, but the first step is to try and determine when the drops are occurring.

    Chris

  • That is great info Chris, thanks. I will do that and report results back to this thread.

    Thanks. RH

  • No problem.

    Just to show you the kind of behaviour that you might see, I've uploaded an image of a report showing the behaviour of our SQL Server instance when we had the problem last year:

    You can see that there's a definite downward trend - the daily drops in 'maximum' value in the early hours of the morning are due to the full database backups. In this example the CPU usage was high following the drop in the 'maximum' value on the morning of 03/11/2008 - we decided to restart the instance on the morning of 04/11/2008 due to 'Insufficient Resources' errors when performing log backups.

    Good luck with your investigation, it will be interesting to hear how you get on. 🙂

    Cheers

    Chris

  • Just following back up. I had let the server go for a while because service wasn't really being impacted in the environment and I was trying to get a good baseline. Anyway, things did start to get sluggish over the weekend so I restarted SQLServer.exe. Here's my latest free info:

    Type Min Max Avg block count total time

    Free 4096 69074944 1086158 228 247644160 Oct 12 2009 1:54PM

    SQL is humming along nicely. It's averaging about 8% CPU instead of 60%...workload is about the same as usual. While capturing the baseline a couple things jumped out (obviously one is the VAS issue about you mentioned). The other issue is that this server had a very low percentage for SQL Server Plan Cache: Cache Hit Ratio (Total). It was averaging 47% when I restarted SQL on Friday...the number dips ever lower as the issue escalates. Right now this counter is up at 94.5% with the "SQL Plans" sub category bringing down the average since it is sitting at around 68%. So, this is making we wonder if my "SQL Plans" portion of the Total Plan Cache hit ratio is really in the tank when the total is at 47% (I wasn't monitoring the sub categories last week.)

    After some reading, I am pretty sure that the sql executed via sp_executesql gets optimized and stored in the procedure cache - NOT the SQL cache. However, I would love to get confirmation.

    The only weird thing about this server's workload is that our SharePoint app and CRM app both use sp_executesql A LOT. I'm considering changing the paramaterization for our CRM db and our 60 SharePoint DB's from simple to forced. However, if sp_executesql is getting cache hits in the procedure cache (which is currently pegging 99% hit ratio) then this probably won't help me much.

    Anyway...keeping an eye on things...

  • rhunt (10/12/2009)


    Just following back up. I had let the server go for a while because service wasn't really being impacted in the environment and I was trying to get a good baseline. Anyway, things did start to get sluggish over the weekend so I restarted SQLServer.exe. Here's my latest free info:

    Type Min Max Avg block count total time

    Free 4096 69074944 1086158 228 247644160 Oct 12 2009 1:54PM

    SQL is humming along nicely. It's averaging about 8% CPU instead of 60%...workload is about the same as usual. While capturing the baseline a couple things jumped out (obviously one is the VAS issue about you mentioned). The other issue is that this server had a very low percentage for SQL Server Plan Cache: Cache Hit Ratio (Total). It was averaging 47% when I restarted SQL on Friday...the number dips ever lower as the issue escalates. Right now this counter is up at 94.5% with the "SQL Plans" sub category bringing down the average since it is sitting at around 68%. So, this is making we wonder if my "SQL Plans" portion of the Total Plan Cache hit ratio is really in the tank when the total is at 47% (I wasn't monitoring the sub categories last week.)

    After some reading, I am pretty sure that the sql executed via sp_executesql gets optimized and stored in the procedure cache - NOT the SQL cache. However, I would love to get confirmation.

    The only weird thing about this server's workload is that our SharePoint app and CRM app both use sp_executesql A LOT. I'm considering changing the paramaterization for our CRM db and our 60 SharePoint DB's from simple to forced. However, if sp_executesql is getting cache hits in the procedure cache (which is currently pegging 99% hit ratio) then this probably won't help me much.

    Anyway...keeping an eye on things...

    DBCC FREESYSTEMCACHE ('SQL PLANS') - Will free up that particular cache.

    If you run the query I posed earlier in the thread then you can check the actual physical size. 'SQL PLANS' are cached SQL statements or batches that aren't in stored procedures, functions and triggers.

    This includes any dynamic SQL or raw SELECT statements sent to the server.

    As far as I'm aware the 'Objects Plans' cache will hold the compiled plans for stored procedures, functions and triggers.

    Eoin

  • There's an excellent blog post by Jonathan Kehayias that explains VAS reservation in SQL Server:

    http://sqlblog.com/blogs/jonathan_kehayias/archive/2009/07/07/understanding-the-vas-reservation-aka-memtoleave-in-sql-server.aspx

    The detail contained in the post may well assist with your troubleshooting.

    Chris

  • Well, I can at least predict the behavior now. When my largest free memory block dropped below 4mb today, my server activity went wild. I checked the SQL Cache hit ratio (total) and it is at 45%. Individually, the sql cache hit ratio and procedure cache hit ratio are at 10% and 20% respectively. Checking the size of my sql/procedure cache it becomes evident why; my sql cache is currently only 500KB! So, because I've got no space to store plans, 80% of all procedures and 90% of all adhoc SQL are being compiled on the fly. Aha! That is why my processors go crazy even though the workload has not changed.

    Apparently checkpointing each DB and using DBCC DROPCLEANBUFFERS is no help either. I just can't get SQL to allocated more of the 9.85GB of used memory to the procedure cache. Another restart of sqlserve.exe coming soon...

    Thanks. RH

  • At least you can predict the behaviour now, you're one step closer. Now to try and work out why the memory is becoming so fragmented. 🙂

    Have you managed to correlate the drops with specific server activity?

    Do you have CLR enabled? If so you may benefit from migrating to a 64-bit OS and 64-bit SQL Server where the VAS limitation is removed. But before migrating, an alternative that may give you sufficient breathing space would be to increase the amount of reserved memory by using the '-g' startup switch, I'd suggest trying -g512 as a starting point if you're running only SQL Server on the machine. You'll need to restart SQL Server when you've added the startup switch. Of course the only way you'll know if the situation has improved is to monitor the fragmentation and free space.

    I'd keep investigating the cause of the fragmentation though.

    Chris

  • just a side thought (in relation to the buffer cache, working sets, paging etc) I am presuming the SQL run time account have the "lock pages in memory" privledge under the local policy?

    Cheers,

    Carlton..

  • Yes, lock pages all set. If memory serves that was part of enabling AWE.

    Thanks. RH

  • Chris Howarth-536003 (10/14/2009)


    At least you can predict the behaviour now, you're one step closer. Now to try and work out why the memory is becoming so fragmented. 🙂

    Have you managed to correlate the drops with specific server activity?

    Do you have CLR enabled? If so you may benefit from migrating to a 64-bit OS and 64-bit SQL Server where the VAS limitation is removed. But before migrating, an alternative that may give you sufficient breathing space would be to increase the amount of reserved memory by using the '-g' startup switch, I'd suggest trying -g512 as a starting point if you're running only SQL Server on the machine. You'll need to restart SQL Server when you've added the startup switch. Of course the only way you'll know if the situation has improved is to monitor the fragmentation and free space.

    I'd keep investigating the cause of the fragmentation though.

    Chris

    CLR is enabled...but we don't have a single thing using. Funny thing happened in the middle of the night. My largest contiguous free block moved above the 4mb threshold and the server is running smoothly again. I read through the VAS article you sent. There was a query included for reviewing VAS allocations. Not sure I'm getting much from it though...

    type, virtual_memory_committed_kb, multi_pages_kb

    MEMORYCLERK_SQLGENERAL 0 4376

    MEMORYCLERK_SQLBUFFERPOOL 92280 2440

    MEMORYCLERK_SQLOPTIMIZER 0 72

    MEMORYCLERK_SQLUTILITIES 120 0

    MEMORYCLERK_SQLSTORENG 20928 9328

    MEMORYCLERK_SQLSERVICEBROKER 0 192

    MEMORYCLERK_SNI 0 16

    CACHESTORE_STACKFRAMES 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_LOCK_MANAGER 4096 0

    MEMORYCLERK_SOSNODE 0 19616

    CACHESTORE_OBJCP 0 1896

    CACHESTORE_SQLCP 0 1808

    CACHESTORE_PHDR 0 736

    MEMORYCLERK_HOST 0 16

    MEMORYCLERK_SNI 0 16

    CACHESTORE_STACKFRAMES 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    OBJECTSTORE_SNI_PACKET 0 8

    MEMORYCLERK_SOSNODE 0 2800

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply