How should I diagnose a memory bottleneck in 64-bit?

  • My mind is mush after reading numerous papers on monitoring memory. Most of what I found was in regards to what DMVs and perfmon/sysmon counters to check, but very little information on what values are considered bad. So I ask you, what do you check when you suspect a memory bottleneck and does this apply to 64-bit OS? I've always concentrated on paging, but I know that is too simplistic.

    Thanks, Dave

  • Dave,

    First, what is your servers hardware and software configuration? I see you say it is x64, but how much memory does it have, is it a dedicated SQL server, what version OS is running on it? Do you have the Max Server Memory set for SQL?

    Then what is making you think that you have a memory bottleneck? The counter values for x64 are basically the same for x86 and IA64 as to what you should be monitoring and what should be considered acceptable. Have a look at the PAL tool on codeplex:

    http://www.codeplex.com/PAL

    It can be used to read your counter sets and point out possible problem areas. There is a SQL Server specific template that has the counters and threshold limits defined in it. This was developed by Microsoft so I trust the values contained in its defaults as good baseline numbers.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • SQL Server is clustered. Each server contains (4) dual core CPUs with 32GB of memory and a GBit NIC. The OS is Windows Server 2003 EE SP2. Pagefile is 10GB. The server is dedicated to SQL, however it does contain the following common applications.

    TREND - Anti-virus (not real-time)

    CSA - Cisco Security Agent

    Double-Take - Replication to Disaster Recovery server

    Min Server Memory: 2GB

    Max Server Memory: 26GB

    I believe a memory bottleneck may exist because there are a lot of sustained page reads and writes against the pagefile. I will take a look at the PAL tool, but I'm still curious what perfmon/sysmon counters I should concentrate on and more importantly what values indicate a memory bottleneck.

    In regards to the Pagefile I've always believed you don't need a large pagefile, because with SQL Server if you are seeing a lot of paging and your pagefile grows to several GB you most likely need to add memory. I've had Microsoft engineer's tell me the same thing indicating the only reason I would want a large pagefile is to capture a memory dump. However, I did find KB 889654 which has the following quote.

    Note If you are running Microsoft SQL Server together with MSDTC (COM+), the pagefile should be at least 1 percent larger than how much RAM is installed in the system. For example, when you are using 32 GB of RAM, the pagefile should be at least 32.32 GB of RAM.

    Thanks, Dave

  • What is the average value of the Memory: Available MB's on the server? If it is higher than 150MB then I don't think that you have a Memory bottleneck so much, but that is not to say that you don't have a problem necessarily. I'd have to ask some other MVP's regarding the page file activity that you are seeing specifically to find out more information as this isn't an area that I am overly familiar with, especially with clustering involved. My page files are always 1.5 times the physical memory as configured by our Server Team, I don't change what they set on my servers. Someone else on here probably can answer this as well, but in case, I'll send an email to a few people and see what they say about it.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • One other thing:

    Is there a specific performance issue that you are having, or are you looking to understand why the counters for the page file are what they are information wise?

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Available MBytes is 2,622. As I am typing this the pagefile use showed Page Ins up to 10k, although these are spikes that last typically a few seconds and then spike again after a few seconds. Pages found in RAM is currentlly 65% and Read Hit Ratio is 0%.

    There are loads of performance problems we are troubleshooting, most caused by poor code. The environment won't be live for another 6 weeks, but I see constant CXPACKET and PageLatch wait times. High disk queuing and Disk Sec/write, indicating a disk bottleneck on our SAN and lately I have seen constant read/write to the pagefile.

  • DBADave (12/30/2008)


    TREND - Anti-virus (not real-time)

    CSA - Cisco Security Agent

    Double-Take - Replication to Disaster Recovery server

    great, bound to be a few memory leak issues with that little lot installed.

    you dont mention the version of sql server, IIRC the problems you are experiencing are common with SQL2005 Standard x64

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Dave,

    When identifying memory as a bottleneck, I'd first want to confirm that memory pressure is occurring within the SQL Server engine. Memory pressure is when SQL Server needs to contend with one or more processes for system memory.

    A couple performance monitor counters I use when trying to identify memory pressure are located within SQLServer:Buffer Manager.

    Buffer cache hit ratio - This is the percentage of requested pages found in memory without having to incur a read from disk. You want this to be above 95%, and in most cases in the 98-99.9% area.

    Page Life Expectancy - This is how long a page remains in memory before getting paged out. The longer the better. If this value is very low it can indicate memory pressure. Ideally you want this value to be above 300 (5 minutes).

    Both of these counters having low values indicates memory pressure on the system, which is commonly caused by not having enough memory, other applications consuming memory, or an incorrect SQL Server memory setting.

    Additionally, what edition of SQL Server are you using?

  • I'm running SQL 2005 Enterprise - 64bit - SP2

  • DBADave (12/29/2008)


    My mind is mush after reading numerous papers on monitoring memory. Most of what I found was in regards to what DMVs and perfmon/sysmon counters to check, but very little information on what values are considered bad. So I ask you, what do you check when you suspect a memory bottleneck and does this apply to 64-bit OS? I've always concentrated on paging, but I know that is too simplistic.

    Thanks, Dave

    use following script :

    DBCC MEMORYSTATUS

    ------------------

    --Host wise

    select hostname,count(spid) as Total_Connection,sum(isnull(memusage,0)) as MemUsage from sys.sysprocesses

    group by hostname order by count(spid) desc

    ----------

    --Database wise

    select db_name(dbid) as DatabaseName,count(spid) as Total_Connection,sum(isnull(memusage,0)) as MemUsage from sys.sysprocesses

    group by db_name(dbid) order by count(spid) desc

    --------

    select

    a.name,

    connections = (select

    count(*)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid),

    blocked_users = (select

    count(*)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid and

    blocked <> 0),

    total_memory = isnull((select sum(memusage)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid),0),

    total_io = isnull((select sum(physical_io)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid),0),

    total_cpu = isnull((select sum(cpu)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid),0),

    total_waittime = isnull((select sum(waittime)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid),0),

    dbccs = isnull((select count(*)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid and

    upper(b.cmd) like '%DBCC%'),0),

    bcp_running = isnull((select count(*)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid and

    upper(b.cmd) like '%BCP%'),0),

    backup_restore_running = isnull((select count(*)

    from

    master..sysprocesses b

    where

    a.dbid = b.dbid and

    upper(b.cmd) like '%BACKUP%' or

    upper(b.cmd) like '%RESTORE%'),0)

    from

    master.dbo.sysdatabases a

    -----------------------------------

    select DB = a.instance_name,

    'DBCC Logical Scans' = a.cntr_value,

    'Transactions/sec' = (select d.cntr_value

    from

    master..sysperfinfo d

    where

    d.object_name = a.object_name and

    d.instance_name = a.instance_name and

    d.counter_name = 'Transactions/sec'),

    'Active Transactions' = (select case when i.cntr_value < 0 then 0

    else i.cntr_value end

    from

    master..sysperfinfo i

    where

    i.object_name = a.object_name and

    i.instance_name = a.instance_name and

    i.counter_name = 'Active Transactions'),

    'Bulk Copy Rows' = (select b.cntr_value

    from

    master..sysperfinfo b

    where

    b.object_name = a.object_name and

    b.instance_name = a.instance_name and

    b.counter_name = 'Bulk Copy Rows/sec'),

    'Bulk Copy Throughput'= (select c.cntr_value

    from

    master..sysperfinfo c

    where

    c.object_name = a.object_name and

    c.instance_name = a.instance_name and

    c.counter_name = 'Bulk Copy Throughput/sec'),

    'Log Cache Reads' = (select e.cntr_value

    from

    master..sysperfinfo e

    where

    e.object_name = a.object_name and

    e.instance_name = a.instance_name and

    e.counter_name = 'Log Cache Reads/sec'),

    'Log Flushes' = (select f.cntr_value

    from

    master..sysperfinfo f

    where

    f.object_name = a.object_name and

    f.instance_name = a.instance_name and

    f.counter_name = 'Log Flushes/sec'),

    'Log Growths' = (select g.cntr_value

    from

    master..sysperfinfo g

    where

    g.object_name = a.object_name and

    g.instance_name = a.instance_name and

    g.counter_name = 'Log Growths'),

    'Log Shrinks' = (select h.cntr_value

    from

    master..sysperfinfo h

    where

    h.object_name = a.object_name and

    h.instance_name = a.instance_name and

    h.counter_name = 'Log Shrinks')

    from

    master..sysperfinfo a

    where

    a.object_name like '%Databases%'

    _____________________________________________________________________________________________________________
    Paresh Prajapati
    ➡ +919924626601
    http://paresh-sqldba.blogspot.com/[/url]
    LinkedIn | Tweet Me | FaceBook | Brijj

  • Thanks Paresh, but what values are considered bad and point to a memory bottleneck.

    Dave

  • A couple of observations. That KB article is a bit strange. A well-tuned SQL Server should not page significantly. You may see some ticks in the paging counters, but that is because the backup APIs look like paging to the OS. Try it sometime and you will see the paging counters increment while a SQL backup runs.

    Strictly speaking, you can build a cluster without MSDTC, but you won't get very far. It is almost impossible to install a SQL service pack without MSDTC, so everyone just adds it in anyway.

    Note that Windows 2008 has some new options on MSDTC and is a much better platform for clustering SQL than Windows 2003.

    Personally, I point to DoubleTake. Even though the block-level replication system used by DoubleTake is asynchronous, you still have extra I/O operations and of course memory pressure.

    So, on to solutions. A larger paging file won't help. Paging cached data is self-defeating. You may need to lower the upper memory boundary on SQL an d see if that helps. Keep an eye on Page Life Expectancy so you don't squeeze cache too much. Consider rebuilding the cluster on Windows 2008 as it handles complex, large memory configurations much better than Windows 2003. Note that you cannot migrate a Windows 2003 cluster to Windows 2008 in-place. There are too many differences.

  • I also suspect Double-Take and have been lowering the max memory, but I'm still hopeful to find some memory counters that I can look at on a regular basis to assess memory performance. If you have any counter recommendations along with their corresponding good and bad values that would be appreciated.

    Thanks

  • "but I see constant CXPACKET and PageLatch wait times"

    ...check the paralellism settings. Maybe think about changing it to '1'.

    CXPACKET is an indication that the server is waiting for queries that have been broken up to run in parallel on different CPUs/Cores

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply