Obtaining accurate SQL Server performance metrics in a virtual environment

  • We are currently going full throttle - despite skepticism in some quarters - towards a fully virtualized environment for all our SQL Server instances in production, using the VMware ESX technology.

    Apart from concerns about the impact of virtualization on performance, another worry I have is that having SQL Server on a virtual platform will render performance metrics obtained from tools within SQL Server meaningless.

    I'm thinking specifically of 2 examples of this: DBCC MEMORYSTATUS and DMV queries such as this one:

    select

    sum(multi_pages_kb

    + virtual_memory_committed_kb

    + shared_memory_committed_kb) as

    [Overall used w/o BPool, Kb]

    from

    sys.dm_os_memory_clerks

    where

    type <> 'MEMORYCLERK_SQLBUFFERPOOL'

    Is the output of "DBCC MEMORYSTATUS" and that of DMV queries on resource utilization as accurate on a virtual as it is on a physical environment?

    VMware recommends that their own tools be used for performance monitoring instead of SQL native tools, and I find that disturbing. At least this is my take from reading their white papers.

    Any thoughts anyone?

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Interestingly enough, I am working with MS to assess that right now. I have been assured that the DMV's should be accurate in the Virtual environment as they use the time associated with their threads and not the system time (that is basically a quote).

    The experience that I have had has been less than favorable so far having problems getting the IO throughput that we have had on physical and I am still battling with a checkpoint issue that is somewhat IO related due to the volume that we have but was working fairly well on physical.

    The other thing that you have to keep in mind is that VM only promises 80% of physical performance and they can only give you 4 cores to a VM. So, if you have any servers that are beefier than that in the physical world you are going to have to scale down. You can buy a enhanced version of ESX for ~ $2,000 more per box to allow more than 4 cores to a VM but I have to start wondering at what point it becomes more expensive to use VM than just clustering.

    We are still pressing forward with it for our smaller support servers but have ruled it out as an option for our larger production boxes at this point. That may change as we become more accustomed to it though.

    HTH

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thank you for the useful input.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • David Benoit (5/14/2009)


    Interestingly enough, I am working with MS to assess that right now. I have been assured that the DMV's should be accurate in the Virtual environment as they use the time associated with their threads and not the system time (that is basically a quote).

    The experience that I have had has been less than favorable so far having problems getting the IO throughput that we have had on physical and I am still battling with a checkpoint issue that is somewhat IO related due to the volume that we have but was working fairly well on physical.

    The other thing that you have to keep in mind is that VM only promises 80% of physical performance and they can only give you 4 cores to a VM. So, if you have any servers that are beefier than that in the physical world you are going to have to scale down. You can buy a enhanced version of ESX for ~ $2,000 more per box to allow more than 4 cores to a VM but I have to start wondering at what point it becomes more expensive to use VM than just clustering.

    We are still pressing forward with it for our smaller support servers but have ruled it out as an option for our larger production boxes at this point. That may change as we become more accustomed to it though.

    HTH

    If possible, pls share in this thread what MS has to say about this. I would be very interested to hear their take on it.

    Thank you!

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • This is very interesting. What I've heard is that you can't completely trust the numbers in some things, like CPU and IO because the hypervisor might alter things to balance the load between VMs (or even think it needs to do this if there's only one VM). I haven't heard about memory, and I'd like to.

    I think some of what you'll need to do more is get relative stats for that VM (assuming you haven't changed the host config). I'm not sure you can completely compare stats between VMs, but I'd be curious to know what you see under real world loads.

  • Here is what caught my attention:

    Traditional SQL Server performance monitoring uses the Microsoft Windows performance monitor tool PerfMon to collect statistics. SQL Server integrates with PerfMon to provide familiar counters that indicate system performance. However, as with all measurement tools run in the guest operating system, time-based performance measurements are subject to error. The degree to which the measurements are inaccurate depends on the total load of the ESX host. Always use VirtualCenter, esxtop, or resxtop to measure resource utilization. Depending on the load placed on the ESX host, CPU and memory usage reported within the guest operating system can be different from what ESX reports.

    Taken from this best-practices white paper: "Microsoft SQL Server and VMware Virtual Infrastructure"

    (http://www.vmware.com/resources/techresources/10002)

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Virtualisation is a great tool - however you still need to test your particular environment. Also, not all solutions meet the requirements of every problem - eg, depending on your requirements virtualisation may not be an appropriate deployment.

    This is where a Volume Test environment will show you if your physical design (virtual or physical servers) meets your performance requirements.

    In saying that however, IO throughput should not be an issue in terms of SQL performance. I've deployed many high end SQL and Exchange systems running on VMWare that perform well within requirements and are comparable to physical installations.

    VMWare will provide the IOPS you need. IO problems are more likely to be with poor storage design. Remember, Windows provides an IO thread per volume and VMWare provides an IO thread per Data Store. The way you split up your storage in a virtual world should be similar to how you would do it in a physical world. In a physical world, if you separate disks via LUNs, do the same in a virtual - only this way, they are Data Stores. The way I design storage is to have one volume per partition per windows disk per vmdk. You can share low IO vmdks on the same Data Store (LUN), however for those that require end to end dedicated IO, you give those vmdks a Data Store (LUN) each.

    Also, ensure that your HBA is configured appropriately in terms of Queue Target (LUN or Array) and Queue Depth (32, 64, 128). From memory, Vi3 sets LUN and 64. Anecdotal evidence shows that a Queue Depth of 32 or 128 outperforms 64. Your milage will vary.

    Of course, you don't virtualise on the same hardware as your bare metal physical installation - that provides very little benefit - other than abstracting your hardware platform to enable easy migration to another physical server.

    In terms of performance metrics, time based counters are not accurate in a virtualised environment. This is due to time sharing betwen virtual machines. Some % based counters are also inaccurate in absolute terms. Of course, all counters are relative to the resources you grant your environment.

    As some background, we specify certain rules when we do our physical design. Our environment is based on a cost-recovery (CAPEX only, not TCO) model. Storage is based on GB per Tier and VMs are per GB of RAM. In our model, it is not cost effective to provide a virtualised server instance if it requires either more than 4 vCPUs or 8GB of RAM. You need to apply this as it suits to your environment.

    Also remember that VMWare will perform shared page copy-on-write consolidation (single instance) for memory.

    So, in summary - volume test to find the appropriate physical design. Sometimes its physical, sometimes its virtual. See how it plays out in your cost recovery model, if you have one. IO should not be a problem for performance.

    --
    Andrew Hatfield

  • Andrew Hatfield (5/17/2009)


    Virtualisation is a great tool - however you still need to test your particular environment. Also, not all solutions meet the requirements of every problem - eg, depending on your requirements virtualisation may not be an appropriate deployment.

    This is where a Volume Test environment will show you if your physical design (virtual or physical servers) meets your performance requirements.

    In saying that however, IO throughput should not be an issue in terms of SQL performance. I've deployed many high end SQL and Exchange systems running on VMWare that perform well within requirements and are comparable to physical installations.

    VMWare will provide the IOPS you need. IO problems are more likely to be with poor storage design. Remember, Windows provides an IO thread per volume and VMWare provides an IO thread per Data Store. The way you split up your storage in a virtual world should be similar to how you would do it in a physical world. In a physical world, if you separate disks via LUNs, do the same in a virtual - only this way, they are Data Stores. The way I design storage is to have one volume per partition per windows disk per vmdk. You can share low IO vmdks on the same Data Store (LUN), however for those that require end to end dedicated IO, you give those vmdks a Data Store (LUN) each.

    Also, ensure that your HBA is configured appropriately in terms of Queue Target (LUN or Array) and Queue Depth (32, 64, 128). From memory, Vi3 sets LUN and 64. Anecdotal evidence shows that a Queue Depth of 32 or 128 outperforms 64. Your milage will vary.

    Of course, you don't virtualise on the same hardware as your bare metal physical installation - that provides very little benefit - other than abstracting your hardware platform to enable easy migration to another physical server.

    In terms of performance metrics, time based counters are not accurate in a virtualised environment. This is due to time sharing betwen virtual machines. Some % based counters are also inaccurate in absolute terms. Of course, all counters are relative to the resources you grant your environment.

    As some background, we specify certain rules when we do our physical design. Our environment is based on a cost-recovery (CAPEX only, not TCO) model. Storage is based on GB per Tier and VMs are per GB of RAM. In our model, it is not cost effective to provide a virtualised server instance if it requires either more than 4 vCPUs or 8GB of RAM. You need to apply this as it suits to your environment.

    Also remember that VMWare will perform shared page copy-on-write consolidation (single instance) for memory.

    So, in summary - volume test to find the appropriate physical design. Sometimes its physical, sometimes its virtual. See how it plays out in your cost recovery model, if you have one. IO should not be a problem for performance.

    Thank you for all this info.

    Actually, on the very day that I opened this thread I had a meeting with representatives of Microsoft and VMWare. I posed this question to them and their response was that, provided there was no overcommitment of resources among different SQL instances in a virtual environment, SQL queries of system views should give accurate results on memory/CPU resources. By overcommitment I mean that there should not be any overlap in resource utilization among separate SQL instances.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply