Disk Queue length

  • I have sql server 2005 on win 2003 and keep on getting this alert for disk queue length on spotlight monitoring tool. How do i reduce this ?

    Messgge: "Disk queue length on D drive is 477.1" this is caused during reads and there are no write activities.

    This disk is SAN configured with RAID5. I always think of keeping database files on multiuple drives hoping to have better performance but my network admin says that it does not matter as it is on SAN, is this true?

  • I had the same discussion with our SAN admin. He says that the I/O is spread over 10 disks anyway, so having different drives on the same SAN wouldn't make a difference, as it basically would be a different section over the same 10 disk array.

    Not sure if your bottleneck is the actual SAN interface on your server. Supposedly the SAN can handle huge IOPS, but more than once I have proven him that I can make his SAN go to its knees 🙂

    We recently installed a replicated server and we're using mount points instead of separate drive letters, and here's where the SAN issue came up. We wanted to have separate mount points for the data/log files. He said that because of the SAN architecture it wouldn't make a difference if it was separate mount points (or drive letters), so we have 1 mount point for all (mdf/ldf). We're now seeing spikes in Queue length on very busy activity times at the Publisher DB.

    _______________________________________________________________________
    For better assistance in answering your questions, click here[/url]

  • Here are some things to check.

    1. open up perfmon, in the main window add the disk queue lenght counters for each individual drive. If one drive (the drive your data and log files are have the highest counters) then get a sampling of them. (make a performance counter and output it to a .cvs file, open that in excel and you take samples over time)

    2. The Average Queue length should be below 2 per disk. If you have 5 disks in your array then the average should be below 10.

    3. Just because the disks have multiple heads and spread the files out over them does not mean they cannot get over loaded. It just means they can get over loaded less quickly and it takes a higher transaction volume.

    Your concerns may be completely valid. Besides raid 5 is best for transaction logs and you want Raid 10 for your data files. It would be worth gathering data to back up your argument, but this may be exactally what your problem is.

    you can also add counters for counter SQL Server: Databases Transactions/ per database, this may help you see what your database files are that are causing your bottle neck.

    could pick and choose what you move if you need to.

    Twitter: @SQLBalls
    Blog: http://www.SQLBalls.com

  • Tara-1044200 (6/1/2010)


    This disk is SAN configured with RAID5. I always think of keeping database files on multiuple drives hoping to have better performance but my network admin says that it does not matter as it is on SAN, is this true?

    Ah, allow me to give you my favorite answer: It Depends. @=)

    There are a few considerations, depending on how your SAN is set up.

    With RAID 5, he's probably right. It doesn't matter because your files are being stripped across multiple disks within the SAN and you're most likely getting the max performance from I/O that you can get, barring the existence of other I/O bottlenecks. (Read up on RAID 5 in Books Online and Google it too. That'll help you understand it).

    However, if you're talking simple physical disks, no RAID setup, and at least on disk controller per disk, you would be correct. Having multiple files on multiple drives would help enhance performance to some extent.

    And if you're talking simple physical disks, no RAID setup, and only one disk controller for the entire SAN (which I was informed is possible) then chances are, separating the database files won't help much, if at all.

    Does this make sense?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • While it's possibly right that high disk queue length is not a problem on the SAN, especially if your users aren't reporting any issues, I'd still say it's worth checking what's happening in the databases at the point the queue goes high--you might find a poorly-optimised query is causing a big table scan, or something of the like.

  • On W2K3, especially if everything's on one LUN (which seems to be the case .... D:?), you might benefit from using the STORPORT driver (http://www.microsoft.com/windowsserversystem/wss2003/techinfo/plandeploy/storportwp.mspx).

    If you can't find any evidence from sources other than Spotlight to corroborate, and you're happy with what your SAN guy says, then you could just increase the threshold at which the alert triggers; Spotlight assumes a single disk (max. queue length <= 2).

  • ...and something else to check with the SAN admin. Other applications might be sharing your disks (luns) as the drives are just virtual anyway. Another application may be causing the disk to show the higher queue length and that's being reported.

    Steve

  • Steve-3_5_7_9 (6/3/2010)


    ...and something else to check with the SAN admin. Other applications might be sharing your disks (luns) as the drives are just virtual anyway. Another application may be causing the disk to show the higher queue length and that's being reported.

    Steve

    Agreed; or you may be sharing with yourself! Data on F:, log on G:, F: is LUN 1, G: is LUN 2... but once you get down to the physical disk level, they're actually sharing the same physical drives.

    I'd suggest scheduling a meeting with the SAN admin. You bring all your information on your databases, their filegroups, and their files. The SAN admin brings all the information on the switches, SANs, trays, spindles/SSDs, their grouping, and their allocation into LUNs.

    The two of you together can then gather a very accurate of which sections of which physical drives each of your databases is using for what.

    That is the starting point for any meaningful discussion on changes you might want to make.

  • As an employee of a major data managment/san storage provider I get asked questions like this quite often, and I really liked the one answer: "It Depends." This is by far the most accurate answer I can start out with in my own discussions.

    As I read through the posts I think most everyone had the correct answers - for any given moment for any specific situation.

    Just to clarify a misstatement.... RAID 1 or 0+1, or 10 is better suited for transaction logs than RAID 5. The write penalty of RAID 5 is (4) IO's per write while the write penalty for RAID 1/0+1/10 is only (2) IO's per write. Since the transaction logs are primarily write oriented, RAID 5 would cost you twice as many IO's as RAID 1/0+1/10

    The type of disk drive in the SAN array would determine the number of IOPs that could be expected out of that disk. 15K RPM Fibre Channel is usually rated around 180 IOPs per disk. 10K RPM Fibre Channel is rated around 150 IOPs, while SATA 7200 RPM is rated around 80 IOPs per disk. Solid State/Enterprise Flash Drives are similar in Write Performance to 15K Fibre Channel disks, but in READ scenarios are capable of 2500 or more READ IOPs.

    In the case of the SAN Array with 10 drives (assuming they are running the highest performance spinning disk @ 15K RPM) there would be available in that array ~1800 IOPs. If the activity is all read then there is no additional RAID write penalty associated with this. This means that a query that performs a huge table scan would have about 1800 IOPs of capacity to satisfy that request. (for table scans - IO Bandwidth becomes an important factor as well, and along these lines IO READ SIZE is critical to determining MAX BANDWIDTH. This is a slightly different discussion but may be relative on TABLE SCAN intensive queries such as Microsoft's Fast Track Data Warehouse reference architecture.)

    So back to the original question where Spotlight triggers an alert at 477 IOs Queued up. - IF - the queue depth is satisfied in a timely manner - such as in a second or whatever seems reasonable to your specific environment - then this is usually considered to be fine. If the queue depth is held in an oversubscribed level for an extended period of time then this would indicate the drives are being asked to work harder and longer then they are physically capable of working, so requests for work are backing up in a queue waiting to be serviced by the disks. There are only two ways that I know of to fix long lines at the grocery store when we checkout - one would be to open up additional checkout lanes (for the SAN admin and the DBA this would mean: get more disks involved in the workload so the requests are satisfied in a timely manner), or two - make it more efficient for the checkers to process each customer more quickly - spending less time on each customer - thereby increasing their efficiency and reducing the time other customers must stand in line waiting to be checked out (translated as: the DBA would need to optimize the queries to use less disk resources - which as we all know is often times needed, but not always a viable option due to application constraints.)

    Relational Database systems are probably the most complex and most difficult of all of the various systems that IT practitioners have to deal with because there is really no such thing as a "typical database". For this reason we as DBA's must think in terms of guidelines and principles, and best practices to direct our thinking towards the right answers for our individual situations at a specific point in time - and the only thing that we can be sure of is that things will change and we'll have to readjust our thinking again.

    I hope this helps to point you in the correct direction with regard to your database/SAN questions. I'm happy to answer additional questions. My name is Randy Loeschner and I work for EMC. I can be reached at my blog: http://www.randyloeschner.com. Cheers!

  • Great stuff Randy. Thanks for the info you shared.

    _______________________________________________________________________
    For better assistance in answering your questions, click here[/url]

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply