Disk I/O performance issue

  • We are having an issue with our application when we run load test against it. The test will run fine for the first 6 - 10 minutes then there will be a spike in the % Disk Write counter (on the SQL Server machine) and the SQL Server will stop processing requests. Once that starts we see timeouts and then the spike will come back to normal levels and the system will continue processing requests. When this happens we see the following event: Event ID: 833

    "SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [F:\MSSQL\Data\CreditNet.ndf] in database [CreditNet] (5). The OS file handle is 0x00000664. The offset of the latest long I/O is: 0x00000acc740000"

    We have run these tests on four different servers (all different hardware configurations) with three different configurations (SP3, SP4, SQL Server 2005), but the behavior is always the same. The othere issues is that the disk spike happens on the same intreval (although different on each server) i.e. every 8 minutes on Server X the spike happens. The query we are running is an ugly one, but it has not changed for a long time.

    Any ideas? Thanks,

    Keith



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • If your seeing the same issue across so many different hardware and software environments I'd suggest raise a support call rather than spend any more time trying to track down the problem yourselves.

  • Kaith,

    Can you specify a lite bit about what your query is doing ?

    Feel free to send me it if you want.

     

    tks

  • There are a number of queries that are run during this load test, but one of the stored procedures reads an XML document and stores the information in 25+ tables. It is an ugly query, but one that has been unchanged for the last year. This problem has only been around for the last month.

    Here is some stats from Performance Monitor during a load test. The first set is from a snapshot during normal activity:

    Memoryage faults per second          - 173                                            

    %Disk Time - 162

    Current disk queue - 1 (3 spindles)

    Buffer Manager Page Reads/ Writes - 252/5

    Batch requests - 92

    The next set is when the disk % time spikes:

    Memoryage faults per second          - 82

    %Disk Time - 5719

    Current disk queue - 126 (3 spindles)

    Buffer Manager Page Reads/ Writes - 0/330

    Batch requests - 0

    Any help is appreciated.

    Thanks,

    Keith

     



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • The spike is likely related to when sql server is checkpointing the transactionlogs. (fixed interval)

    Has the recovery interval changed from 0?

    from the books online

    Automatic Checkpoints

    SQL Server 2000 always generates automatic checkpoints. The interval between automatic checkpoints is based on the number of records in the log, not time. The time interval between automatic checkpoints can be highly variable. The time interval between automatic checkpoints is long if few modifications are made in the database. Automatic checkpoints occur frequently if a lot of data is modified.

    The interval between automatic checkpoints is calculated from the recovery interval server configuration option. This option specifies the maximum time SQL Server should use to recover a database during a system restart. SQL Server estimates how many log records it can process in the recovery interval during a recovery operation. The interval between automatic checkpoints also depends on whether or not the database is using the simple recovery model.

    • If the database is using either the full or bulk-logged recovery model, an automatic checkpoint is generated whenever the number of log records reaches the number SQL Server estimates it can process during the time specified in the recovery interval option.
    • If the database is using the simple recovery model, an automatic checkpoint is generated whenever the number of log records reaches the lesser of these two values:

      • The log becomes 70 percent full.
      • The number of log records reaches the number SQL Server estimates it can process during the time specified in the recovery interval option.

    Automatic checkpoints truncate the unused portion of the transaction log if the database is using the simple recovery model. The log is not truncated by automatic checkpoints if the database is using the full or bulk-logged recovery models. For more information, see <A href="mk:@MSITStore:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\architec.chm::/8_ar_da2_7vaf.htm">Truncating the Transaction Log.

  • use disk idle time and subtract from 100 - it's more accurate.

    monitor i/o completion time. I'd suspect that the query isn't too good. Note that if you're using xml stuff this runs in process memory so maybe your server is starved of memory too. As your tests seem to always end the same I'd strongly suggest the query !!

    If your tran log and data are on the same disk you're probably swamping the sub system.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Hi Joe,

    I check the recovery interval and it is still set to 0. I changed the value to 32,767 to see how that would affect performance and the issue was all but gone.

    So I was wondering if there is a way to configure that value to something other than 0 in order to get better performance?

    Thanks



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Since the interval is specified in minutes, have you tried 6 min (smaller that our interval, so sql server has to write less transaction data at a time)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply