SQL Server has encountered x occurrence(s) of I/O requests taking longer than 15 seconds

  • I need some pointers on the following error, which appears in the error log at random (roughly a dozen times a day). Apologies in advance for the length of the post; I just wanted to provide as much information as possible.

    SQL Server has encountered x occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [t:\mssql\data\temp0.ndf] in database [tempdb].

    This error is repeated for each of the 8 tempdb files on the drive. The drive is a RAID 1+0 array (6 disks) dedicated to tempdb. The value of x varies between files and over time (sometimes it's 1 and sometimes it's over 1000).
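
    To see how the counts are spread across the 8 files over a day, a small script can total the x values from the error log text. Below is a minimal Python sketch, assuming the log has been exported as plain text lines and that the message wording matches the one quoted above (other SQL Server versions may phrase it differently). The sample lines and the temp1.ndf file name are hypothetical.

```python
import re
from collections import defaultdict

# Matches the message quoted above; group 1 is the occurrence count (x),
# group 2 the file path, group 3 the database name.
IO_WARNING = re.compile(
    r"SQL Server has encountered (\d+) occurrence\(s\) of I/O requests taking "
    r"longer than 15 seconds to complete on file \[(.+?)\] in database \[(.+?)\]"
)


def occurrences_per_file(log_lines):
    """Sum the reported occurrence counts per (database, file) pair."""
    totals = defaultdict(int)
    for line in log_lines:
        match = IO_WARNING.search(line)  # search() tolerates a timestamp/spid prefix
        if match:
            count, path, database = match.groups()
            totals[(database, path)] += int(count)
    return dict(totals)


# Hypothetical exported log lines (timestamps omitted for brevity).
sample_log = [
    r"SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [t:\mssql\data\temp0.ndf] in database [tempdb].",
    r"SQL Server has encountered 3 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [t:\mssql\data\temp0.ndf] in database [tempdb].",
    r"SQL Server has encountered 1200 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [t:\mssql\data\temp1.ndf] in database [tempdb].",
    "Some unrelated log line",
]

for (database, path), total in sorted(occurrences_per_file(sample_log).items()):
    print(f"{database} {path}: {total}")
```

    Summing per file rather than per message makes it easier to tell whether one file dominates or whether the stalls hit all 8 files evenly.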

    Now, the error would clearly suggest a problem with the I/O subsystem. However, the disk performance counters at the times these errors occurred indicate that disk performance isn't an issue (unless I'm missing something ;-)). At the times of the errors, disk reads/sec and disk writes/sec are often 0, and likewise the disk queue length is 0. Sometimes there is a bit of activity on the disk, but no more than 200 reads/sec.
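
    One way to line the error timestamps up against the counter data is to look up, for each error, the counter sample nearest to it in time. A minimal Python sketch follows, assuming the Perfmon samples have been exported as (timestamp, values) pairs sorted by time; the counter keys, sample interval and values are hypothetical. Keep in mind that rate counters such as reads/sec are averaged over the sample interval, so the interval length matters when reading the result.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_sample(samples, when):
    """Return the (timestamp, counters) sample closest in time to `when`.

    `samples` must be a non-empty list sorted by timestamp.
    """
    times = [ts for ts, _ in samples]
    i = bisect_left(times, when)
    # The closest sample is either the one just before `when` or the one at/after it.
    candidates = samples[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s[0] - when))


# Hypothetical counter samples taken every 15 seconds.
start = datetime(2024, 1, 1, 10, 0, 0)
samples = [
    (start + timedelta(seconds=15 * n),
     {"reads_per_sec": r, "writes_per_sec": w, "queue_length": q})
    for n, (r, w, q) in enumerate([(180, 40, 1), (0, 0, 0), (0, 0, 0), (120, 10, 0)])
]

# Hypothetical error timestamp taken from the SQL Server error log.
error_time = datetime(2024, 1, 1, 10, 0, 20)
ts, counters = nearest_sample(samples, error_time)
print(ts, counters)
```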

    I should point out that there are times when the disk is under more load and handles over 800 I/Os per second, which is what you'd expect from a 6-disk RAID 10 configuration. Also, the errors never occur at times of heavier load.
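
    As a rough sanity check on that figure, a common back-of-the-envelope model for RAID 10 is that reads can be served by any spindle while each logical write costs two physical writes (one per mirror side). The Python sketch below applies that model; the 150 IOPS per disk is an assumed figure for a single spindle, not something measured here, and SAN caching can push real numbers well above such an estimate.

```python
def raid10_iops(disks, per_disk_iops, read_fraction):
    """Estimate front-end IOPS for a RAID 10 set under a simple write-penalty model.

    Each front-end read costs 1 back-end I/O; each front-end write costs 2
    (the RAID 10 write penalty), so the back-end capacity is divided by the
    weighted cost per front-end I/O.
    """
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks (at least 4)")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    backend_iops = disks * per_disk_iops
    cost_per_io = read_fraction * 1 + (1 - read_fraction) * 2
    return backend_iops / cost_per_io


ASSUMED_PER_DISK_IOPS = 150  # assumption: typical figure for one spindle, not measured

for mix in (1.0, 0.7, 0.5, 0.0):
    estimate = raid10_iops(6, ASSUMED_PER_DISK_IOPS, mix)
    print(f"{mix:.0%} reads: ~{estimate:.0f} IOPS")
```

    Under these assumptions the estimate ranges from about 450 IOPS (all writes) to about 900 IOPS (all reads), so 800+ IOPS would be consistent with a read-heavy mix.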

    I've checked with the person responsible for the SAN and reviewed its performance history, and nothing suggests that the SAN is overloaded (in fact, things looked pretty quiet as far as they're concerned).

    Previously, tempdb was a single file on a RAID 1 array. It was moved to the RAID 10 array and split into 8 files to match the 8 cores on the box. The reason for the move was that the RAID 1 array couldn't cope; curiously, though, none of these errors occurred even though the RAID 1 array was clearly under too much stress.

    Is there any known relationship between the number of tempdb files and this error? I know that CPU clock drift can cause this error to appear erroneously, but I don't think we have a CPU drift issue (though I can't be certain).

    Assuming disk performance issues are ruled out, is there anything else I should be looking at as a potential cause? Of course, if there are disk issues that I have overlooked, feel free to let me know.

    I should mention that there are no noticeable performance issues (i.e. no reports of slow-running queries or processes) when these errors appear, but it's just annoying me that they're in the error log.

    Thanks.

  • You haven't happened to notice the size of the tempdb files before and after the errors occur, have you? I am curious whether file growth during heavy I/O is causing a timeout while the new file size is being allocated.

    Do you know the operations that are going on when the errors are getting logged?

    Joie Andrew
    "Since 1982"

  • Each file is 30GB in size, which in total is probably twice the size ever allocated to tempdb, so the files don't grow beyond their current size.

    Generally speaking, I can tell which processes were running at a given point in time because this is a data warehouse, so most processes are scheduled ETL jobs or scheduled reports. And that's the frustrating thing, I guess: if the errors occurred at the same time each day when a certain job was running, then at least I could draw some conclusions from that.
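
    Since most of the workload runs on a schedule, one way to test for a pattern is to bucket the error timestamps by the time windows of the scheduled jobs. A minimal Python sketch follows, assuming each job's window is known as a start and end time within the same day; the job names, windows and error times are hypothetical, and windows that cross midnight are not handled.

```python
from datetime import datetime, time


def errors_by_job_window(error_times, windows):
    """Count errors whose time of day falls inside each job's [start, end) window.

    Returns (counts per job, number of errors outside every window).
    An error inside overlapping windows is counted for each of them.
    """
    counts = {name: 0 for name in windows}
    unmatched = 0
    for ts in error_times:
        matched = False
        for name, (start, end) in windows.items():
            if start <= ts.time() < end:
                counts[name] += 1
                matched = True
        if not matched:
            unmatched += 1
    return counts, unmatched


# Hypothetical schedule and error timestamps.
job_windows = {
    "nightly_etl_load": (time(1, 0), time(4, 0)),
    "morning_reports": (time(7, 0), time(8, 0)),
}
errors = [
    datetime(2024, 1, 1, 2, 10),
    datetime(2024, 1, 1, 2, 45),
    datetime(2024, 1, 1, 7, 30),
    datetime(2024, 1, 1, 13, 5),
]

counts, unmatched = errors_by_job_window(errors, job_windows)
print(counts, "outside any window:", unmatched)
```

    A large unmatched count, as in the question, would suggest the stalls are not tied to any one scheduled job.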
