Critical Error - SQL 2005, Server TCP Provider has stopped listening on port [1433]

  • We're running SQL 2005. Looking at the performance information we appear to have an IO issue. We've been seeing quite a few messages in the event log like this;

    SQL Server has encountered XXX occurrence(s) of I/O requests taking longer that 15 seconds to complete on file ...

    Recently we created a database that is used to log messages from another system (600,000 messages a day) and I suspect that the write activity of this database is causing an issue.

    Our SAN admin tells me that although the drives are busy they are within acceptable ranges. I do see high disk queue lengths though. Also we are running on VM.

    Running this query to see IO stall; WITH DBIO AS

    (

    SELECT

    DB_NAME(IVFS.database_id) AS db,

    CASE WHEN MF.type = 1 THEN 'log' ELSE 'data' END AS file_type,

    SUM(IVFS.num_of_bytes_read + IVFS.num_of_bytes_written) AS io,

    SUM(IVFS.io_stall) AS io_stall

    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS IVFS

    JOIN sys.master_files AS MF

    ON IVFS.database_id = MF.database_id

    AND IVFS.file_id = MF.file_id

    GROUP BY DB_NAME(IVFS.database_id), MF.type

    )

    SELECT db, file_type,

    CAST(1. * io / (1024 * 1024) AS DECIMAL(12, 2)) AS io_mb,

    CAST(io_stall / 1000. AS DECIMAL(12, 2)) AS io_stall_s,

    CAST(100. * io_stall / SUM(io_stall) OVER()

    AS DECIMAL(10, 2)) AS io_stall_pct,

    ROW_NUMBER() OVER(ORDER BY io_stall DESC) AS rn

    FROM DBIO

    ORDER BY io_stall DESC;

    I get this

    db file_type io_mb io_stall_s io_stall_pct rn

    Message_Journal data 163472.36 1184616.75 82.24 1

    It would appear that I have an IO issue (at least at the OS level).

    Anyone have any thoughts? It would be appreciated.

  • You could try the following as I had a similar issue last year.

    - Are your data/log/tempdb on seperate disk drives/arrays?

    - Have you tried rebuilding the indexes that are been used in the the queries? Also maybe worth looking at disk frag too.

    - Do you have an anti-virus installed that does not exclude the required sql files?

    - Make sure all firmware updates are done, especially if your using a SAN.

    Check out the link below.. Some good checks are described.

  • I read the link you sent along previously.

    Yes tempdb, databases and log files are on seperate drives (at the OS level). The drives are SAN raid 5 drives.

    I rebuild indexes regularly so thats not the issue. It just appears that we hit a wall with respect to IO. The SAN admin tells me that the physical drives don't appear to be the issue, but at the OS level is appears to be an IO issue. It may have something to do with the WM configuration but I'm not sure.

    Any other thoughts?

  • Are these VM machines?

  • That's about 7 messages a second (assuming 24 hour day), which isn't so bad unless there are big peaks.

    Sorry to say it, but this is a typical response from a SAN administrator. The fact that you have high queue lengths says there is a problem.

    Being on RAID 5 is a bad idea, I'd move the DB onto RAID 1+0. Have the log and TempDB at least on striped disk and NOT RAID 5. This should give better performance all round. Also get the DISK admin to sector align the disks. Nonaligned disks result in 2 sector reads to return 1 page, effectively doubling your disk I/Os.

    Another question is how is the server connected to the SAN and how is this configured? Is this a single HBA? Since it's a VM you could well have an issue there. I'd seriously look at moving the DB off the VM and see if things improve, esepecially if the dedicated server has dual HBAs, load balanced.

    Leo

    Yes!

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • I too have been getting these issue (along with browser service failing to restart with resource limitations and error 333 in windows system event logs) and have found an article which indicates it’s a VM memory allocation issue ( which is what I thought but I need to prove it to our VMWare Admins) .

    http://blogs.msdn.com/b/repltalk/archive/2010/12/28/troubleshooting-sql-server-error-the-operating-system-returned-error-1453.aspx

    The article states that VM memory allocation should be set to unlimited. This makes sense to me as I have configured the system to use a large portion of what the system is meant to have and I have locked pages in memory. Still got to get it tested but would be interested if anyone else can test and finds this is the issue (more evidence to put before the VMWare Admins:-))

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply