Critical Error - SQL 2005, Server TCP Provider has stopped listening on port [1433]

Question

Post reply

Critical Error - SQL 2005, Server TCP Provider has stopped listening on port [1433]

scott williams-465021

SSC Veteran

Points: 213
More actions
May 6, 2010 at 5:38 am

#224210

We're running SQL 2005. Looking at the performance information we appear to have an IO issue. We've been seeing quite a few messages in the event log like this;

SQL Server has encountered XXX occurrence(s) of I/O requests taking longer that 15 seconds to complete on file ...

Recently we created a database that is used to log messages from another system (600,000 messages a day) and I suspect that the write activity of this database is causing an issue.

Our SAN admin tells me that although the drives are busy they are within acceptable ranges. I do see high disk queue lengths though. Also we are running on VM.

Running this query to see IO stall; WITH DBIO AS

(

SELECT

DB_NAME(IVFS.database_id) AS db,

CASE WHEN MF.type = 1 THEN 'log' ELSE 'data' END AS file_type,

SUM(IVFS.num_of_bytes_read + IVFS.num_of_bytes_written) AS io,

SUM(IVFS.io_stall) AS io_stall

FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS IVFS

JOIN sys.master_files AS MF

ON IVFS.database_id = MF.database_id

AND IVFS.file_id = MF.file_id

GROUP BY DB_NAME(IVFS.database_id), MF.type

)

SELECT db, file_type,

CAST(1. * io / (1024 * 1024) AS DECIMAL(12, 2)) AS io_mb,

CAST(io_stall / 1000. AS DECIMAL(12, 2)) AS io_stall_s,

CAST(100. * io_stall / SUM(io_stall) OVER()

AS DECIMAL(10, 2)) AS io_stall_pct,

ROW_NUMBER() OVER(ORDER BY io_stall DESC) AS rn

FROM DBIO

ORDER BY io_stall DESC;

I get this

db file_type io_mb io_stall_s io_stall_pct rn

Message_Journal data 163472.36 1184616.75 82.24 1

It would appear that I have an IO issue (at least at the OS level).

Anyone have any thoughts? It would be appreciated.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

sql_lock SSCrazy Eights Points: 9373 More actions · Answer 1

You could try the following as I had a similar issue last year.

- Are your data/log/tempdb on seperate disk drives/arrays?

- Have you tried rebuilding the indexes that are been used in the the queries? Also maybe worth looking at disk frag too.

- Do you have an anti-virus installed that does not exclude the required sql files?

- Make sure all firmware updates are done, especially if your using a SAN.

Check out the link below.. Some good checks are described.

scott williams-465021 SSC Veteran Points: 213 More actions · Answer 2

I read the link you sent along previously.

Yes tempdb, databases and log files are on seperate drives (at the OS level). The drives are SAN raid 5 drives.

I rebuild indexes regularly so thats not the issue. It just appears that we hit a wall with respect to IO. The SAN admin tells me that the physical drives don't appear to be the issue, but at the OS level is appears to be an IO issue. It may have something to do with the WM configuration but I'm not sure.

Any other thoughts?

sql_lock SSCrazy Eights Points: 9373 More actions · Answer 3

sql_lock

SSCrazy Eights

Points: 9373

May 6, 2010 at 9:13 am

#1163332

Are these VM machines?

Leo.Miller SSChampion Points: 13035 More actions · Answer 4

That's about 7 messages a second (assuming 24 hour day), which isn't so bad unless there are big peaks.

Sorry to say it, but this is a typical response from a SAN administrator. The fact that you have high queue lengths says there is a problem.

Being on RAID 5 is a bad idea, I'd move the DB onto RAID 1+0. Have the log and TempDB at least on striped disk and NOT RAID 5. This should give better performance all round. Also get the DISK admin to sector align the disks. Nonaligned disks result in 2 sector reads to return 1 page, effectively doubling your disk I/Os.

Another question is how is the server connected to the SAN and how is this configured? Is this a single HBA? Since it's a VM you could well have an issue there. I'd seriously look at moving the DB off the VM and see if things improve, esepecially if the dedicated server has dual HBAs, load balanced.

Leo

Yes!

Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.

PAH-440118 Ten Centuries Points: 1232 More actions · Answer 5

I too have been getting these issue (along with browser service failing to restart with resource limitations and error 333 in windows system event logs) and have found an article which indicates it’s a VM memory allocation issue ( which is what I thought but I need to prove it to our VMWare Admins) .

http://blogs.msdn.com/b/repltalk/archive/2010/12/28/troubleshooting-sql-server-error-the-operating-system-returned-error-1453.aspx

The article states that VM memory allocation should be set to unlimited. This makes sense to me as I have configured the system to use a large portion of what the system is meant to have and I have locked pages in memory. Still got to get it tested but would be interested if anyone else can test and finds this is the issue (more evidence to put before the VMWare Admins:-))