buffer latch type 4 error - need help

  • We hit a hard error during our database maintenance job, which reindexes all the tables. It seems to have triggered a recurring buffer latch type 4 error that did not clear until the SQL Server service was restarted.

    We are running Windows Server 2003 with SQL Server 2000 Enterprise Edition SP3 plus the security patch, on an 8-processor, 8 GB machine with a NAS disk device.

    Any experience with these types of errors, or advice, is greatly appreciated.

     

    2004-08-08 01:19:56.98 spid99    Stack Signature for the dump is 0xDD4A7E30

    2004-08-08 01:19:57.00 spid99    SQL Server Assertion: File: <buffer.c>, line=3723

    Failed Assertion = '!(bp->bdbid == dbid && ALL_ON (BUF_HASHED | BUF_CHECKWRITE | BUF_DIRTY, bufstat) && IS_OFF (BUF_IO, bufstat) && bp->bpage->GetXdesId () == xdesId)'.

    2004-08-08 01:19:57.00 spid99    Error: 3624, Severity: 20, State: 1.

    2004-08-08 01:25:06.13 spid6     Time out occurred while waiting for buffer latch type 4,bp 0x3de1800, page 1:5730344), stat 0x10000b, object ID 9:213575799:2, EC 0x806F03C8 : 0, waittime 300. Not continuing to wait.
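For anyone trying to work out which object the latch timeout points at, here is a minimal Python sketch that pulls the fields out of a line like the one above. The function name and field names are hypothetical. It assumes the `object ID` triple is database ID : object ID : index ID, and that `waittime` is in seconds (the roughly five-minute gap between the two log entries is consistent with that), so treat both as assumptions to verify.

```python
import re

# Pattern for the SQL Server 2000 buffer latch timeout message shown in the
# error log. The stray ")" after the page number is made optional on purpose.
LATCH_TIMEOUT = re.compile(
    r"buffer latch type (?P<latch_type>\d+),\s*bp (?P<bp>0x[0-9A-Fa-f]+), "
    r"page (?P<file_id>\d+):(?P<page_id>\d+)\)?, stat (?P<stat>0x[0-9A-Fa-f]+), "
    r"object ID (?P<db_id>\d+):(?P<object_id>\d+):(?P<index_id>\d+), "
    r".*?waittime (?P<wait_seconds>\d+)"
)

# Fields converted to int; bp and stat are left as hex strings.
INT_FIELDS = ("latch_type", "file_id", "page_id", "db_id",
              "object_id", "index_id", "wait_seconds")


def parse_latch_timeout(line):
    """Return the fields of a latch timeout line as a dict, or None."""
    match = LATCH_TIMEOUT.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    sample = (
        "2004-08-08 01:25:06.13 spid6     Time out occurred while waiting "
        "for buffer latch type 4,bp 0x3de1800, page 1:5730344), stat 0x10000b, "
        "object ID 9:213575799:2, EC 0x806F03C8 : 0, waittime 300. "
        "Not continuing to wait."
    )
    print(parse_latch_timeout(sample))
```

Under those assumptions, the line in this log would refer to index 2 on object 213575799 in database 9, on page 5730344 of file 1.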

  • Since a buffer latch is a resource lock held while SQL Server moves a page between memory and disk, and since this involves a timeout, I would do this:

    a) Check the NT event log for any disk-related errors, which would point to a hardware problem.

    b) If nothing is found, call Microsoft.

    Note: I'm tempted to blame the NAS. NAS is too slow for SQL Server, and an I/O timeout doesn't surprise me. You can try monitoring the disk queues. I'm sure they max out at 100% during maintenance. In short, I think this is hardware related... You have a NASty problem.

    Hope it helps

    Eric

  • If a SQL Server dump was created, send it to Microsoft.

  • I have seen many threads on this site reporting performance issues with NAS. But if you are only having problems during re-indexing, you might want to break your re-indexing job down so that it re-indexes only some tables at a time instead of the whole database.
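The batching idea above could be sketched roughly as follows in Python. This only generates the statements and does not connect to a server. The function names, the table names in the usage note, and the batch size are all hypothetical. It assumes the maintenance job uses `DBCC DBREINDEX`, which is available in SQL Server 2000, but the original job's actual commands are not shown in this thread.

```python
def reindex_statement(table_name):
    """Build one DBCC DBREINDEX statement, doubling any single quotes."""
    escaped = table_name.replace("'", "''")
    return f"DBCC DBREINDEX ('{escaped}')"


def reindex_batches(tables, batch_size):
    """Yield lists of statements, each covering at most batch_size tables."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tables), batch_size):
        batch = tables[start:start + batch_size]
        yield [reindex_statement(name) for name in batch]


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    tables = ["dbo.Orders", "dbo.Customers", "dbo.Invoices"]
    for number, statements in enumerate(reindex_batches(tables, 2), start=1):
        print(f"-- batch {number}")
        for statement in statements:
            print(statement)
```

Each batch could then be scheduled as its own job step or on a different night, so the storage never has to absorb a full-database reindex in one pass.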

  • I had similar errors, though we were using a SAN, not a NAS. When we contacted Microsoft, they provided hotfix Q838647, which was for some SANs connected to Cisco switches. This fix, by the way, was not yet published on the Microsoft site, and you needed a password to access it. This leads me to believe that there may be several other fixes out there that Microsoft does not publish because they are very specific. BTW, we use IBM Shark storage. I also agree with the other posters about NAS not keeping up with the I/O requests. Please keep us posted on how you resolve the issue.

  • It seems we got the same response from Microsoft as you. We opened an issue with Microsoft, and their resolution was the hotfix in the KB article below; for us, this will be Q834628 because we run on Windows 2003 Server Enterprise Edition.

    I'm surprised by the article, which basically says there is a memory management problem happening somewhere between the OS and SQL Server on a multiprocessor, multi-GB memory machine. I am surprised because:

    1) We are using the best software Microsoft provides: SQL Server 2000 Enterprise Edition and Windows 2003 Enterprise Edition, all patched to the most recent service pack.

    2) I was under the impression that Microsoft's strategy with Yukon/SQL 2005 is to do exactly what this KB article describes as the problem scenario: serve a single database on a large multiprocessor, multi-GB memory machine, as opposed to many smaller distributed machines (a "server farm").

    Does anyone have any opinions on this? I'm wondering whether, if we have trouble with the software we have now, we will have similar trouble with Yukon/SQL 2005.

    Here is the KB:

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;838765

    By the way, we've had pretty good results with our NAS. In our testing, we found it to be faster, more reliable, and much easier for hot restore than local disk or RAID.
