Node Fails Over nightly problem

  • I need some help. I'm running a Win2000 w/SQL 2000 Active/Active cluster. I've started having problems where one node fails over automatically every night. The times are different each night, and the only thing that shows up in the log file is that SQL Server terminated due to a stop request from the service control manager. I can't find any error to point me in any direction. Does anyone have any idea what could be causing this?

    Thanks,

    Jeff

    Jeff Matthews



  • I had this very same problem.

    In my case, the cause was a very long-running stored procedure that used >95% CPU for over 20 minutes.

    The SQL Server cluster resource uses the ODBC API call SQLExecDirect() to poll the server, trying to execute "select @@servername" every few minutes. Look for references to SQLExecDirect() in the application log and for "select @@servername" in a Profiler trace.

    If the resource monitor gets three timeouts, it assumes that the server is down (even though it's not, it's just slow) and initiates a failover. This causes the live server to get the SCM shutdown message and exit.

    Unfortunately, it is not possible to "tune" this timeout parameter. I had an incident open with Microsoft support for over a month, and never once did they suggest changing this timeout parameter (which is apparently hard-coded) to solve the problem. We finally managed to optimize the tables so that the query did not consume enough resources to block out the SQLExecDirect() call, and this resolved the problem.

    Incidentally, I went through three support engineers, and it was only the third one who was able to fix it. I wasted a lot of time proving to them that it still failed if I did things like limit the server to 1 of 2 CPUs or to 512MB of 1GB RAM. (Neither of these could possibly solve the problem because it was the SERVER and not the RESOURCE MONITOR that was CPU starved, but the first two engineers didn't care about that.)
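
For anyone trying to picture the failure mode described in this post, here is a rough Python sketch of the "three timeouts, then failover" logic. It is only a simulation, not the actual resource monitor code: the function name, the 60-second timeout, and the assumption that the three timeouts must be consecutive are all made up for illustration. The only details taken from the post are that the health check is a polled query and that three timeouts trigger a failover.

```python
from typing import Iterable, Optional


def first_failover_poll(
    probe_durations: Iterable[float],
    timeout_s: float,
    max_timeouts: int = 3,
) -> Optional[int]:
    """Return the 1-based number of the poll that would trigger a failover.

    probe_durations: how long each simulated "select @@servername" probe
        took, in seconds (hypothetical measurements).
    timeout_s: illustrative timeout; the real value is reportedly hard-coded.
    max_timeouts: timeouts in a row before the server is declared down
        (three, per the post; "in a row" is an assumption).
    Returns None if no failover would be triggered.
    """
    consecutive = 0
    for poll_number, duration in enumerate(probe_durations, start=1):
        if duration > timeout_s:
            consecutive += 1
            if consecutive >= max_timeouts:
                return poll_number
        else:
            # A probe that answers in time resets the counter.
            consecutive = 0
    return None


# A server that is merely slow (CPU starved) for three polls in a row
# looks exactly like a dead server to this kind of check.
slow_server = [0.1, 0.2, 70.0, 80.0, 90.0, 0.1]
busy_but_ok = [0.1, 70.0, 0.2, 70.0, 70.0, 0.3]

print(first_failover_poll(slow_server, timeout_s=60.0))
print(first_failover_poll(busy_but_ok, timeout_s=60.0))
```

The point of the sketch is that the check cannot tell "down" from "too busy to answer", which is why reducing the load from the long-running query fixed the failovers while restricting CPUs or RAM did not.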

  • Thanks, this sounds promising. I'm going to look into this, and hopefully I will be able to resolve it.

    Jeff Matthews


