Failover Cluster Fails to Failover

Ed Mlynar

Mr or Mrs. 500

Points: 593
February 8, 2010 at 12:34 pm

#138821

We have SQL on an active/passive cluster with the following details:

SQL Server 2005 Enterprise 64-bit

Windows 2003 R2 64-bit

MSA1500 CS disk array

The cluster works fine until it is supposed to fail over. The first indication in the system log that there is a problem is the “Cluster resource ‘SQL Server’ in Resource Group ‘SQL Server Group’ failed” error. There is nothing in the application log or system log prior to this error that indicates there is a problem.

When it occurs, the server does not automatically fail over. Instead, the active node goes “missing”. We can ping it, but cannot access it via RDP, remote shutdown, file explorer – anything. SQL is not broadcasting on the server. The application log shows “ODBC sqldriverconnect failed” and “Unable to complete login process due to delay in prelogin response” errors every 30 seconds.

The only resolution is to physically shut the node off, which then causes the cluster to fail over.

I have compared the cluster settings to other clusters we have. The only difference was that the defined cluster groups did not have a preferred owner, which shouldn’t matter since we do not fail back.

Has anyone seen something like this before?

SQLBOT SSCrazy Eights Points: 8014 · Answer 1

February 8, 2010 at 1:03 pm

#1115763

What does the SQL error log say?

Craig Outcalt

Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632
My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258

Ed Mlynar Mr or Mrs. 500 Points: 593 · Answer 2

February 8, 2010 at 1:09 pm

#1115768

Nothing. The last entries in the SQL error log were made 50 minutes prior to the failure and were mundane backup-type entries. Nothing was written to the SQL error log again until the failover was forced.

SQLBOT SSCrazy Eights Points: 8014 · Answer 3

February 8, 2010 at 1:13 pm

#1115769

So SQL is still running, I assume... but probably shutting down?

Ed Mlynar Mr or Mrs. 500 Points: 593 · Answer 4

February 8, 2010 at 1:19 pm

#1115775

I do not know if SQL was still running or not. Nothing was able to connect to it and I can tell that the Agent was running because I put a monitoring job on it that sent a heartbeat to another server. This is the only way I knew it was down.
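
For anyone setting up a similar heartbeat, the receiving side has to decide when a missing beat counts as "down". A minimal sketch in Python, assuming the monitoring server records the time of the last heartbeat it received; the function name `is_stale` and the five-minute threshold are made up for illustration and are not taken from the job described above:

```python
from datetime import datetime, timedelta


def is_stale(last_heartbeat, now, max_age=timedelta(minutes=5)):
    """Return True when the newest heartbeat is older than max_age.

    max_age is a hypothetical threshold; pick one a little longer than
    the interval at which the Agent job actually sends its heartbeat.
    """
    return now - last_heartbeat > max_age


# Hypothetical timestamps, for illustration only.
last_seen = datetime(2010, 2, 8, 12, 0, 0)
print(is_stale(last_seen, datetime(2010, 2, 8, 12, 3, 0)))   # still within the threshold
print(is_stale(last_seen, datetime(2010, 2, 8, 12, 10, 0)))  # overdue
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the check easy to test against recorded timestamps.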

SQLBOT SSCrazy Eights Points: 8014 · Answer 5

Agent can't run without SQL Server running so that answers the question.

And if SQL Server is running you can't fail over.

It needs to shut down on one node; then the resources float to the other node, and the cluster brings SQL up there.

Look through the error logs to see the shutdown command and all the info related to why it wouldn't shut down. You may have been able to log on and kill/rollback some active transactions to speed up the process.

I've had SQL take 40+ minutes to shut down and fail over in one really unusual case where I just let the thing finish. That's because SHUTDOWN WITH NOWAIT isn't specified by the cluster software that I was using.

http://msdn.microsoft.com/en-us/library/ms188767%28SQL.90%29.aspx

Anyway... all that might not be your case, but use the SQL error log reader to check up on what was happening on the server.
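
If the log is easier to get at as a file copied off the node, a small script can pull out the lines worth reading first. This is only a sketch in Python: the function name `find_lines` and the keyword list are assumptions chosen for illustration, and the sample lines are invented rather than real SQL Server output.

```python
import io


def find_lines(log_file, keywords=("shutdown", "terminating", "rollback", "recovery")):
    """Yield (line_number, text) for lines containing any keyword.

    Matching is case-insensitive. The default keywords are a guess at
    what is worth searching for; adjust them to what your log contains.
    """
    lowered = [k.lower() for k in keywords]
    for number, line in enumerate(log_file, start=1):
        text = line.rstrip("\r\n")
        if any(k in text.lower() for k in lowered):
            yield number, text


# Made-up sample lines, for illustration only; not real SQL Server output.
sample = io.StringIO(
    "entry one: backup finished\n"
    "entry two: SHUTDOWN requested\n"
    "entry three: rollback in progress\n"
)
hits = list(find_lines(sample))
for number, text in hits:
    print(number, text)
```

To run it against a real file, pass an opened file object instead of the sample. SQL Server error logs are often stored as UTF-16, so `open(path, encoding="utf-16")` may be needed; check the file before assuming either encoding.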

Good luck.
