*&^%$#@!

  • CheckQueryProcessorAlive: sqlexecdirect failed

    That error has been around since the mid-90s, and no one seems to have an answer as to why. The cluster kept stopping the DB server and restarting it, every 60 minutes like clockwork. The box had plenty of idle CPU and I/O left, too: 5% utilization was about as high as the 8-CPU box got, and the disk queue was always below 2 and mostly close to 0.

    Spent a lot of money to do this, and after 3 attempts to cluster I have come to the conclusion that it is a marketing gimmick. Real Mickey Mouse stuff unless you work with HP on an OEM DataCenter setup. I always had some issue, and this was the worst. Reminds me of Unix stuff from back in the day: look at it cross-eyed and it goes belly up. Enterprise-ready? I think not. Never has been.

    "I am the cluster and I can't see you, so I am going to stop you." But if it can't see me, how can it stop me? And if it can stop me, it can see me, no?

    Losing faith in MS DBMSs quickly. I see a switch to Oracle in my future.

  • Sounds like your cluster config, not your RDBMS.

  • Well, I do not think it is the DBMS; it is MSCS. It is not the config. I do not know how much experience you have with it, but it is pretty easy to set up. The private NIC has KBs all to itself, and there are dos and don'ts on the MS web sites. All in all, it is pretty nice to set up. Exchange is a PITA; SQL Server is a breeze, though. Still, I have had one problem or another, and each of the 3 times it was a different problem.

     

    Using the cluster for file shares is quite reliable and easy. The problem is the virtual server and MSCS. Something goes wrong that nobody seems to understand. Microsoft told me to uninstall it and reinstall it; the config was fine. This is a known issue, and they are unclear why it shows up. To be fair, it likely affects a small percentage of installs. But between that and the other 2 issues, I scrapped the SQL cluster altogether.

    I still cluster file storage though.

     

  • I agree, this sounds like a cluster problem, not SQL Server. You said: "Spent a lot of money to do this and after 3 attempts to cluster"; does this mean that you've attempted to rebuild your cluster? What does your SQL Server log say after the failover? What versions of the OS and SQL Server are you running?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • The SQL Server log says nothing; there are no errors in SQL Server. There is no failover either, since the resource never actually fails over. The Windows event log shows a series of errors every hour, and then the cluster issues a graceful shutdown. Then it comes back up on the primary node again, nice and clean.

    No errors in the cluster logs either. There is a specific KB for this very issue going back to SQL Server 6.5, and it is still happening today. Google the error yourself and see.

    Basic problem: SQL Server, for an unknown reason, fails to respond to the IsAlive probe, so the cluster shuts it down and restarts it. There is NO cluster config item, aside from the polling interval, to address this, and increasing the interval just delays the problem by that added interval. The cluster performs correctly: I can fail it back and forth all day and it works great, except that every now and then the cluster cannot see SQL Server and shuts it down.
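    To make the interval point concrete, here is a minimal Python sketch of a periodic health probe. It is not how MSCS or the SQL Server resource DLL is actually implemented; the function name and the hang window are hypothetical, and it simply assumes a probe fires every `interval` seconds and fails whenever it lands while the instance is unresponsive.

```python
def first_failed_probe(interval, hang_start, hang_end, horizon):
    """Return the time (seconds) of the first probe that lands while the
    instance is unresponsive, i.e. in [hang_start, hang_end), or None if
    every probe up to `horizon` succeeds.

    Hypothetical model: probes fire at interval, 2*interval, 3*interval, ...
    """
    t = interval
    while t <= horizon:
        if hang_start <= t < hang_end:
            return t  # probe fails here; the monitor would restart the resource
        t += interval
    return None


if __name__ == "__main__":
    # Assumed scenario: the instance stops answering at t = 3610 s and stays hung.
    for interval in (60, 120, 300):
        print(interval, first_failed_probe(interval, 3610, float("inf"), 7200))
    # A brief hang (3610 to 3650 s) that falls between two 60 s probes goes unnoticed.
    print(first_failed_probe(60, 3610, 3650, 7200))
```

    Under these assumptions, if the instance stays unresponsive a longer interval only moves the restart later (3660 s, 3720 s and 3900 s here), which matches the observation above; a longer interval helps only when a stall is short enough to fit between two probes.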

    If anyone has anything concrete, I'm listening. "Your config is not right" is simply not the issue and really does not help. If you have a specific thing like "disable NetBIOS on the private NIC", "disable DNS registration on the cluster NIC", or "make sure the adapter bindings are in the right order, as well as the cluster bindings", then pass it on.

    Yes, I have tried this 3 separate times with the same machines, reimaging each time. Each time there was some little issue. This one went on for 2 months before it started cycling the service. I have clustered other boxes with no problem. This seems to be a SQL Server/MSCS issue.

    DL585s with 2 MSA500s, Windows Server 2003 SP1 and SQL Server 2000 Enterprise SP4. Like I said, this EXACT problem has been around since SQL 6.5.

    Microsoft reviewed the install and said to reinstall again. If you have some inside info that we missed, I'd appreciate hearing it.

  • I have seen this problem before, and here are 3 things to try.

    1) Use this script to see whether SQL Server is clustered:

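    -- Which node is active? xp_cmdshell runs on the physical node that
    -- currently owns the instance, so ping -a shows that node's host name.
    EXEC master..xp_cmdshell 'ping -a 127.0.0.1'
    GO
    -- Shared cluster drives this instance can use
    SELECT * FROM ::fn_servershareddrives()
    GO
    -- Nodes that can own this virtual server
    SELECT * FROM ::fn_virtualservernodes()
    GO
    -- Servers known to this instance (local plus linked/remote)
    SELECT * FROM sysservers
    GO
    -- Virtual server name, then version and service pack level
    SELECT @@SERVERNAME
    GO
    SELECT @@VERSION
    GO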

    2) Download the free MS cluster diagnostics utility from Microsoft (just Google it, I guess). It should help you locate the problem.

    3) Finally, make sure that your "heartbeat" cable is good. Ours was bad because one end was not made correctly. If you don't connect the nodes directly to each other but go through a switch, make sure the switch is configured correctly: 1) you are getting a connection, and 2) each NIC and its switch port are set to 100/Full. *** Do NOT use the "Auto" settings (this was our problem too). Finally, to eliminate possible switch or port issues, connect the heartbeat cable directly from one node to the other, bypassing the switch.

    One last note: make sure you have all the latest service packs and patches for the OS and SQL Server.

    Good luck... I feel your pain. Been there, done that, and got the T-shirt too.

    Rudy

  • Agree, sounds like your heartbeat interconnect is flaky.

    Brian:

     
