Odd SQL issue - locked up server

  • We had some fun on our main server last night. Started around 11pm with the following in the error log. (MyDB is the default db for all of the logins listed)

    2007-10-03 23:00:59.42 Backup Log was backed up. Database: MyDB, .....

    2007-10-03 23:01:38.09 Logon Error: 18456, Severity: 14, State: 27.

    2007-10-03 23:01:38.09 Logon Login failed for user 'MyUser.

    According to docs, State 27 on a login failure means that the server could not determine initial db (offline, suspect or not there)

    That went on for a minute, the rest of the log backups ran (it's a scheduled job every 15 min) then the following started appearing

    2007-10-03 23:02:53.48 Logon Error: 18456, Severity: 14, State: 16.

    2007-10-03 23:02:53.48 Logon Login failed for user 'AnotherUser'.

    State 16 is apparently "Login does not have rights on default db." The login in question most certainly does.

    This continued right through to 4 this morning when one of the admins noticed soemthing was not right. No log backups ran during that period, though they were supposed to.

    The admin who fixed it noted that while he could log into the server, he could not run any queries. Not even a check of running processes worked. They just ran, and ran, and ran, ....

    What he did to resolve it was to kill processes bindly, starting at 51 and workign up until the server started responding again. Since then its been running very smoothly

    There are no other messages in the error log. There's nothing suspicious in the windows event log. None of the dbs were suspect or offline.

    Server specs: 12 proc Itanium. 48GB memory. Running Server 2003 datacenter and sql 2005 enterpriise SP2

    Anyone want to take a stab at what might have been the cause?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I've seen this when SQL Server is overwhelmed by connections, but that was on hardware substantially less than what you've got.

    K. Brian Kelley
    @kbriankelley

  • I've also seen it in those circumstances, but it's fairly unlikely here. 11pm is the beginning of the overnight process run, and there should be no more than 30-50 connections at the time. Peak of the day can be 5 times that.

    The worrying thing for us right now is that since we don't have a cause, we've got no idea if it might happen again

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Did you have any performance counters monitoring the vitals?

    K. Brian Kelley
    @kbriankelley

  • Not at the time, no.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • How did the admin get in to kill spids if he could not run queries? If he used the DAC I believe there are only a few types of queries you can run.

  • It was worth a shot. Running a server side trace is likely to kill you... I have seen that issue when someone was doing a trace, too, come to think of it, so it may be that one of memory pools were exhausted for some reason. Only thing I can think of is to put the counters up and cross your fingers if there's nothing in the logs.

    K. Brian Kelley
    @kbriankelley

  • Anders Pedersen (10/4/2007)


    How did the admin get in to kill spids if he could not run queries? If he used the DAC I believe there are only a few types of queries you can run.

    He could connect to SQL normally (not DAC), but queries, even against the system views, just ran and never finished. Hence he was killing spids blind (without even knowing if they existed)

    Fortunatly for us, the kill command worked. Otherwise it would have been a server reset.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • K. Brian Kelley (10/4/2007)


    It was worth a shot. Running a server side trace is likely to kill you... I have seen that issue when someone was doing a trace, too, come to think of it, so it may be that one of memory pools were exhausted for some reason. Only thing I can think of is to put the counters up and cross your fingers if there's nothing in the logs.

    Only the default trace was running, and I remembered that too late to check what was in it. 🙁

    Any counters you'd specifically suggest?

    *crossing fingers*

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Any related to memory usage by SQL Server, # of connections, lock timeouts/deadlocks, and the standard battery at the OS level (memory, processor, disk, etc.).

    K. Brian Kelley
    @kbriankelley

  • you should also run sp_blocker script to keep an eye on the processes and "hopefully" identify a long blocking process.

    Good Luck


    * Noel

  • Good point. If it's a really bad query eating up memory and taking forever to execute, it's going to hold locks that'll show up. This will catch such a query without having to resort to a full-on server trace.

    K. Brian Kelley
    @kbriankelley

  • GilaMonster (10/4/2007)

    He could connect to SQL normally (not DAC), but queries, even against the system views, just ran and never finished. Hence he was killing spids blind (without even knowing if they existed)

    Fortunatly for us, the kill command worked. Otherwise it would have been a server reset.

    If he never saw the results - how do you know for a fact that the Kill actually worked? versus whatever nasty thing was running just finishing on its own while Admin is trying to find it....

    I HATE those scenarios....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (10/4/2007)


    GilaMonster (10/4/2007)

    He could connect to SQL normally (not DAC), but queries, even against the system views, just ran and never finished. Hence he was killing spids blind (without even knowing if they existed)

    Fortunatly for us, the kill command worked. Otherwise it would have been a server reset.

    If he never saw the results - how do you know for a fact that the Kill actually worked? versus whatever nasty thing was running just finishing on its own while Admin is trying to find it....

    I HATE those scenarios....

    Kill logs its activity on SQL Server Error Log and if after a "killer" loop :hehe: you see that responsiveness is back to normal you know it worked 😉


    * Noel

  • Are you running DTC?

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply