SQLAgent doesn't run when server rebooted (error 241)

  • Hi All,

    I usually lurk here but this time I have a question I've not seen answered anywhere on the web.  You guys are pretty smart, right??

    The customer is running SQL 2000 Standard on multiple sites, with replication.  Operating system is Windows 2003 SP1. 

    What brought this problem to our attention is that replication wasn't occurring consistently on one of the sites.  We finally determined that the SQLAgent service had failed to start on the last reboot (11/23/06), which of course killed replication.  When the SQLAgent service is started manually, everything works perfectly.  Both SQLAgent and MSSQL are running with Local System credentials.

    Conventional wisdom suggested that the SQLAGENT service must be misconfigured, perhaps to not restart on failure.  I was tasked with checking this out.  The customer was unhappy.  I was expected to produce results.  Quickly!

    Following are the notes I produced:

    ---------------------------------------------------------

    The SQLAgent service is already configured to restart on failure, in one-minute intervals.  There is no evidence (in the event log) that this service is in fact failing.

     

    The reboot on 11/23/06 was unexpected.  It was not rebooted by a user.  There is little in the logs to indicate why this occurred.  It may have been a power outage.  There have only been four unexpected reboots this year, and the last one was in August.

     

    The SQLAgent service has no dependent services and is not configured to depend on any other service.
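    For anyone who wants to double-check the dependency list from a script rather than the Services console, here is a minimal sketch.  It assumes a default instance (service name SQLSERVERAGENT) and assumes that `sc qc` prints a DEPENDENCIES line with any further entries on continuation lines that begin with a colon.  The function names are my own, not part of any tool.

```python
from __future__ import annotations

import subprocess


def parse_dependencies(sc_output: str) -> list[str]:
    """Extract service names from the DEPENDENCIES block of `sc qc` output.

    Assumed layout: a 'DEPENDENCIES : name' line, optionally followed by
    continuation lines of the form ': name'.
    """
    deps = []
    in_block = False
    for raw in sc_output.splitlines():
        line = raw.strip()
        if line.upper().startswith("DEPENDENCIES"):
            in_block = True
            value = line.partition(":")[2].strip()
        elif in_block and line.startswith(":"):
            value = line[1:].strip()
        else:
            # Any other line ends the DEPENDENCIES block.
            in_block = False
            continue
        if value:
            deps.append(value)
    return deps


def query_dependencies(service: str = "SQLSERVERAGENT") -> list[str]:
    """Run `sc qc <service>` (Windows only) and return its dependencies."""
    result = subprocess.run(
        ["sc", "qc", service], capture_output=True, text=True, check=True
    )
    return parse_dependencies(result.stdout)
```

    An empty list from `query_dependencies()` would agree with the note above; a named instance would need its own service name passed in.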

     

    I checked the SQLAgent Logs and found some very interesting information.  Over the last nine restarts of the SQLAgent Service, an error occasionally occurs:

     

    2006-11-23 17:49:14 - ! [241] Startup error: Unable to initialize error reporting system (reason: The EventLog service has not been started)

     

    2006-11-23 17:50:34 - ? [098] SQLServerAgent terminated (normally)

     

    The SQLAgent logs are generated every time the service is restarted.  Going back nine logs, it appears that this has only happened three times since August 2006:  11/23/06, 10/27/06, 8/18/06.  When the SQLAgent service is started manually, it runs without any problems.  But the interesting part is, this error has only presented itself three times out of the last nine reboots (one third).   Why wouldn’t it happen every time the server restarted?
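    The count above was done by reading each log by hand.  A rough sketch of how the same check could be scripted follows.  It assumes the archived logs sit in one folder with names matching SQLAGENT.* and that they are readable in the encoding passed in (Agent logs are sometimes written as Unicode, in which case "utf-16" may be needed).  The function names and the folder argument are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as:
# 2006-11-23 17:49:14 - ! [241] Startup error: ...
STARTUP_ERROR = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - ! \[241\]")


def find_startup_errors(lines):
    """Return the timestamps of every [241] startup error in the given lines."""
    return [
        m.group(1)
        for line in lines
        if (m := STARTUP_ERROR.match(line.strip()))
    ]


def scan_log_dir(log_dir, encoding="utf-8"):
    """Map each SQLAGENT.* file name in log_dir to its [241] timestamps."""
    results = {}
    for path in sorted(Path(log_dir).glob("SQLAGENT.*")):
        text = path.read_text(encoding=encoding, errors="replace")
        results[path.name] = find_startup_errors(text.splitlines())
    return results
```

    Files that map to an empty list are restarts where the Agent came up cleanly, which makes the failed restarts easy to line up against the reboot dates.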

     

    This error is not an uncommon one.  I have googled the error and there is a lot of discussion, but very little in the way of substance.  Many write that the error makes no sense, because they have verified that the EventLog service starts long before the SQLAgent service.  Some discussions focus on which domain credentials the Agent service is running under; many discussions peter out with no answers.

     

    One suggestion was to disable the feature that automatically sends errors to Microsoft.  However, the instructions for doing this did not match what I found on this server, which probably means the error reporting feature was not opted into when the server was patched to SP4.

     

    I’ve done a lot of comparisons based on timeline as well as determining events in the Windows event logs.  I have not seen any reason for this to be occurring.  I noted in the ERRORLOG that on the occasions the SQLAgent service does start, then an Extended Procedure ('xp_sqlagent_monitor') also executes.  At first, I was not certain whether this XP actually started the SQLAgent service, or vice versa.  Based on the timeline, I think the SQLAgent needs to be running first.

     

    At this point, I believe there are a couple of options to be considered.

     

    1.       Engage with Microsoft Product Support for resolution.

    2.       Reinstall the latest SQL Server Service Pack (SP4)

    3.       Reinstall SQL Server and restore line of business application databases.

     

    Of these options, I believe that contacting Microsoft would be the most prudent.  There may be a quick fix that only Microsoft is privy to.  

     

    ---------------------------------------------------------

     

    ...anyway, I was wondering if any of you had seen this problem, and if it's worth the customer's money to call Microsoft.  Note that the only unusual conditions I've seen on this server are:

     

    1. There are MSDTC errors when the server reboots, complaining that a recent DCPROMO event was not properly processed. 

     

    2. When I attempted to run SQLDiag to produce a report, I got a buffer overrun error.  I know there is a fix for this available from Microsoft (KB902955) but I am pretty sure it's not related to the SQLAgent issue.

     

    Other than that, the server seems pretty sterile.

  • (Both SQLAgent and MSSQL are running with Local System credentials.)

    The above is the reason for your problems.  Replication will not run when the Agent does not have access to network resources; the Agent needs admin-level permissions if you don't want to be going to the client's site to fix this problem every week.  I have known this since 1999, and as of Service Pack 3a Microsoft published some information stating that you need admin permissions to run replication.

    (I am pretty sure it's not related to the SQLAgent issue.)

    You may have other problems, but the account is the main thing because replication uses MSDB to perform most tasks.  So fix the account now, and then you can get to the other problems, if any.  Hope this helps.

     

     

    Kind regards,
    Gift Peddie
