Linked Server Error- worked for months, now doesnt.

  • I have an Agent job which runs nightly- inserting records into another server in a different domain. The job runs using a linked server with sql authentication login specified. Both servers are SQL 2005 with SP3 applied.

    For many months, the job ran without out an error.

    About a week ago, the job began failing every time it runs. I have also tried running the stored procedure the job calls directly from SSMS, and get the same result. The remote server has the "remote query timeout" setting configured for 1200 seconds (20 minutes). The job will usually run 3-5 minutes before the error is generated, but sometimes it happens in just a few seconds and sometime the job will run for as long as 9 minutes- but every time it fails.

    I can run simple queries across the linked server- (like select * from [remoteserver.domainname].master.dbo.sysdatabases without an problem)

    The job history shows the following:

    Executed as user: DOMAIN\USER. TCP Provider: An existing connection was forcibly closed by the remote host. [SQLSTATE 42000] (Error 10054) OLE DB provider "SQLNCLI" for linked server "remoteserver.domainname" returned message "Communication link failure". [SQLSTATE 01000] (Error 7412). The step failed.

    The Error log shows the following:

    The OLE DB provider "SQLNCLI" for linked server "remoteserver.domainname" reported an error 0x8000FFFF aborting the current transaction.

    There were some OS patches recently applied, to the primary server, but nothing specific to SQL Server.

    Any help would be greatly appreciated as I am resisting the temptation to bang my head against my desk! Let me know if any additional information would be helpful.

  • Any errors in the event log (SQL and Windows events) on the receiving server? Wonder if you have something running there that bogs down the server, or drops connections or something?

    ----------------------------------------------------------------------------
    Sacramento SQL Server users group - http://sac.sqlpass.org
    Follow me on Twitter - @SQLDCH
    ----------------------------------------------------------------------------

    Yeah, well...The Dude abides.
  • unfortunately, I'm stuck looking at it from just a SQL perspective on the remote host- I don't currently have access to the OS...yet...

    one thing worth mentioning though: The error I get in SSMS is slightly different from those above:

    OLE DB provider "SQLNCLI" for linked server "remoteserver.domain" returned message "Communication link failure".

    Msg 10054, Level 16, State 1, Line 0

    TCP Provider: An existing connection was forcibly closed by the remote host.

  • Error seems fairly poignant...you mentioned it's across a different domain, I'm wondering if you're running into the AD replication cycle or something.

    Hard to tell what the problem is just from the SQL Server side, unfortunately.

    ----------------------------------------------------------------------------
    Sacramento SQL Server users group - http://sac.sqlpass.org
    Follow me on Twitter - @SQLDCH
    ----------------------------------------------------------------------------

    Yeah, well...The Dude abides.
  • I'm not using windows authentication at all- purely SQL authentication. Also- ran at various time of day over the past week or so.

  • Has there been and patches put on the server recently, this has happened to me before due to windows patches.

    Facts are stubborn things, but statistics are more pliable - Mark Twain
    Carolyn
    SQLServerSpecialists[/url]

  • NJ-DBA (4/13/2010)


    ...

    Executed as user: DOMAIN\USER. TCP Provider: An existing connection was forcibly closed by the remote host. [SQLSTATE 42000] (Error 10054) OLE DB provider "SQLNCLI" for linked server "remoteserver.domainname" returned message "Communication link failure". [SQLSTATE 01000] (Error 7412). The step failed.

    The Error log shows the following:

    The OLE DB provider "SQLNCLI" for linked server "remoteserver.domainname" reported an error 0x8000FFFF aborting the current transaction.

    I am a bit foggy on it, but I think that this error means that something went wrong with the account logon on the target server. Check that the Logon is not expired or disabled on the remote server, that the local/remote passwords match and that the default database for the remote logon still exists and is accessible to that logon.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • All of those check out with the exception of "same password". The account that I am using to access to remote server does not actually exist on the local server- my linked server is configured with the "for login not defined in the above list, connections will: ... Be made using this security context (username and password is specified correctly there).

    Also, I can see that it actually connects and starts doing work on the remote server. Here's some weird/interesting info:

    -In the local server's activity monitor, I can see it waiting on OLEDB and the spid on the remote server is listed.

    -On the remote server's activity monitor, that spid is no listed- even with all filters turned off! But if I do DBCC INPUTBUFFER (Spid) where the spid is the one I recorded from the local server, I can see it doing the insert that it's supposed to do....

    So that seems to me to rule out the account login problem or something weird like that. To me this feels like OLEDB driver problem or something. Especially since I can run short queries w/o an error.

  • @Carolyn- YES- there were some patches to the OS recently.... do you remember how your resolved or if a specific patch was the culprit?

  • It was a long time ago but I think that the MSDTC settings needed to be set to allow distributed transactions, as the patch we applied stopped distributed transactions. It was a couple of years ago so would be surprised if this is the same issue unless you are really behind in your patches. It was when Microsoft changed their policy and closed down security features by default rather than leaving features open by default.

    We needed to go to Component Services/Computer/My Computer then right click on properties then MSDTC and Security Configuration and check Network DTC access, Allow remote clients, allow Inbound and Allow Outbound.

    May be worth checking out anyway.

    Facts are stubborn things, but statistics are more pliable - Mark Twain
    Carolyn
    SQLServerSpecialists[/url]

  • DTC settings check out ok... again- it starts the process and cranks along for several minutes but eventually returns error 7393 OLE DB provider 'SQLNCLI' reported an error aborting the current transaction.

  • Can you run the code directly on the remote server? I seem to remember getting cryptic errors from a linked server caused by syntax errors. Sometimes the data would change, like an invalid date would be entered, then the code would fail, but the linked server reported something seemingly unrelated.

  • Ah- good point... would have to created a linked server from the remote back to the source to test that... fortunately though, I recently learned the process is no longer required.... So I just disabled the job and an chalking it up to "a problem on their side".

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply