Serivce Broker 31+ million Disconnected_Inbound conversations

  • Hi,

    Running this query on one of our servers returned some scary numbers.

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    SELECT COUNT(*) ConvCount, S.Name, CEP.is_initiator, CEP.[State], CEP.[state_desc], CEP.far_service

    FROM sys.Conversation_EndPoints CEP

    INNER JOIN sys.services S

    ON CEP.service_id = S.service_id

    GROUP BY S.Name, CEP.is_initiator, CEP.[State], CEP.[state_desc], CEP.far_service

    Results:

    ConvCount Name is_initiator State state_desc far_service

    120,189 InitiatorService 1 CO CONVERSING TargetService

    120,189 TargetService 0 CO CONVERSING InitiatorService

    31,153,305 InitiatorService 1 DI DISCONNECTED_INBOUND TargetService

    1 InitiatorAndTargetService 1 CO CONVERSING InitiatorAndTargetService

    1 InitiatorAndTargetService 0 CO CONVERSING InitiatorAndTargetService

    29,385 InitiatorAndTargetService 1 DI DISCONNECTED_INBOUND InitiatorAndTargetService

    All services are in the same database. The activation procedure on the InitiatorService service has been altered to end conversations properly.

    InitiatorAndTargetService uses a dialog pool base on this MSDN Blog.

    To clean up I was going to end all the conversation like this:

    DECLARE @SQL Varchar(8000) = '.',

    @LoopCount SmallInt = 0

    BEGIN TRANSACTION

    WHILE LEN(@SQL) > 0 AND @LoopCount < 100

    BEGIN

    SET @SQL = ''

    SELECT TOP 115

    @SQL = @SQL + 'END CONVERSATION ''' + CAST([conversation_handle] AS Varchar(40)) +''' WITH CLEANUP' + CHAR(13)

    FROM sys.conversation_endpoints WITH (NOLOCK)

    WHERE far_service = 'RemoteTargetService'

    AND [state] = 'DI'

    EXEC (@SQL)

    SET @LoopCount = @LoopCount + 1

    END

    COMMIT TRANSACTION

    Code takes 15 seconds to run. This means to around 11 hours to clean it all up.

    So my only practical option is:

    ALTER DATABASE MyReallyImportantDatabase SET NEW_BROKER

    The queues are filled via a C# application and a SQL Agent Job.

    What can I do to mitigate data loss? Something like:

    1) Pause C# app and Jobs,

    2) Wait for existing conversations to end

    3) End any stubborn conversations

    4) Alter database

    5) Unpause C# app and Jobs

    I am not sure if the C# application can be paused and, if it can, what the effects will be. So alternative idea's are welcome.

    There are no routes on this server.

    Remus Rusanu[/url] answered a similar question here.

    Can anyone second his claim "If everything you have is local, in the same db, then you have basically nothing to worry about."?

    What kind of Event Notifications issues can I expect?

    Cheers

  • I used ALTER DATABASE <DBName> NEW_BROKER WITH ROLLBACK IMMEDIATE to clean up a couple of databases with 10's millions open conversations with state "DI" (Disconnected Inbound).

    The biggest one only took 116 ms and the next biggest 1203 ms.

    When I tried the same on another database with a few million open conversations, I had to cancel after a 60+ secs.

    Although I can't be positive of this, system SPID 25 "BRKR TASK" started a long blocking chain due to the rollback.

    Only by seeing changes in sys.sysProcesses.waitResource for SPID 25, could I be sure that the SPID was actually doing anything.

    (Service restart narrowly avoided.)

    Also these huge amounts of "Disconnected" conversations caused the TempDBs to bloat.

    Looking at the properties of TempDB, all looked fine.

    Size : 59356.31 MB

    Space Available : 59089.55 MB

    Although it looks like there is 267 MB in use, I could not shrink the file

    The space available is incorrect as it is still reserved by Internal Objects.

    As these results from Michael Valentine Jones's query shows.

    VersionStore% UserObject% InternalObject% FreeSpace% TotalPages SizeMBs

    0.003 0.009 96.522 3.466 60510592 59092.38

    After the blocking chain issue, I have decided to follow Jacob Sebastians's advice and clean up RBAR style.

    Remus Rusanu[/url] provides some great info on how to prevent conversation leaks.

    If you need to use NEW_BROKER, try to stop the queue's from being filled (Blocking a firewall port worked for us on one of the queues) then wait for them to empty

    or at least do this when the service is as quiet as possible and you have time to rollback if needed.

    If multiple databases need doing, take your time. Do them one by one and check if the coast is clear before moving on to the next database.

  • I ended up using the code below to end the conversations.

    Cicra 3.5 million conversations per 30 mins.

    You might have to rinse and repeat to deal with all the conversations.

    SET NOCOUNT ON

    IF OBJECT_ID('TempDB..#Convs') IS NOT NULL

    DROP TABLE #Convs

    CREATE TABLE #Convs

    (

    ConvID Int IDENTITY PRIMARY KEY,

    ConvHndl UniqueIdentifier,

    Processed Bit

    )

    INSERT #Convs

    SELECT [Conversation_Handle], 0

    FROM sys.conversation_endpoints CE WITH (NOLOCK)

    WHERE [State] IN ('DI', 'DO')

    DECLARE @SQL Varchar(8000) ='.',

    @LoopCount SmallInt = 0,

    @MaxLoops SmallInt = 10000,

    @LoopTimer DateTime,

    @StartRow Int

    WHILE EXISTS

    (

    SELECT TOP 1 1

    FROM #Convs

    WHERE Processed = 0

    )

    BEGIN

    SET @StartRow = (SELECT ISNULL(MAX(ConvID), 0) FROM #Convs WHERE Processed = 1)

    SET @LoopTimer = GETDATE()

    SET @LoopCount = 0

    BEGIN TRANSACTION

    WHILE LEN(@SQL) > 0 AND @LoopCount < @MaxLoops

    BEGIN

    SET @LoopCount = @LoopCount + 1

    SET @SQL = ''

    SELECT @SQL = @SQL + 'END CONVERSATION ''' + CAST(ConvHndl AS Varchar(40)) +''' WITH CLEANUP' + CHAR(13)

    FROM #Convs

    WHERE Processed = 0

    AND ConvID BETWEEN @StartRow AND @StartRow + 100 -- 100 rows at a time. 100 * @MaxLoops = commit size

    BEGIN TRY

    EXEC (@SQL)

    UPDATE #Convs

    SET Processed = 1

    WHERE ConvID BETWEEN @StartRow AND @StartRow + 100

    END TRY

    BEGIN CATCH

    PRINT ERROR_MESSAGE()

    END CATCH

    -- Doesn't matter if there are errors or not, continue anyway.

    SET @StartRow = @StartRow + 100

    END

    COMMIT TRANSACTION

    -- For our viewing pleasure and to get an idea of how long it will take to complete.

    PRINT 'Cicra ' + CAST((@LoopCount * 100) AS Varchar(7)) + ' conversations ended in ' + CAST(DATEDIFF(S, @LoopTimer, GETDATE()) AS Varchar(10)) + ' seconds.'

    END

  • Still dealing with the bloated TempDBs : http://qa.sqlservercentral.com/Forums/Topic1607443-1550-1.aspx?Update=1

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply