Understanding replication error messages

  • Hey all,

    I realize this is a somewhat broad question, but I want to figure out how to dig deeper into replication errors by looking up the specific commands referenced by the hexadecimal ID numbers.

    Here's an example of a transactional replication error I'm getting:

    Command attempted:

    if @@trancount > 0 rollback tran

    (Transaction sequence number: 0x000008ED00000468000400000000, Command ID: 1)

    The error associated with the command is unimportant. What I'm wondering is--how do I see the actual T-SQL command that the distribution agent is trying to execute? I seem to recall this as being possible. Is there a system table that lists these? Also, any pointers on how to manually skip replication commands? The hexadecimal sequence numbers are kind of throwing me.

    I'd really appreciate any help anyone can give. Thanks.

  • cjohn5552 (8/16/2011)

    I seem to recall this as being possible. Is there a system table that lists these? Also, any pointers on how to manually skip replication commands? The hexadecimal sequence numbers are kind of throwing me.

    cjohn5552,

    Pulled this from some scripts I wrote to do the same type of troubleshooting "stuck" replication in the past. Last place I was we ran a lot of merge replication, which uses completely different tables and SPs, but the below is for the transactional replication we run at the current place:

    --================================================================

    --== First determine the last transaction sequence number.

    --== This MUST be run from the context of the subscription

    --== database. The piece we want is the 'transaction_timestamp'

    --================================================================

    SELECT * FROM dbo.MSreplication_subscriptions

    --=====================================================

    --== Replication Transactions to be Replicated

    --== This MUST be run from the context of the published database.

    --== Can take some time, depending upon the retention

    --== period for already replicated transactions

    --=====================================================

    SELECT art.name AS [Article Name]

    , cmds.article_id

    , cmds.publisher_database_id

    , cmds.xact_seqno

    , cmds.type

    , cmds.originator_id

    , cmds.command_id

    FROM distribution.dbo.MSrepl_commands cmds inner join

    dbo.sysextendedarticlesview art on art.artid = cmds.article_id

    WHERE cmds.xact_seqno > -- *** Change this to the transaction_timestamp value for the publication in question ***

    Pay close attention to the comment in the 2nd stmt. You need the transaction_timestamp value from the first stmt. BUT, you should be able to plug in your xact_seqno directly there....

    HtH,

    -Patrick

    Patrick Purviance, MCDBA
  • Sorry,

    After looking at your post again, the above may not help. You're wanting the actual T-SQL....

    Try this:

    EXEC distribution.dbo.sp_browsereplcmds '0x00113DA20000F6F60005', '0x00113DA20000F6F60005'

    Replacing the xact_seqno "from" and "thru" with your transaction sequence number.

    HtH,

    -Patrick

    Patrick Purviance, MCDBA
  • You can execute sp_browsereplcmds on the replication DB(info about this here -> http://msdn.microsoft.com/en-us/library/ms176109(v=SQL.100).aspx)

    Plug in the xact_seqno to this command and you'll get information about commands involved (the command column in the results will have this information).

    For e.g.

    USE [distribution_db]

    EXEC sp_browsereplcmds '0x000008ED00000468000400000000','0x000008ED00000468000400000000'

    EXEC sp_browsereplcmds '0x000008ED000004680004','0x000008ED000004680004'

    Note that at times you'll need to trim out the trailing zeroes from the xact_seqno (I don't know exactly why this has to be done - I needed to do this at times)

    Regarding skipping transactions in replication - http://blogs.msdn.com/b/chrissk/archive/2009/09/08/how-to-skip-a-transaction-in-sql-2005-2008-transactional-replication.aspx - it goes without saying that you need to take extra care when skipping transactions 😉

  • These scripts seem great--thanks guys!

    One thing is that they're returning a lot of results. I think this is because the command in question I specified was from a couple of days ago. It was erroring out with error 547, and I couldn't figure out what was causing it, so I eventually set the distribution agent to skip that error.

    What I'm wondering is if there is an easy way to filter out commands that were not applied successfully, errored out, skipped, etc. Anyone know?

    Thanks.

  • I've never been down that path for troubleshooting as I rely on a a shema compare and a homegrown batch process leveraging tablediff.exe (msdn.microsoft.com/en-us/library/ms162843.aspx). I suppose you could use the MSrepl_errors table as a cross-reference for each transaction that errored in some way....

    Patrick Purviance, MCDBA
  • A search on error 547 related to replication indicates that this is generally caused due to a constraint violation (for e.g. deleting records from the parent table before deleting them from the child table etc)

    Replication monitor would contain details about the errors encountered - and it is better to try to figure out the cause for the error rather than just setting the distribution agent to skip the errors (if you skip them you might end up with inconsistent data which would be harder to troubleshoot).

    I second Patrick's approach of leveraging tablediff.exe - this is a good way to figure out if all the data is in synch.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply