Uncommitted Commands not reporting correctly (I think)

  • I created a one-way transactional replication from a SQL2005 db to SQL2008 - everything went fine for about a week. Then I encountered a failure after a week, reporting along the lines that an insert did not match the number of columns. Not being able to find the issue nor what transaction was causing the issue, I reinitiated a new snapshot and this then went fine. It happened again a week later - repeated the process. Later on we encountered network issues and the replication got behind by several million transactions. In trying to isolate the problem I removed the whole replication - Subscription and Distribution - then re-created it.

    Now that it is recreated - I ran a rowcount on all tables, database compares (schema, indexes, etc), all matches up, queries all show same data in both databases. I insert a Tracer Token and it reports excellent performance (1 - 4 second turn-around). HOWEVER, the "Undistributed Commands" is report a large number of Commands and days to process it.

    What does this mean?

    Is there a screwed up distribution tracking problem?

    Is my replicated data trustworthy?

    How do I get it all back in sync?

    I got the following script from another forum to provide a count of unprocessed trans - but it gives me a different count as well:

    select

    cast(count(*) as varchar(20)) + ' transactions are pending replication. The oldest transaction pending replication is ' + cast(min(entry_time) as varchar(20)) + ' minutes old',

    count(*) as NumOfTrans,

    min(entry_time) as minEntry,

    max(entry_time) as maxEntry,

    datediff(minute, min(entry_time), max(entry_time)) as minuteDiffFirstToLast,

    datediff(minute, min(entry_time), getdate()) as minuteDiffFirstToNow

    from

    distribution.dbo.msrepl_transactions

    [font="Arial"]-----------------------------------------------------------------
    Rich N - Development Manager @ Delta Technology Solutions Ltd - M.O.T.H.E.R. your virtual Sys Admin - Remote Server Health Monitoring - 24 x 7 www.deltatsl.com
    [/font]

  • I found a similar post in another forum which gave me some guidance to getting this "potentially" sorted.

    http://social.msdn.microsoft.com/Forums/en-US/sqlreplication/thread/9ea9343b-6dac-4fcb-bde2-4cdcf4862daa

    1. I stopped SQL Server Agent on my Distribution server in order to effectively stop all replication jobs

    (although I probably could have manually stopped each of the individual replication jobs instead).

    2. I re-ran EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 1 in order to clean-up

    the distribution history older than 1 hour. This outputted that no transactions were removed.

    3. I re-ran sp_browsereplcmds and saw that I still had a number of transactions in MSrepl_commands.

    4. I restarted SQL Server Agent which restarted all the replication jobs.

    5. Magically, the undistributed commands went to zero!

    6. A quick re-run of sp_browsereplcmds showed that I still had the same number of transactions in MSrepl_commands.

    I confirmed this by SELECTing directly from distribution.dbo.MSrepl_transactions and distribution.dbo.MSrepl_commands.

    RN - I stopped the agents on both machines, just to be sure, and followed the of the steps - now my Undistribs appear to line-up.

    Here's hoping it holds!

    [font="Arial"]-----------------------------------------------------------------
    Rich N - Development Manager @ Delta Technology Solutions Ltd - M.O.T.H.E.R. your virtual Sys Admin - Remote Server Health Monitoring - 24 x 7 www.deltatsl.com
    [/font]

  • Mismatch columns between publisher and subscriber can many reasons one of then could be schema changes at the subscriber. In Replication there is an option to allow schema changes at subscriber set this option to "no" because when ever you try to make schema changes at publisher side those objects involved replication will not allow to make schema changes. The catch here is remove the object from publisher make schema changes and after schema changes add object back to replication and run snap shot this should fix column mismatch issue.

    You said publisher is SQL2005 and subscriber is SQL2008, what is your distributor version? Best practices is Distributor version should be higher than Subscriber version. And also make sure that you all replications jobs are not running at same time.

    EnjoY!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply