Merge Replication Performance Issue

  • Hi All,

    We are having a performance problem with SQL Server Merge Replication on a 800MB database where by we have a huge variation in the length of time it takes to replicate, from sub-minute up to 1 hour. To put this into context the initial snap-shot takes ~10 minutes to apply.

    The central database synchronizes with five filtered state databases over a fast connection. Data is filtered for a state by using the HOST_NAME() set to a State ID. Related data is then filtered using Join filters. These synchronize continuously. The state servers then re-publish the data without filters to the remote client PCs running SS2005 Express, connecting over a slow connection using web synchronization. We have approx. 60 test subscriptions. We are using Row-level tracking.

    I'm currently pushing test transactions into client machines and synchronizing to get a good level of meta-data and I'm sync'ing throughout the day. Typically at the start of the day there will be a very slow synchronization at ~25 mins. I then push in a days worth of transactions and get times ~3 mins. Then I push in a few more days worth of transactions for all states and then sync these machines at the end of each 'day'. Next actual morning I will send up a new 'day' and following this it will take ~25 mins again, and I can't figure out what is causing this to be so slow. It doesn't seem to follow the amount of data that I've pushed down.

    I've also tried to run the various shared replication jobs but these don't seem to affect the time of following sync's.

    We've run a trace on one of the slow synchronizations (4000 inserts which took ~1 hour) and it seems to be doing lots of meta-data checks: 10,000s of calls to sp_MSenumcolumns and 1,000s of calls to sp_MSinsertgenhistory and sp_Msupdategenhistory. The last two I can understand, but the first is far in excess of the number of updates.

    Does anyone have any ideas what could cause these problems and have any suggestions as to how we can improve the general performance?

    Many Thanks, Ed

  • Update...

    Ran the sync without using Web Synchronization and the messages are more helpful, from:

    Enumerating deletes in all articles (generation batch 1)

    Enumerating inserts and updates in article '[an article name]' (generation batch 1)

    ....

    Enumerating deletes in all articles (generation batch 25831)

    Enumerating inserts and updates in article '[an article name]' (generation batch 25832)

    and counting. This is for the first hour of the sync for no data changes!

  • Update 2...

    Have simplified the replication schema by doing away with the republishing subscribers and adding a static publication for each state. However, as soon as I started adding data to my test subscribers the replication times started to blow-out again.

    It looks like the problem is due to a massive increase in the generation count on the publisher and all the subscribers struggling to keep up using the slow Web Sync. On a subscriber not using Web Sync, the times are dramatically reduced and once a full sync has taken place (with no new data being added) the time can be sub-minute again.

    I have also removed all non-replication triggers from the database, even though they were all marked NOT FOR REPLICATION.

    I am now seeing if I can bring the replication times down again by repeatedly synchronizing the subscribers without adding any new data, some using Web Sync, some not.

  • Possible Solution.

    Looks like the problem has been fixed by switching the websync subscriptions from the Slow Link Agent Profile to Default Agent Profile.

    After testing with MS Profession Support (thanks Ling and Linda), the issue appears to be due to a combination of things, mainly looking like this is due to a large number of articles and subscribers. When all subscribers are syncing this creates a large number of Generations many of which are empty, i.e. have no data changes.

    Websync by default uses the Slow Link Agent Profile which processes a single generation per batch. This is recommended agent profile for websync in most of the documentation I read. Regular syncing uses the Default Agent Profile, which processes 50 batches in one go. Processing either 1 or 50 empty generations takes approximately the same time, but the round-trip through the webservice adds an overhead of a few seconds to each. Therefore Slow Link Agent Profile in this instance actually takes 50 times longer to process nothing.

    Since changing profile most of the subscriptions have been behaving well, although I have a couple which are timing out now. This seems to be due to a merge history purge, and may be due to some changes I made during testing. I have manually cleared the history and will report back if I find any more issues.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply