Transactional latency issues between Distributor and Subscriber

  • Latency between our Distributor and the Subscriber is usually less than 10 seconds. However, sometimes (once or twice a month) it falls behind by several minutes, or even a couple of hours. We haven’t been able to put our finger on any specific user activity or processing that precipitates the latency problem. We never have latency issues between the Publisher and the Distributor, only between the Distributor and the Subscriber. When we have these latency issues, transactions and commands are still replicated (as viewed with Replication Monitor), but they just seem to slow down. Transactions eventually catch up on their own, but the latency can last for a few hours. This is one-way, push, transactional replication. The Subscriber is used mainly for read-only public access and job/report processing. PerfMon doesn’t show any CPU or memory problems.

    Our Environment:

    SQL Server 2005 SP3/CU1 64 bit Enterprise Edition

    Windows Server 2003 Enterprise x64 w/SP2

    Intel Xeon CPU 3.40GHz – 8 physical CPUs per node

    64GB Memory per node

    SAN disks - all RAID 10

    Active/Active Cluster

    Node1: Publisher, Distributor, Cluster Group, MSDTC

    Node2: Subscriber

    Publisher: Max Mem set to: 32768

    Distributor: Max Mem set to: 10240

    Subscriber: Max Mem set to: 20480

    (If everything is forced to run on one node, this should leave about 2GB for the OS)

    Distribution agent is set for Continuous transaction replication – Push subscription.

    The application is vendor supplied.

    Anyone have any ideas about what might be causing this, or what to look at to diagnose the problem?

    Thanks
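
The memory budget in the environment listing above can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in the post; the variable names are made up for illustration, and it assumes the worst case the poster describes, where a failover puts all three instances on one 64GB node.

```python
# Max server memory settings quoted in the post, in MB.
max_server_memory_mb = {
    "Publisher": 32768,
    "Distributor": 10240,
    "Subscriber": 20480,
}

# Physical RAM per node (64 GB), converted to MB.
node_ram_mb = 64 * 1024

# Worst case: a failover forces all three instances onto one node.
total_sql_mb = sum(max_server_memory_mb.values())
os_headroom_mb = node_ram_mb - total_sql_mb

print(f"SQL Server total: {total_sql_mb} MB ({total_sql_mb / 1024:.0f} GB)")
print(f"Left for the OS:  {os_headroom_mb} MB ({os_headroom_mb / 1024:.0f} GB)")
```

The three settings add up to 62GB, which matches the poster's figure of about 2GB left for the OS. Keep in mind that in SQL Server 2005 the max server memory setting caps the buffer pool rather than everything the process allocates, so the real headroom on a single node may be somewhat smaller.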

  • Replication can slow down while the distribution cleanup job runs. That is just one possibility. Busy disk I/O can also slow replication down; this can happen if someone is copying a large backup file from the Distributor to somewhere else. Check whether any bulk updates are happening on the Publisher at the time of the slowdown. A large number of commands waiting to be propagated from the distribution database can cause a slowdown as well.

    -Roy

  • When this latency occurs, have you verified that there is nothing blocking at the Subscriber?

  • The Distribution cleanup job runs every 10 minutes. I believe this is the default interval, and it has always been set to 10 minutes. It seems to run okay even when latency is high.

    I know that there are no bulk copy processes of any kind running.

    There is no blocking anywhere.

    Thanks for your ideas - I'm stumped!

  • Do you see any high IO on the distributor or Subscriber?

    And here is a wild question: do you have antivirus installed on the servers, and is it set to run a full scan at a certain time?

    -Roy

  • We don't run antivirus software on any of the nodes.

    I don't administer the SAN, but I've asked those administrators to look at I/O. They aren't in agreement that it is high. I don't think that they have a good yardstick to measure against. However, they are performing some SAN upgrades this weekend that are supposed to improve throughput.
