Sliding Window Replication

  • I have SQL Server 2005 SP3CU7.

    I am using transactional push replication to replicate tables. The publisher DB is a 300+ GIG OLTP database.

    Please let me know if any additional information is required to answer my question.

    Problem Simplified:

    I need to always have recent (-20 days and forward) data rows loaded in my replicated DB. Applications are only concerned with data -20 days old and newer.

    I am replicating these type tables using a sliding window method.

    I have a filter setup up something like this:

    SELECT published_columns FROM [dbo].[SampleTable] WHERE [SampleDate] > DATEADD(DAY,-20, GETDATE())

    I have millions of rows in these tables and the object is to keep these replicated tables as compact as possible for numerous reasons such as minimizing IO, keep replication DB compact, have enough memory to cache the entire replicated DB, etc...

    Using this method I have found that as the older rows in the replication DB slide out of the replication window of - 20 days, changes will no longer be propagated to the stale rows and possibilities exist for getting duplicate key row insert when someone updates a stale row with a more current date in the publisher DB.

    To correct the duplicate key row inserts, I have modified the sp_MSins_dboSampleTable stored procedure so that it checks for existence of the row before inserting and if the row exists it simply does a full row update using the sp_MSupd_dboSampleTable SPR. It works but I somehow feel this is a dirty practice as I now have to override the system generated insert procedures.

    Having these stale rows in the replication database presents the possibility of reading data that is not totally in synch with the publisher db. The application that access these replication databases only look at data that is less that 20 days old but I still do not feel that this is right to have out of synch data.

    Options I have considered:

    1 - Run a purge periodically to clean up stale rows.

    2 - Reinitialize the replication database periodically. This could pose performance issues as I have to generate snapshots in the middle of the day etc...

    I am looking for any suggestions or common practices that I might be overlooking. Is there another option I should be considering?

    I am new to replication and I am finding that there is not much education available other than what BOL offers.

    Thank You in advance

  • I didn't get much on this but here is what I have decided to do...

    I have modified the sliding window filters so that the sliding window will always start at the beginning of the day, this way rows in the DB will not go stale until midnight rather frequently throughout the day. Here is an example of one of my new filters:

    SELECT <published_columns>

    FROM [dbo].[SampleTable]

    WHERE [Sample_Date] >= DATEADD(DAY, DATEDIFF(DAY,0,DATEADD(DAY,-20, GETDATE())),0)

    This now sets the groung work for daily purges to run on the replication databases first thing in the morning (12:00:30 AM). The purges will promply delete any rows that fall outside the sliding window.

    In this case, no modifications to the system generated insert,update and delete procedures are required.

    If anyone sees a flaw in my logic or has a better solution please comment

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply