Archive Process - Need opinion

  • We have recently set up what we call an "archiving" process.

    There are two databases: production and reporting (read-only).

    The production database contains 2 years of current data, and this is the database our users are using on a daily basis.

    The reporting database contains "ALL" data, which is about 6 years' worth.

    We have a process in place that runs at night. It takes our fixed/static tables, and copies them from production to the reporting database.

    Then we read our transactional tables on production and check the create date on each row; if it is within the last 24 hours (since we last ran the process), the row is inserted into the reporting database.

    Next we run a process that checks the "change" date on each row, and if it is also within the past 24 hours, the process "updates" every column in the row.
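
    In rough T-SQL terms, the two steps look something like this (table and column names are simplified for illustration, and assume both databases are on the same instance):

    ```sql
    -- Step 1: copy rows created since the last run into the reporting database.
    INSERT INTO Reporting.dbo.Orders (OrderID, CustomerID, Amount, CreateDate, ChangeDate)
    SELECT p.OrderID, p.CustomerID, p.Amount, p.CreateDate, p.ChangeDate
    FROM   Production.dbo.Orders AS p
    WHERE  p.CreateDate >= DATEADD(HOUR, -24, GETDATE());

    -- Step 2: for rows changed since the last run, overwrite every column in reporting.
    UPDATE r
    SET    r.CustomerID = p.CustomerID,
           r.Amount     = p.Amount,
           r.ChangeDate = p.ChangeDate
    FROM   Reporting.dbo.Orders AS r
    JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
    WHERE  p.ChangeDate >= DATEADD(HOUR, -24, GETDATE());
    ```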

    My suggestion would be to do a delete and insert instead of the update, but they are saying that delete takes too many resources.

    We are having locking issues in our database and our users are getting upset. Our business is 24/7 so we don't have any windows to run this process other than overnight, when usage is lower.

    Any suggestions would be helpful.

    First, I'd change from using create and update dates to using the timestamp/rowversion data type. That way, if the process ever fails one night (power outage or whatever), it can catch up the next night without anyone having to rewrite the code.
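
    A minimal sketch of that idea, assuming you add a rowversion column to each transactional table and keep a small "watermark" table on the reporting side (all names here are placeholders):

    ```sql
    -- SQL Server maintains the rowversion value automatically on every insert/update.
    ALTER TABLE Production.dbo.Orders ADD RV rowversion;

    -- One row per archived table, remembering how far the last successful run got.
    CREATE TABLE Reporting.dbo.ArchiveWatermark
    (
        TableName sysname   NOT NULL PRIMARY KEY,
        LastRV    binary(8) NOT NULL
    );
    ```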

    Then, I'd do an insert and an update, each as a single set-based statement per table (2 statements total per table).
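
    Per table, that works out to roughly the following, reusing the watermark table sketched above (again, placeholder names):

    ```sql
    DECLARE @LastRV binary(8), @NewRV binary(8);

    SELECT @LastRV = LastRV
    FROM   Reporting.dbo.ArchiveWatermark
    WHERE  TableName = N'Orders';

    -- 1) Insert rows the reporting database has never seen.
    INSERT INTO Reporting.dbo.Orders (OrderID, CustomerID, Amount, CreateDate, ChangeDate)
    SELECT p.OrderID, p.CustomerID, p.Amount, p.CreateDate, p.ChangeDate
    FROM   Production.dbo.Orders AS p
    WHERE  p.RV > @LastRV
      AND  NOT EXISTS (SELECT 1 FROM Reporting.dbo.Orders AS r WHERE r.OrderID = p.OrderID);

    -- 2) Update rows the reporting database already has.
    UPDATE r
    SET    r.CustomerID = p.CustomerID,
           r.Amount     = p.Amount,
           r.ChangeDate = p.ChangeDate
    FROM   Reporting.dbo.Orders AS r
    JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
    WHERE  p.RV > @LastRV;

    -- 3) Move the watermark forward; if a night is missed, the next run simply catches up.
    SELECT TOP (1) @NewRV = p.RV
    FROM   Production.dbo.Orders AS p
    ORDER BY p.RV DESC;

    UPDATE Reporting.dbo.ArchiveWatermark
    SET    LastRV = @NewRV
    WHERE  TableName = N'Orders';
    ```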

    One thing you can often do to speed this kind of thing up is make sure any non-clustered indexes on the archive tables are disabled during the load, and that the clustered index is based on something like "Date Archived". That will make the whole process run faster in most cases. (Of course, rebuild the non-clustered indexes after you finish the insert/update.)
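
    For example (index and table names assumed):

    ```sql
    -- Disable the non-clustered indexes on the archive table before the load...
    ALTER INDEX IX_Orders_CustomerID ON Reporting.dbo.Orders DISABLE;

    -- ...run the insert/update here...

    -- ...then rebuild (which also re-enables) them once the load is done.
    ALTER INDEX IX_Orders_CustomerID ON Reporting.dbo.Orders REBUILD;
    ```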

    Don't move the whole tables around. If you have to move anything at all, just move the new/updated records.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

    I tend to like the UPDATE rather than the DELETE / INSERT. It is not that it would be faster (it probably will not be), but it better represents what is actually happening and lends itself to maintaining referential integrity.

    You have lots of ways to help things. First, updating every field is not necessary if only one field changed. If you update every field, it will impact every index. If you only update the fields that actually changed, you have a better chance of not impacting users. This does, however, make updates a bit more complicated, because you will need at least one update statement per field in each table.
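
    A rough sketch of that per-column pattern, with made-up names:

    ```sql
    -- Touch only the Amount column, and only on rows where it actually differs.
    UPDATE r
    SET    r.Amount = p.Amount
    FROM   Reporting.dbo.Orders AS r
    JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
    WHERE  r.Amount <> p.Amount
       OR (r.Amount IS NULL AND p.Amount IS NOT NULL)
       OR (r.Amount IS NOT NULL AND p.Amount IS NULL);

    -- Repeat the same statement for each remaining column (CustomerID, ChangeDate, ...).
    ```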

    If the process you are running nightly does one massive update on all of the changed records, you may simply be escalating locks too much. So, you may want to update in smaller batches, both to reduce how long your locks are held and to reduce the overall number of resources locked at any one time. This drags you down the road of rollbacks and transactions, because you have to decide what to do if your 10-batch update fails on batch 7, but software development is complicated and that's why we get paid well.
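
    Something along these lines, with an arbitrary batch size (the names and the rowversion watermark carry over from the earlier suggestion and are only assumptions):

    ```sql
    DECLARE @BatchSize int, @LastRV binary(8);
    SET @BatchSize = 5000;

    SELECT @LastRV = LastRV
    FROM   Reporting.dbo.ArchiveWatermark
    WHERE  TableName = N'Orders';

    WHILE 1 = 1
    BEGIN
        -- Each pass touches at most @BatchSize rows, so locks stay small and short-lived.
        UPDATE TOP (@BatchSize) r
        SET    r.CustomerID = p.CustomerID,
               r.Amount     = p.Amount,
               r.ChangeDate = p.ChangeDate
        FROM   Reporting.dbo.Orders AS r
        JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
        WHERE  p.RV > @LastRV
          AND  r.ChangeDate <> p.ChangeDate;   -- updated rows drop out of the filter

        IF @@ROWCOUNT = 0 BREAK;               -- nothing left to bring across
    END;
    ```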

    In addition, SNAPSHOT ISOLATION is something that you may want to consider. It is very well suited to reporting systems that are being updated while users are running reports because it will show the last committed version of records and allow the updates to continue without blocking. This may run you into other issues, but for your problem, it is something you should really research and understand.
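
    Turning it on for the reporting database is a one-time setting, and reporting sessions then opt in per connection, e.g.:

    ```sql
    -- Row versions are kept in tempdb; readers see the last committed version instead of blocking.
    ALTER DATABASE Reporting SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- In the reporting sessions (or the report code):
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    -- READ_COMMITTED_SNAPSHOT ON is the alternative that covers plain SELECTs without any
    -- code changes, but setting it needs a moment with no other connections open.
    ```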

    Finally, the update process could also be modified to load a second copy of the database, and you could then briefly kick users out and swap the database names. This would reduce the update window to a very small amount of time. It does mean there MUST be a little down time, but the down time becomes less than a second, and there is zero resource contention from the update at any other time. You could even write some code to let your changeover window slide until there are no users connected or you reach a final cutoff time. There are lots of options close to this using views, database snapshots, and synonyms, but they all end up similar.
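
    A rough sketch of the rename swap (database names are placeholders):

    ```sql
    -- Briefly take exclusive control of both copies.
    ALTER DATABASE Reporting SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE Reporting_Staging SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- Swap the freshly loaded staging copy in under the name users connect to.
    ALTER DATABASE Reporting MODIFY NAME = Reporting_Old;
    ALTER DATABASE Reporting_Staging MODIFY NAME = Reporting;

    -- Open them back up; Reporting_Old becomes the next night's staging target.
    ALTER DATABASE Reporting SET MULTI_USER;
    ALTER DATABASE Reporting_Old SET MULTI_USER;
    ```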

  • You could also see if your tables could benefit from table partitioning.
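
    For example, a yearly scheme on a create-date column might look like this (boundary dates and filegroup mapping are placeholders):

    ```sql
    -- Each calendar year becomes its own partition.
    CREATE PARTITION FUNCTION pfByYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2003-01-01', '2004-01-01', '2005-01-01',
                               '2006-01-01', '2007-01-01', '2008-01-01');

    -- Map every partition to one filegroup for simplicity.
    CREATE PARTITION SCHEME psByYear
    AS PARTITION pfByYear ALL TO ([PRIMARY]);

    -- The tables (and their clustered indexes) are then created ON psByYear(CreateDate).
    ```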


    * Noel

    Thank you for your replies. I am going to do some homework on both of your suggestions and present something to the other staff members based on them.

  • Noel,

    They do have the reporting database partitioned, but I believe it is partitioned by day (24-hour intervals).

    We were originally going to partition by year, but it didn't work out.

  • One partition per day? Having 700 partitions (one per day for two years) for any single table in your database is probably not really helping performance issues.

  • Michael Earl (6/18/2008)


    One partition per day? Having 700 partitions (one per day for two years) for any single table in your database is probably not really helping performance issues.

    Probably, but the limit is still 1,000, and they may still improve manageability with the SWITCH statement (which is very fast).
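
    For example (table name and partition number are only for illustration):

    ```sql
    -- In the reporting database: move the oldest slice out of the big table into an empty,
    -- identically structured table on the same filegroup. It is a metadata-only operation.
    ALTER TABLE dbo.Orders SWITCH PARTITION 1 TO dbo.Orders_Archive;
    ```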


    * Noel
