Archive Process - Need opinion

  • We have recently set up what we call an "archiving" process.

    There are two databases: production and reporting (read-only).

    The production database contains 2 years of current data, and this is the database our users are using on a daily basis.

    The reporting database contains "ALL" data, which is about 6 years' worth.

    We have a process in place that runs at night. It takes our fixed/static tables, and copies them from production to the reporting database.

    Then we read our transactional tables on production and check the create date on each row; if it is within the last 24 hours (since we last ran the process), the row is inserted into the reporting database.

    Next we run a process that checks the "change" date on each row, and if it is also within the past 24 hours, the process "updates" every column in the row.
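
    In rough T-SQL terms, the two steps look something like this (table and column names are simplified for illustration, and assume both databases are on the same instance):

    ```sql
    -- Step 1: copy rows created since the last run into the reporting database.
    INSERT INTO Reporting.dbo.Orders (OrderID, CustomerID, Amount, CreateDate, ChangeDate)
    SELECT p.OrderID, p.CustomerID, p.Amount, p.CreateDate, p.ChangeDate
    FROM   Production.dbo.Orders AS p
    WHERE  p.CreateDate >= DATEADD(HOUR, -24, GETDATE());

    -- Step 2: for rows changed since the last run, overwrite every column in reporting.
    UPDATE r
    SET    r.CustomerID = p.CustomerID,
           r.Amount     = p.Amount,
           r.ChangeDate = p.ChangeDate
    FROM   Reporting.dbo.Orders AS r
    JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
    WHERE  p.ChangeDate >= DATEADD(HOUR, -24, GETDATE());
    ```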

    My suggestion would be to do a delete and insert instead of the update, but they are saying that delete takes too many resources.

    We are having locking issues in our database and our users are getting upset. Our business is 24/7 so we don't have any windows to run this process other than overnight, when usage is lower.

    Any suggestions would be helpful.

    First, I'd change from using create and update dates to using the timestamp/rowversion data type. That way, if the process ever fails one night (power outage or whatever), it can catch up the next night without anyone having to rewrite the code.
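
    A minimal sketch of that idea, assuming you add a rowversion column to each transactional table and keep a small "watermark" table on the reporting side (all names here are placeholders):

    ```sql
    -- SQL Server maintains the rowversion value automatically on every insert/update.
    ALTER TABLE Production.dbo.Orders ADD RV rowversion;

    -- One row per archived table, remembering how far the last successful run got.
    CREATE TABLE Reporting.dbo.ArchiveWatermark
    (
        TableName sysname   NOT NULL PRIMARY KEY,
        LastRV    binary(8) NOT NULL
    );
    ```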

    Then, I'd do an insert and an update, each as a single set-based statement per table (2 statements total per table).
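
    Per table, that works out to roughly the following, reusing the watermark table sketched above (again, placeholder names):

    ```sql
    DECLARE @LastRV binary(8), @NewRV binary(8);

    SELECT @LastRV = LastRV
    FROM   Reporting.dbo.ArchiveWatermark
    WHERE  TableName = N'Orders';

    -- 1) Insert rows the reporting database has never seen.
    INSERT INTO Reporting.dbo.Orders (OrderID, CustomerID, Amount, CreateDate, ChangeDate)
    SELECT p.OrderID, p.CustomerID, p.Amount, p.CreateDate, p.ChangeDate
    FROM   Production.dbo.Orders AS p
    WHERE  p.RV > @LastRV
      AND  NOT EXISTS (SELECT 1 FROM Reporting.dbo.Orders AS r WHERE r.OrderID = p.OrderID);

    -- 2) Update rows the reporting database already has.
    UPDATE r
    SET    r.CustomerID = p.CustomerID,
           r.Amount     = p.Amount,
           r.ChangeDate = p.ChangeDate
    FROM   Reporting.dbo.Orders AS r
    JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
    WHERE  p.RV > @LastRV;

    -- 3) Move the watermark forward; if a night is missed, the next run simply catches up.
    SELECT TOP (1) @NewRV = p.RV
    FROM   Production.dbo.Orders AS p
    ORDER BY p.RV DESC;

    UPDATE Reporting.dbo.ArchiveWatermark
    SET    LastRV = @NewRV
    WHERE  TableName = N'Orders';
    ```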

    One thing you can often do to speed this kind of thing up is make sure any non-clustered indexes on the archive tables are disabled during the load, and that the clustered index is based on something like "Date Archived". That will make the whole process run faster in most cases. (Of course, rebuild the non-clustered indexes after you finish the insert/update.)
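
    For example (index and table names assumed):

    ```sql
    -- Disable the non-clustered indexes on the archive table before the load...
    ALTER INDEX IX_Orders_CustomerID ON Reporting.dbo.Orders DISABLE;

    -- ...run the insert/update here...

    -- ...then rebuild (which also re-enables) them once the load is done.
    ALTER INDEX IX_Orders_CustomerID ON Reporting.dbo.Orders REBUILD;
    ```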

    Don't move the whole tables around. If you have to move anything at all, just move the new/updated records.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

    I tend to like the UPDATE rather than the DELETE / INSERT. It is not that it would be faster (it probably will not be), but it better represents what is actually happening and lends itself to maintaining referential integrity.

    You have lots of ways to help things. First, updating every field is not necessary if only one field changed. If you update every field, it will impact every index. If you only update the fields that actually changed, you have a better chance of not impacting users. This does, however, make updates a bit more complicated, because you will need at least one update statement per field in each table.
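
    A rough sketch of that per-column pattern, with made-up names:

    ```sql
    -- Touch only the Amount column, and only on rows where it actually differs.
    UPDATE r
    SET    r.Amount = p.Amount
    FROM   Reporting.dbo.Orders AS r
    JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
    WHERE  r.Amount <> p.Amount
       OR (r.Amount IS NULL AND p.Amount IS NOT NULL)
       OR (r.Amount IS NOT NULL AND p.Amount IS NULL);

    -- Repeat the same statement for each remaining column (CustomerID, ChangeDate, ...).
    ```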

    If the process you are running nightly does one massive update on all of the changed records, you may simply be escalating locks too much. So, you may want to update in smaller batches, both to reduce how long your locks are held and to reduce the overall number of resources locked at any one time. This drags you down the road of rollbacks and transactions, because you have to decide what to do if your 10-batch update fails on batch 7, but software development is complicated and that's why we get paid well.
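
    Something along these lines, with an arbitrary batch size (the names and the rowversion watermark carry over from the earlier suggestion and are only assumptions):

    ```sql
    DECLARE @BatchSize int, @LastRV binary(8);
    SET @BatchSize = 5000;

    SELECT @LastRV = LastRV
    FROM   Reporting.dbo.ArchiveWatermark
    WHERE  TableName = N'Orders';

    WHILE 1 = 1
    BEGIN
        -- Each pass touches at most @BatchSize rows, so locks stay small and short-lived.
        UPDATE TOP (@BatchSize) r
        SET    r.CustomerID = p.CustomerID,
               r.Amount     = p.Amount,
               r.ChangeDate = p.ChangeDate
        FROM   Reporting.dbo.Orders AS r
        JOIN   Production.dbo.Orders AS p ON p.OrderID = r.OrderID
        WHERE  p.RV > @LastRV
          AND  r.ChangeDate <> p.ChangeDate;   -- updated rows drop out of the filter

        IF @@ROWCOUNT = 0 BREAK;               -- nothing left to bring across
    END;
    ```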

    In addition, SNAPSHOT ISOLATION is something that you may want to consider. It is very well suited to reporting systems that are being updated while users are running reports because it will show the last committed version of records and allow the updates to continue without blocking. This may run you into other issues, but for your problem, it is something you should really research and understand.
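
    Turning it on for the reporting database is a one-time setting, and reporting sessions then opt in per connection, e.g.:

    ```sql
    -- Row versions are kept in tempdb; readers see the last committed version instead of blocking.
    ALTER DATABASE Reporting SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- In the reporting sessions (or the report code):
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    -- READ_COMMITTED_SNAPSHOT ON is the alternative that covers plain SELECTs without any
    -- code changes, but setting it needs a moment with no other connections open.
    ```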

    Finally, the update process could also be modified to load a second copy of the database, and you could then briefly kick users out and swap the database names. This would reduce the update window to a very small amount of time. It does mean there MUST be a little down time, but the down time becomes less than a second, and there is zero resource contention from the update at any other time. You could even write some code to let your changeover window slide until there are no users connected or you reach a final cutoff time. There are lots of options close to this using views, database snapshots, and synonyms, but they all end up similar.
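
    A rough sketch of the rename swap (database names are placeholders):

    ```sql
    -- Briefly take exclusive control of both copies.
    ALTER DATABASE Reporting SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE Reporting_Staging SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- Swap the freshly loaded staging copy in under the name users connect to.
    ALTER DATABASE Reporting MODIFY NAME = Reporting_Old;
    ALTER DATABASE Reporting_Staging MODIFY NAME = Reporting;

    -- Open them back up; Reporting_Old becomes the next night's staging target.
    ALTER DATABASE Reporting SET MULTI_USER;
    ALTER DATABASE Reporting_Old SET MULTI_USER;
    ```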

  • You could also see if your tables could benefit from table partitioning.
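
    For example, a yearly scheme on a create-date column might look like this (boundary dates and filegroup mapping are placeholders):

    ```sql
    -- Each calendar year becomes its own partition.
    CREATE PARTITION FUNCTION pfByYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2003-01-01', '2004-01-01', '2005-01-01',
                               '2006-01-01', '2007-01-01', '2008-01-01');

    -- Map every partition to one filegroup for simplicity.
    CREATE PARTITION SCHEME psByYear
    AS PARTITION pfByYear ALL TO ([PRIMARY]);

    -- The tables (and their clustered indexes) are then created ON psByYear(CreateDate).
    ```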


    * Noel

    Thank you for your replies. I am going to do some homework on both of your suggestions and present something to the other staff members based on them.

  • Noel,

    They do have the reporting database partitioned, but I believe it is partitioned by day (24-hour intervals).

    We were originally going to partition by year, but it didn't work out.

  • One partition per day? Having 700 partitions (one per day for two years) for any single table in your database is probably not really helping performance issues.

  • Michael Earl (6/18/2008)


    One partition per day? Having 700 partitions (one per day for two years) for any single table in your database is probably not really helping performance issues.

    Probably, but the limit is still 1,000, and they may still improve manageability with the SWITCH statement (which is very fast).
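
    For example (table name and partition number are only for illustration):

    ```sql
    -- In the reporting database: move the oldest slice out of the big table into an empty,
    -- identically structured table on the same filegroup. It is a metadata-only operation.
    ALTER TABLE dbo.Orders SWITCH PARTITION 1 TO dbo.Orders_Archive;
    ```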


    * Noel
