Archiving

  • National Archives

    I saw an article saying that many storage and backup vendors are looking to start including database archiving in their products. If you've ever had to work on archiving data, this could be a welcome tool for the enterprise, especially if it can take a load off the DBA and put it on the storage and backup folks.

    Archiving can be difficult to achieve, however, given the complex relationships we keep in our data and the many schema changes we may go through over time. I know I've built homegrown archiving solutions, which often entail moving data to other databases or filegroups and making it accessible through a view that allows users to query older data. Other times I've just exported the data as text files that can be easily loaded back into the database if needed.
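
    A minimal sketch of the move-and-view approach described above, using Python's built-in sqlite3 module purely for illustration (the mention of filegroups suggests SQL Server, where the archive would more likely sit in a separate database or filegroup). The tables orders and orders_archive, the view orders_all, and the helper archive_old_orders are all hypothetical names, and dates are assumed to be ISO-8601 strings so that plain string comparison orders them correctly.

```python
import sqlite3


def archive_old_orders(conn, cutoff):
    """Move rows dated before `cutoff` from orders to orders_archive.

    Both statements run in one transaction, so a failure leaves the
    active table untouched. Returns the number of rows moved.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    -- Users who need older data query the view instead of the active table.
    CREATE VIEW orders_all AS
        SELECT * FROM orders
        UNION ALL
        SELECT * FROM orders_archive;
""")

# Hypothetical sample data: two old rows and one recent row.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2003-05-01", 10.0), (2, "2005-02-11", 20.0), (3, "2006-08-30", 30.0)],
)
conn.commit()

moved = archive_old_orders(conn, "2006-01-01")
print("rows moved to archive:", moved)
```

    The view keeps older data queryable while the active table stays small. It does not address the harder parts mentioned above, such as related child tables or schema changes between the active and archive copies.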

    Even if you can easily solve the technical challenges, you often run into business challenges because no one wants to remove data from the system, especially if there's a chance that a high-level executive might query it.

    Just as we should all write detailed specifications, get all business rules up front, and follow many other good software practices, we should have an archive plan built into any application that expects to gather large amounts of data. However, life moves fast, and often we just don't have the time to properly develop the core features, much less build in archiving.

    I think vendor-provided archiving makes some sense for standard applications (ERP systems, CRM systems, accounting systems, etc.) that hold data in known formats with relatively static relationships, and where it's easy to separate the data into well-known periods for archiving.

    Maybe it's a moot point, since I've almost never been asked to recover archived data from backup tapes or files. It might be a problem not worth solving for DBAs.

  • Much of this activity is required to meet legal requirements. The legal test is purposefully vague; usually it is some variation of the reasonable person test: would a reasonable person have implemented a process in this way?

    The cost of noncompliance is severe: the other side gets to state its case to the court without having to prove it.

    The data doesn't have to be recoverable to avoid sanction; the process just has to be implemented with diligence.

    As for the technical side of archiving, from a cost-benefit view (and there really isn't a whole lot of non-legal practical benefit beyond a few variations), I have read good things about properly sized tape libraries. Does anyone have experience with that?

  • This issue is a pet peeve of mine with several of our vendor-supplied applications. I'm talking about transactional systems and DSS systems where 95% of what they do daily uses the past 12 months' data.

    The missing archiving functionality hurts system performance and increases the cost of support. This includes having to throw more hardware at a bloated database and/or having to constantly manage and monitor the bloat.

    Also, the longer the situation is allowed to continue, the more difficult the archiving process becomes. What could have been an ounce of prevention now requires a pound of cure.

    As long as the archived data is available somewhere to be retrieved, it can always be merged back with the active data to fulfill the once-in-a-blue-moon query that some exec or some legal/regulatory situation requires. Then it's not clogging the system that folks need to get everyday work done. Also, as Steve mentions, "I've almost never been asked to recover archived data from backup". So chances are you're never going to need that 3-year-old stuff anyway, and the world changes so quickly now that the 5-year-old stuff is probably irrelevant.

  • I think it's probably irrelevant, but you definitely need to keep track of it. I have had to pull old data for legal reasons, though I drop it into its own db. That's a good reason to keep a few good backups around so you have the schemas you might need.

    Never know when an upgrade will render your easy bcp import useless.

  • Accounting and HR type data retention requirements are 'a breeze' - usually 5-7 years. The reason I say 'a breeze' is because I am now involved in the healthcare industry. The systems holding data related to patients and their diagnoses have retention periods that are kind of like a NULL value - unquantifiable or forever. Imagine imaging systems (x-ray, MRI, CAT scans, PET scans, mammography) that accumulate 1 or 2 TB a year for a relatively small provider (250 beds), all of which has to be kept around for the next 30-50 years. It is not a joke, it is reality. Many of the systems in place do not have archival, and the vendors claim that they are looking into adding archiving in the next 2-5 years!
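
    To put rough numbers on those figures, here is a back-of-the-envelope sketch. It assumes a constant yearly intake, no compression, and nothing ever deleted; none of those assumptions come from the post, and the function name is hypothetical.

```python
def cumulative_storage_tb(tb_per_year, years):
    """Total TB accumulated, assuming a constant yearly intake and no deletions."""
    return tb_per_year * years


# Low end: 1 TB/year kept for 30 years. High end: 2 TB/year kept for 50 years.
low = cumulative_storage_tb(1, 30)
high = cumulative_storage_tb(2, 50)
print(f"Retained imaging data: roughly {low} to {high} TB")
```

    That works out to somewhere between 30 and 100 TB for a single small provider, before counting any backup or offsite copies.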

    My advice is to buy stock in storage!

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."
