Backup strategy

  • Hi Everyone,

    I have just started a new job at a small firm and I have been tasked with looking into the company backup strategy. The team does not have a dedicated DBA and only elementary administrative knowledge of SQL Server.

    The backend database is for an application that is used constantly by all 100 staff throughout the business day. I am mindful I need to devise a strategy that could be put into action quickly if a DR occurred within the skillset of the small IT team.

    The database is around 96GB. The current strategy is a mixture of Full, 3 hourly Differential and 10 minute transaction logs backups which they have struggled to restore to point in time when testing DR.

    I am thinking to keep things simple we could perform a 2 hour full backup and half hourly transaction log backups.

    I know it is unusual to have full backups taken so regularly but I know that within the SQL skill set the team would be comfortable restoring a full backup file. Less so with transaction logs so at worse we would lose 2 hours work and at best might lose 30 minutes if we store the transaction logs.

    Does this sound like a reasonable plan?

  • Your starting point should be to look at the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for each database/system. The first means "how quickly should we be able to recover from a DR situation?" and the second means "how much data are we prepared to lose?".

    So in your example if you only take full backups every 2 hours (and no transaction log backups), is losing 2 hours worth of data acceptable? If you also take tran log backups every 30 minutes, is 30 minutes data loss acceptable?

    Why would a team be less comfortable restoring tran log backups than full backups?; it's essentially the same operation, you just need to make sure that the NORECOVERY option is specified when restoring the Full backup and all but the final Tran log backup.

    Having a reliable and proven backup and recovery strategy is one of the most important tasks for a DBA so you are right to be concentrating on this.

    Look in BOL (Books Online) for more details about backing up and restoring databases.

    Regards

    Lempster

  • Yes, it really does depend how much data loss is considered acceptable where you are.

    There is a free eBook available on this very site[/url] that covers backup and restore, so it might be worth downloading that and having a look through it if you have an hour or two.

  • thank you for your reply and guidance.

    The current strategy is based around nightly full backups, 3 hour differential and 10 minute transaction logs. A few months back another job set up outside of SQL Server was created to undertake full backups every hour.

    I think historically the issues have been for 2 reasons,

    1. The differential LSN was broken by the hourly full backups that were taken without the "with copy"

    2. The transaction logs are backed up every 10 minutes, the amount of files and restoring them correctly in sequence if the differentials were not able to be restored caused issues.

    My concern is realistically within the teams skill set what would we be able to quickly restore. I worry that a process that involves many files will actually delay us getting up and running again (for example under the pressure of a disaster situation during the restore someone might forget to select No recovery on one of the transaction log backups and potentially then have to start the restore process again)

  • TerrenceTheCat (2/21/2014)

    1. The differential LSN was broken by the hourly full backups that were taken without the "with copy"

    Has this been resolved now?

    My concern is realistically within the teams skill set what would we be able to quickly restore. I worry that a process that involves many files will actually delay us getting up and running again (for example under the pressure of a disaster situation during the restore someone might forget to select No recovery on one of the transaction log backups and potentially then have to start the restore process again)

    Yes, that's a concern, especially when you're in a hurry.

    Have you tested how long it takes to take a differential backup? Just thinking aloud here, maybe you could take a differential every hour if they don't take too long, and then you'd only need to restore the last full, the latest differential, and any subsequent log backups. If you took a differential on the hour, then log backups at 20 past and 20 to (say), you'd only ever have a maximum of 4 files to restore (with 20 mins potential data loss);

    - last full

    - latest differential (from on the hour)

    - log from 20 past the hour

    - (log from 20 to)

    I think :-D.

  • Well, it doesn't make sense to have a Full backup schedule that is more frequent than your Diff backup shcedule so I can see how that caused problems. I would be very tempted to get rid of that second job.

    With regard to restoring many files (transaction log backup files), ideally you would have it all scripted so there would be no possibility of making a mistake, but if, for whatever reason, you have to restore manually, then taking a few extra seconds/minutes to double-check shouldn't be an issue. Also, you should have your backup and recovery process (your DR process) documented and tested so that (a) you know it works and (b) takes the pressure off in a real DR situtation as you'll know what to do and the expected outcome. Plus you can demonstrate to your manager that you have a solid DR strategy.

    The saying goes that 'if you've never tested your DR strategy, you don't have a DR strategy'! 😉

    Regards

    Lempster

  • I'd say the current setup is OK. If anything, I wouldn't go with lots of full backups because you do pay for that with load on the system. Plus, as has been pointed out several times, you'll lose the ability to do a point in time recovery, so depending on your RPO, that could be extremely problematic. I'd suggest two things, first, talk to the business about how much data they think is acceptable to lose. Of course, they'll say zero, but you need to figure out if 10 minutes (the current log backup frequency) is the acceptable point. Knowing this will help you figure out how to set up your backups. Second, practice. Practice the restores a bunch until you're comfortable. This is your business that we're talking about in this data. Practice restoring so you can get it right when the disaster hits.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Thanks guys for your input.

    At the moment full backups are being taken every hour during business hours to an external drive that can be taken offsite.

    I do like the comfort of full backups just in case there are issues with differentials (perhaps I am being paranoid?)

    I do plan to get a steer from the executive in regard to acceptable potential data loss, recovery time.

    Do you feel it would be unreasonable to do the following...

    • Full back up every two hours

    • Differential every hour in between

    • Transaction log every 20 minutes

    I totally agree with the regular practice idea!

  • Personally I think a Full backup every 2 hours is overkill and like Grant said, it does put load on the system so your users may find their application(s) less reponsive during the backup windows. Have you a feel for how much data changes during a typical day, i.e. how large are your differential backups?

    If you are comfortable with the strategy you've outlined, the discussions with your exec are in agreement with that strategy and the additional load on the system is acceptable then go with that, but always be willing to review and change strategy as your database(s) grow(s). And test, test, test!

    You raise a good point about taking backups offsite; there is no point in agreeing a RTO of 2 hours if it takes 3 hours just to retrieve the backup tape from offsite and believe me, I have worked in places where that was the case....until I found out about it anyway! 😉

    Regards

    Lempster

  • One more suggestion: think about writing a script, perhaps in Powershell, that will take all files in a folder and build the proper recovery scripts. That way you can point the script and get the restores for the full and logs, easily.

    Test this, make sure everyone has a copy, and if possible, keep a copy of the script with a README with the backups.

  • When I started my present position the first question I asked was how does one recover a database, how much time is the business willing to wait for recovery and how much data loss is acceptable.

    I got a lot of blank stares when I pressed for these answers. Bottom line was they really didn't have the answers to these questions, except, we don't want to loose any data if possible.

    So I reviewed the backup procedures and found that there were full backups taken ever 24 hours. A lot of the databases were set with FULL RECOVERY but there weren't any jobs to manage the transaction logs. Every so often one of the network administrators would switch the database over to SIMPLE, truncate the log, then change it back to FULL. :w00t:

    It probably took me 2-3 months to get the overall details but I was able to identify which databases were mission critical and set up transaction log backups every 15 minutes along with a full nightly backup.

    The remaining non-mission critical databases were move to SIMPLE.

    The company was bringing several brand new systems, one of which was mission critical. Based on the information I learned from the other system due-diligence, I set these T-log backups to every 30 minutes to start then moved it to 15 min.

    Prior to going live we did a DR test. I was relying on the network people to fully understand the requirements. Turned out the DR test was a disaster because I wasn't able to get historic transaction logs (they were being purged by the Net Admin people). Not good.......

    I then took it upon myself to write a 7 day archive process that takes all of the backups and associated transaction logs and move them to one of 7 folders.

    This effort was paid off big time. 2 days before go-live one of the users had discovered a "hidden feature" in the application causing a massive delete of some primary data.

    Because I had the previous day's backup and all associated transaction logs, I was able to bring back the database to the last 15 min. interval (which turned out to be only 3 minutes old).

    I restored the database and used a script to apply all of the transaction logs up to the point of failure. We were up in less than 10 minutes.

    I guess the point that I am making here is a lot of times others in the IT department may not have a full grasp of what is needed to properly manage a production database. With that in mind, you as the DBA may have to define the DR recovery plan, put it into place and tweak it as needed, but then document the 2 most important issues when it comes time to recover from a disaster, 1) Time to recover 2) amount of allowable data loss.

    If all is on board with this information it will make your life as a DBA much easier in the event of recovering from a database failure.

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Thanks Steve, I have been looking for a good opportunity to get into Powershell so this sounds well worth exploring.

    Many thanks

    David

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply