SQL, VMs, and Backups... Oh my...

  • Ladies and Gentlemen,

    I want to post a question to see what the SQL community is doing to handle backups of SQL Servers running on VM. Like every company, Management loves VM technology for the cost savings. The problem is that they are also selling it as a solution to provide HA options for applications with strict SLAs. We backup the databases via SQL, but the recovery requires a VM to be imaged and the DBA Team manually installing SQL and restoring the databases.

    THis is obviously not the bill of goods sold due to the manual intervention.

    The company has successfully tested software to backup the entire image from a VM perspective and restore the entire image successfully, databases and all, but there is a cost. The VM backup software requires snapshots of all of the disks on the server including the SQL disks, but it is a "freeze and thaw" method, meaning SQL actually stops responding during the backup (vRanger). In the case of one of the SQL servers, the server was not available from a SQL perspective for close to 4 hours causing many nightmares to daily maintenance and nightwork.

    So here is the question: Is anyone using any software to backup the entire VM, SQL and all, that uses VSS or other functionality that would leave SQL functional during the entire VM image backup? Of course as DBAs, we will continue to cover from a SQL aspect, but the the ability to recover entire images with SQL running will definitely help in recovery times.

    Thanks for entertaining me.

  • You shouldn't treat the VM any different to s physical machine. How do you back up these?

    Options such as VMWare vmotion make a virtual machine highly available and all but eliminate hardware failures but traditional back up methods should still be in force for severs such as Exchange and SQL server.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • I tend to be with Perry. There are ways that software can handle the imaging, and 4 hours seems long. But overall, I wouldn't build a VM, and a non-VM process. Use one process.

    A few people I know use SAN snapshots for critical systems, and would recover to PIT using those in SQL and it works pretty well.

    A lot of the advanced techniques depend on the vendor you have and the capabilities of their software. I'd set an RPO/RTO down first, then talk to vendors and see what they can do.

  • I am with same delimma, here are the options we are considering....

    Continue taking sql backups as usual but take them directly to central backup location (CIFS)

    Alternatively veem backup solution offer quick restore from snapshots if you want you can completely avoid SQL backups (this is what we are validating)

    Also you can use something like snapmanager which allows db to be restored in minutes...

    Prakash

    Prakash Heda
    Lead DBA Team - www.sqlfeatures.com
    Video sessions on Performance Tuning and SQL 2012 HA

  • Don't forget you need to continue taking Transaction Log backups of databases in Full and Bulk Logged recovery models, regardless.

    Also note that different objectives, such as PITR (Point in time Recovery) requirements, may require different backup and/or archive methodologies.

  • Thanks for the follow-ups.

    We have successfully tested full VM backups of the entire VM using vRanger. All drives are scheduled for backup due to the downtime incurred by SQL during the process. We even schedule downtime to stabalize SQL. Since these are candidates for running SQL on a VM, we typically can schedule these backups everytime the OS or the software (including SQL) are patched/updated due to the relaxed SLAs.

    These VM restores return the entire server to the state of the VM backup. SQL is started and the backups are restored from the SQL backups saved to tape, full and transactional. The servers return built, configured, and hardened; which takes several hours to accomplish on the physicals.

    We do have one policy for all SQL builds, VM or physical. This policy can be used in a worst case scenario for VMs. All SQL instances are backed up to tape as a standard policy.

    Thanks for the advice on some other possible VM backup solutions.

  • Whenever there's a VM snapshot involved, keep in mind you'll loose any MS support!!

    For details please see http://support.microsoft.com/kb/956893/en-us

    Quote:

    Virtualization Snapshots for Hyper-V or for any virtualization vendor are not supported to use with SQL Server in a virtual machine. It is possible that you may not encounter any problems when using snapshots and SQL Server, but Microsoft will not provide technical support to SQL Server customers for a virtual machine that was restored from a snapshot.

    Btw: That's the reason we refused to virtualize our SQL Servers as of today. The VM folks were unable to tell us how to definitely prevent VM snapshots from happening ("definitely" in terms of a technical solution and not a "we have a written rule that we should not do so...").

    If you don't invest the money for MS support, then it's a different story...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks. I will take this up with our MSFT rep. Thanks for the information.

  • The whole point about backups is that you must be able to recover from them. We don't run our prod SQL on VMWare ( or Hyper V ) why would I want to degrade performance?

    A snapshot is worthless for a prod SQL Server, I had the same discussion with our datacentre who were taking snapshots of our prod servers, it goes like this.

    1. A snapshot is point in time - let's say we choose midnight.

    2. Assuming any failure we have is a few seconds after midnight then the snapshot is good.

    3. If my failure is at say 6 am and I recover my snapshot from midnight I've lost 6 hours data.

    4. To get back to 6 a.m. I need to apply the last full backup ( which is good ) and subsequent transaction logs.

    5. Therefore from a purely SQL point of view a VM snapshot as a backup is useless. ( unless your database doesn't change or you can stand the loss of a days data ) because to get back to a point in time you need to apply your backup(s) and logs.

    ( I also run 24 x 7 ) so can't have any outages

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • colin.Leversuch-Roberts (2/15/2012)


    The whole point about backups is that you must be able to recover from them. We don't run our prod SQL on VMWare ( or Hyper V ) why would I want to degrade performance?

    A snapshot is worthless for a prod SQL Server, I had the same discussion with our datacentre who were taking snapshots of our prod servers, it goes like this.

    1. A snapshot is point in time - let's say we choose midnight.

    2. Assuming any failure we have is a few seconds after midnight then the snapshot is good.

    3. If my failure is at say 6 am and I recover my snapshot from midnight I've lost 6 hours data.

    4. To get back to 6 a.m. I need to apply the last full backup ( which is good ) and subsequent transaction logs.

    5. Therefore from a purely SQL point of view a VM snapshot as a backup is useless. ( unless your database doesn't change or you can stand the loss of a days data ) because to get back to a point in time you need to apply your backup(s) and logs.

    ( I also run 24 x 7 ) so can't have any outages

    VM backups (VRanger) serve fundamentally the same purpose as drive images (dd, PING, Clonezilla, Acronis TrueImage, Paragon Drive Backup, Norton Ghost, etc.); they are bare metal restore capable OS + App [ + Data] backups. Nothing inside of SQL Server

    VM Snapshots serve in the same niche as a Simple recovery model database full backup, but for the entire guest OS + apps + data, where the database backup is only for; they are a means of restoring an entire guest to the state it was at (memory included) during the snapshot.

    VM vMotion/Live Migration is somewhat unique, allowing (given sufficient bandwidth) moving virtual guests to different physical hardware while they're running.

    Full/Bulk Logged recovery model full + [ diff + ] tlog backups don't have a VM equivalent; they are PITR ([User selected] Point In Time Restore) capable, transactionally consistent application data backups which also allow transaction log re-use, and thus are required for Full and Bulk Logged recovery model databases regardless of other methods.

    These share some capabilities, though each has unique capabilities.

    VM Snapshots, VM backups, and drive images all let you very quickly recover from OS, application, or filesystem level problems, such as a Windows or SQL Server patch/service pack/CU/hotfix/driver upgrade that causes problems. VM Snapshots are the fastest at this by far; the other two can easily be (and should always be) kept on different physical hardware to allow recovery from crashes.

    VM backups and drive images both let you very quickly recover from a physical hardware failure, even to dissimilar hardware; if your server is stolen, catches on fire, or is fed into a woodchipper, you can very quickly restore it to even dissimilar hardware. Note you get back the entire state; your OS level security, all your Group Policy changes that allow instant file init and memory locking, all your OS patches, all your application service packs, your entire server configuration, all your security, all your encryption keys, your SSRS XML configuration tweaks, that defragging software you bought, your add-on backup compression software, etc. etc.

    VM backups and drive images also let you easily 'clone' SQL servers or other machines (assuming your particular licensing allows for your particular case), with the entire configuration intact (as above).

    vMotion/Live Migration allow you in many cases to separate physical hardware maintenance from database downtime. You want to add more RAM? Change out the network cards? Move to a new physical platform? Replace the motherboard? Send your guest database to another physical box with the capacity while it's life, do the maintenance, send it back.

    Full+tlog backups allow for PITR data recovery, even to a completely different SQL instance; invaluable for 'get back only X data as of exactly 2:14:30 p.m. the Tuesday before last', and allow for transaction log reuse.

    VM backups or drive images plus Full+tlog backups are a very potent and low cost combination for lowering RTO and simplifying recovery in certain cases; say your A/C leaks over the rack your servers are in, shorting everything out. Once you obtain other hardware to use, restore the drive image/install the VM base (VMWare will be considerably faster than Windows Hyper-V) and restore the VM backup, then use your full+tlog to get to a point in time just before the failure.

  • LutzM (2/14/2012)


    The VM folks were unable to tell us how to definitely prevent VM snapshots from happening ("definitely" in terms of a technical solution and not a "we have a written rule that we should not do so...").

    Snapshots generally come from

    user created snapshots

    VMWare VCB backups

    VMWare Data recovery backups

    Don't backup the VMs via VCB or VDR and use Virtual Infrastructure permissions\roles to stop users taking snapshots. You can also modify the vmware tools install and remove the quiescing driver from the VM.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • One thing I didn't see stated in this thread: WHAT changes from hour to hour, day to day? If it is only the database content, then your recovery plan can be to start with restoring a 'bare metal' machine (virtual, right? as in, copy the server file, change some settings, and you have the server up and running?), and then do your database restore. Depending on the size of your database(s), the version of SS you are using (presumably 2008 on this forum), then you will probably need to restore GUID content maintained through a script outside normal backup processes. Steve Jones is right - you need to start with your recovery objectives and content loss tolerances (how long a time of data loss? seconds, minutes, hours, days?).

    Incidentally, if you use mirroring on virtual machines, be careful that all the different instances used in mirroring reside on different physical servers, or you will lose your mirroring capability if the host server goes down.

  • Thank you for that explanations and opinions.

    This post was never meant to be a forum of "SQL Physical Good" or "SQL virtual BAD!" This was trying to make the most of the hand dealt to us. Management, against suggestions from our DBAs and architects, decided that SQL on VMs were more cost effective in SOME cases and now we have to deal with it. What we do is ensure that the application is a valid candidate for VM before it is considered and we still put up the good fight to run on physicals. Most of our VMs are built to standards with close attention paid to the IO back end (Thank you Brent Ozar for your advice. We continually monitor the resources and makes recommendations to migrate to physicals when necessary.

    We are simply trying to find a way to make the most of the VM hand we were dealt. It seems like a complete waste of resources to build a VM server the exact same way the physicals are built. Patching operating systems alone can take hours without even installing and configuring SQL. As stated, we are just looking to see other ways to get a SQL Server running on a VM other than building it from scratch every time a DR test is conducted. Every application VM we have that is not running SQL is restored on the ESX servers in MINUTES. Rebuilding the SQL VMs from scratch takes hours even when scripted. This in NO WAY was ever meant to replace our SQL backup strategy. it was meant to simply get a working SQL Server that we can refresh the databases with the last known good backups saved to tape. We have machines that can recover to the last full backup (simple recovery) and others to within minutes (Full recovery with incrementals and transactional backups) based on the SLAs.

    Just to clarify, restoring a Snapshot backup of the VM housing SQL does get you a working SQL Server in minutes. We then overwrite the existing backups with the backups up until the time of the disaster declaration. It works in concept, although it appears that MSFT will not support it if you have problems (thanks for that information). And Yes, you have to consider any changes to the OS or software that may cause issues with the SQL restores (i.e. Upgraded SQL versions, MSFT hotfixes, etc...).

    Thanks to everyone for your replies and opinions.

  • Allow me to be devils advocate in favor of SQL VM's:)

    I dont see the argument of SQL VM are bad performer stands valid anymore, its just that there are few cases where using bare metal still makes sense, but otherwise its a waste of resources, in todays cost cutting world where everything is moving towards cloud, vm's provides lots of benefits....

    By following best practices VM works almost same or 95% of what baremetal perform while utilizing better resources.

    The way I look at it, unless you really have a 4+ nines of SLA it makes no sense to go with baremetal anymore...

    I worked with Intuit and it was a clear direction to move every baremetal to VM and that was with esxi 4.1 where max CPU was 8, now with 32 that last excuse is also gone....the only case where we did ask for exception was for ecommerce site which constitutes 33% of intuit revenue and got the exception easily approved (no management want to play with top most revenue friendly servers)...everything else was on VM and performance was no issue (ofcource you need good VM admin) .... complete DEV and QA was running on VM and it was really easy to duplicate production env with 1/10th of cost

    using VM backup to replace SQL backup is not a god idea, also as Lutz suggested its not a supported configuration so should be easy to convince management....and in case management decide to go with it, its VM admin job to manage backup....not your headache anymore....(use in-guest iscsi drives and vm restore does not backup db data anymore, os crash is taken care by vm restore and sqlbackup is to be restored only if needed)

    In my current setup at Advent 90% of clients are having 2 nines of SLA thus they are perfect candidate for VM's, we are using centralized backups to save space and better availability....

    If you are using prepare method of sql server 2008 r2 and slipstream than you dont need to spens any time to build manually, just automate new sql vm instances with post installation steps and most of the manual work to prepare sql is done, no need to worry about hardware drivers issue anymore and vmware HA takes care of physical hardware failiure

    ISCSI in guest does not have much of a performance issue unless you are running a very high volume application where RDP disks are recommeneded, and its enough money making application than using bare metal makes sense

    Also patching is simplified as SLA usually does not count against planned maintenance, and snapshot helps roll back in minutes if any issues...this is a huge huge advantage

    I think where you do have to make extra effort is to collect right details to ensure VM's are having right resources needed to perform adequately, and even if its not done expansion is very easy for storage/cpu/memory in virtual world as needed....

    tpalmer (2/15/2012)


    Thank you for that explanations and opinions.

    This post was never meant to be a forum of "SQL Physical Good" or "SQL virtual BAD!" This was trying to make the most of the hand dealt to us. Management, against suggestions from our DBAs and architects, decided that SQL on VMs were more cost effective in SOME cases and now we have to deal with it. What we do is ensure that the application is a valid candidate for VM before it is considered and we still put up the good fight to run on physicals. Most of our VMs are built to standards with close attention paid to the IO back end (Thank you Brent Ozar for your advice. We continually monitor the resources and makes recommendations to migrate to physicals when necessary.

    We are simply trying to find a way to make the most of the VM hand we were dealt. It seems like a complete waste of resources to build a VM server the exact same way the physicals are built. Patching operating systems alone can take hours without even installing and configuring SQL. As stated, we are just looking to see other ways to get a SQL Server running on a VM other than building it from scratch every time a DR test is conducted. Every application VM we have that is not running SQL is restored on the ESX servers in MINUTES. Rebuilding the SQL VMs from scratch takes hours even when scripted. This in NO WAY was ever meant to replace our SQL backup strategy. it was meant to simply get a working SQL Server that we can refresh the databases with the last known good backups saved to tape. We have machines that can recover to the last full backup (simple recovery) and others to within minutes (Full recovery with incrementals and transactional backups) based on the SLAs.

    Just to clarify, restoring a Snapshot backup of the VM housing SQL does get you a working SQL Server in minutes. We then overwrite the existing backups with the backups up until the time of the disaster declaration. It works in concept, although it appears that MSFT will not support it if you have problems (thanks for that information). And Yes, you have to consider any changes to the OS or software that may cause issues with the SQL restores (i.e. Upgraded SQL versions, MSFT hotfixes, etc...).

    Thanks to everyone for your replies and opinions.

    Prakash Heda
    Lead DBA Team - www.sqlfeatures.com
    Video sessions on Performance Tuning and SQL 2012 HA

  • ybz199009 (2/24/2012)


    blabla

    reported as well. There are about 100 of those within the last few hours from the same user. :sick:



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply