150 GB Backup (compressed) taking over 1 day

  • jbrandenburg (11/17/2015)


I have been working very closely with the storage admin. He has confirmed things appear to be set up properly. The SQL server and the backup server are on the fastest disks we have.

    If 150GB of backups is taking over a day, then things are absolutely not working properly. I would be very concerned about that level of performance because it suggests either a serious configuration problem, or a serious hardware problem. Which means that in addition to your backups taking so long to run, you also risk your backups not being accessible if the problem gets worse.

There's lots of good advice already given that should help. I'd check some additional things.

1 - Are all the database backups firing at the same time? If they are, stagger them to launch every 10 minutes or whatever.

2 - Are you using backup compression? That can really speed up the backup, trading CPU load to reduce network contention.

3 - Are you doing striped backups? (There's a quick example of both after this list.)

4 - 75 databases, 150 gig, and the simple recovery model? That strikes me as a bit odd and risky, especially if you're only doing a nightly backup. Personally, the only time I use simple recovery is when the data is truly static, like mapping image files for a GIS system, but that's just my preference.
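    Here's a minimal sketch of items 2 and 3 combined - a compressed backup striped across two files. The database name and file paths are placeholders; stripe across targets that don't share spindles if you can:

    BACKUP DATABASE MyDb
    TO DISK = N'D:\Backups\MyDb_stripe1.bak',
       DISK = N'E:\Backups\MyDb_stripe2.bak'
    WITH COMPRESSION, STATS = 10;

    STATS = 10 just prints progress every 10 percent, which is handy when you're timing these.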

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • cphite (11/18/2015)


    jbrandenburg (11/17/2015)


I have been working very closely with the storage admin. He has confirmed things appear to be set up properly. The SQL server and the backup server are on the fastest disks we have.

    If 150GB of backups is taking over a day, then things are absolutely not working properly. I would be very concerned about that level of performance because it suggests either a serious configuration problem, or a serious hardware problem. Which means that in addition to your backups taking so long to run, you also risk your backups not being accessible if the problem gets worse.

That's probably the best post on this thread yet. Even when I was backing up an uncompressed 360GB database to a NAS, it wasn't taking even half a day. Something is really wrong and it just hasn't been found yet.

For example, when we switched to a different type of storage, all of our backups were suddenly taking 4-6 times longer. Everyone said it was because of the "deduplication" being done on the new device and wanted to just live with it. Everyone except me and one of the NetOps guys. After some research and throughput testing, it did, in fact, turn out to be a too-long cable that also had a kink in it. We replaced the cable with a proper-length, unkinked cable, and the new device started doing the backups without any such delays. It's actually 25%-50% faster (even with the deduplication, and depending on the database) than the original devices we were backing up to.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/18/2015)


    cphite (11/18/2015)


    jbrandenburg (11/17/2015)


I have been working very closely with the storage admin. He has confirmed things appear to be set up properly. The SQL server and the backup server are on the fastest disks we have.

    If 150GB of backups is taking over a day, then things are absolutely not working properly. I would be very concerned about that level of performance because it suggests either a serious configuration problem, or a serious hardware problem. Which means that in addition to your backups taking so long to run, you also risk your backups not being accessible if the problem gets worse.

That's probably the best post on this thread yet. Even when I was backing up an uncompressed 360GB database to a NAS, it wasn't taking even half a day. Something is really wrong and it just hasn't been found yet.

I currently back up two 400GB databases, one from each of two SQL servers, to a NAS that is 12 miles away over a dedicated data line, and it gets done in around 4-5 hours. We've gotten it down to around 3.5 hours by increasing the bandwidth of the data line, but that's too costly to maintain. The point is, we get around 85GB/hr, mostly because of the data line; locally it's more like 150GB/hr.

    The OP is getting around 6GB/hr on what is, based on his description, essentially a local connection. Same SAN but different drives. This isn't simply bad; it's worrisome.

For example, when we switched to a different type of storage, all of our backups were suddenly taking 4-6 times longer. Everyone said it was because of the "deduplication" being done on the new device and wanted to just live with it. Everyone except me and one of the NetOps guys. After some research and throughput testing, it did, in fact, turn out to be a too-long cable that also had a kink in it. We replaced the cable with a proper-length, unkinked cable, and the new device started doing the backups without any such delays. It's actually 25%-50% faster (even with the deduplication, and depending on the database) than the original devices we were backing up to.

    Good point.

The storage admin needs to check everything: drives, cables, the overall hardware. This isn't a case where you simply look at the main config page and say "Yeah, that looks about right..."

    This would be like if you took your car into a garage and told them you could only get it to 10MPH, and they give it a quick glance and tell you everything looks okay. No - something is wrong, and it's imperative you find out what. Even if it turns out to be something as simple as a bent cable or a config setting, you need to know.

Not to pile on, but if the SQL servers and the backups are indeed on the same SAN, even on different disks, you have even more reason to get this resolved. If a serious hardware issue eventually takes down the SAN, you risk losing your data AND your backups.

  • cphite (11/18/2015)


    Jeff Moden (11/18/2015)


    cphite (11/18/2015)


    jbrandenburg (11/17/2015)


I have been working very closely with the storage admin. He has confirmed things appear to be set up properly. The SQL server and the backup server are on the fastest disks we have.

    If 150GB of backups is taking over a day, then things are absolutely not working properly. I would be very concerned about that level of performance because it suggests either a serious configuration problem, or a serious hardware problem. Which means that in addition to your backups taking so long to run, you also risk your backups not being accessible if the problem gets worse.

That's probably the best post on this thread yet. Even when I was backing up an uncompressed 360GB database to a NAS, it wasn't taking even half a day. Something is really wrong and it just hasn't been found yet.

I currently back up two 400GB databases, one from each of two SQL servers, to a NAS that is 12 miles away over a dedicated data line, and it gets done in around 4-5 hours. We've gotten it down to around 3.5 hours by increasing the bandwidth of the data line, but that's too costly to maintain. The point is, we get around 85GB/hr, mostly because of the data line; locally it's more like 150GB/hr.

    The OP is getting around 6GB/hr on what is, based on his description, essentially a local connection. Same SAN but different drives. This isn't simply bad; it's worrisome.

For example, when we switched to a different type of storage, all of our backups were suddenly taking 4-6 times longer. Everyone said it was because of the "deduplication" being done on the new device and wanted to just live with it. Everyone except me and one of the NetOps guys. After some research and throughput testing, it did, in fact, turn out to be a too-long cable that also had a kink in it. We replaced the cable with a proper-length, unkinked cable, and the new device started doing the backups without any such delays. It's actually 25%-50% faster (even with the deduplication, and depending on the database) than the original devices we were backing up to.

    Good point.

The storage admin needs to check everything: drives, cables, the overall hardware. This isn't a case where you simply look at the main config page and say "Yeah, that looks about right..."

    This would be like if you took your car into a garage and told them you could only get it to 10MPH, and they give it a quick glance and tell you everything looks okay. No - something is wrong, and it's imperative you find out what. Even if it turns out to be something as simple as a bent cable or a config setting, you need to know.

Not to pile on, but if the SQL servers and the backups are indeed on the same SAN, even on different disks, you have even more reason to get this resolved. If a serious hardware issue eventually takes down the SAN, you risk losing your data AND your backups.

Our storage admin has begun with disk performance monitoring. As I do not want to throw this over the fence, I would still like to check any and all SQL application-related things that could be causing the delay. The one thing I did notice is that one database in particular is taking hours to complete. The database file (mdf) is around 177 GB. After the backup compression runs and the backup completes, the .bak file is down to 12 GB. Is that an expected amount of compression? That particular database is our ODS database for business intelligence. Any thoughts?

So far as I know, there is no "expected compression" because it varies by database. Do a free-space check on that large DB. A quick way is to right-click the DB, act like you're going to shrink the database, and stop before hitting the final OK. It'll tell you how much total free space (logs and all) is in the database.

You can do the same at the file level, but it'll give you a hard number instead of a percentage.
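    If you'd rather script it than click through the GUI, here's a rough equivalent - run it in the context of the database you're checking; the division by 128 converts 8 KB pages to MB:

    SELECT name AS logical_file,
           size / 128.0 AS size_mb,
           size / 128.0 - FILEPROPERTY(name, 'SpaceUsed') / 128.0 AS free_mb
    FROM sys.database_files;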

Brandie Tarvin, MCITP Database Administrator
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.
Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

If a database consists mostly of binary attachments, do not compress it when backing it up; it will take twice as long and you won't save any space.

To find out the theoretical fastest backup you can do, back up to NUL as follows:

BACKUP DATABASE mydb TO DISK = 'NUL' WITH COMPRESSION;

It will do an actual backup to the "bit bucket" without writing anything to a local disk, SAN, NAS, or tape.
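    Because the NUL backup still reads every page but skips the write, comparing it against a real backup to your share tells you which side is slow. A hypothetical pair of timings (the UNC path is a placeholder):

    BACKUP DATABASE mydb TO DISK = 'NUL' WITH COMPRESSION, STATS = 10;  -- read side only
    BACKUP DATABASE mydb TO DISK = N'\\BackupServer\SQLBackups\mydb.bak' WITH COMPRESSION, STATS = 10;  -- read + write

    If the first finishes quickly and the second crawls, the problem is the path to the backup target, not SQL Server's ability to read the data.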

I've seen 90% compression on databases before, but I would not expect a number like that across all databases all the time. And as Bill Talada said, don't compress if you're storing binary objects in your DB, as you'll be wasting time and getting no compression out of it.

But as Brandie said, there is no "expected compression"; it depends entirely on the makeup of the database. Lots of text fields or empty space: higher compression. Binary objects: lower compression. It's kind of like trying to zip a JPEG: it's already compressed, so you're not going to gain very much. So tailor your backup options according to how your databases are built and used. If you need to store binary objects, put them in a different filegroup, back it up separately, and don't use compression on it.
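    If you want to see the ratio each database has actually been getting, the backup history in msdb records both sizes. A quick sketch (type = 'D' limits it to full backups; sizes in msdb are in bytes):

    SELECT database_name,
           backup_size / 1048576.0 AS backup_mb,
           compressed_backup_size / 1048576.0 AS compressed_mb,
           backup_size * 1.0 / NULLIF(compressed_backup_size, 0) AS compression_ratio
    FROM msdb.dbo.backupset
    WHERE type = 'D'
    ORDER BY backup_finish_date DESC;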

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

Just out of curiosity, what is the CPU doing during these backups? Do you see one or more cores at full utilization? If so, it could be that this isn't a storage issue at all, but rather that your server is maxed out in terms of processor usage. Also, how is your memory utilization?

    If you don't already have a monitoring tool, try installing something like SQL Monitor - you can install a free trial, and it's a good thing to have regardless. That will allow you to track performance metrics over time, not just CPU and RAM but disk latency, read and write times, etc.
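    You can also watch a backup from another session while it runs; here's a quick sketch using the standard DMVs. The wait_type column will often point straight at the bottleneck (e.g. BACKUPIO or ASYNC_IO_COMPLETION):

    SELECT session_id, command, percent_complete,
           cpu_time, total_elapsed_time, wait_type, wait_time
    FROM sys.dm_exec_requests
    WHERE command LIKE 'BACKUP%';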

  • jbrandenburg (11/16/2015)


    The backups are being taken to another virtual server on the SAN.

Across the network?

    jbrandenburg (11/16/2015)


    We have dedicated drives for backups.

Dedicated drives on a virtual server?

    jbrandenburg (11/16/2015)


    It appears we have I/O issues

That much is obvious. As Gail said, backups are I/O intensive, reading the source and writing the destination.

    jbrandenburg (11/16/2015)


    We have a 10GB connection between the SQL Server and the Backup Server.

That suggests the backups are network-bound.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

If possible, you should really make your backups local first, then copy them across the network when the backup is completed. It will take less time to make the initial backup if you do it locally.

Granted, I know that's not always possible, but if you can, do it. (There's a quick sketch below.)
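    For example, a two-step sketch - the drive letter, share, and database name are placeholders, and the copy would normally be a second job step rather than part of the T-SQL:

    -- Step 1: back up to a local drive
    BACKUP DATABASE MyDb TO DISK = N'L:\LocalBackups\MyDb.bak' WITH COMPRESSION;

    -- Step 2: copy to the backup server outside SQL, e.g. in a CmdExec job step:
    --   robocopy L:\LocalBackups \\BackupServer\SQLBackups MyDb.bak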

Brandie Tarvin, MCITP Database Administrator
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.
Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Our storage admin has begun with disk performance monitoring. As I do not want to throw this over the fence, I would still like to check any and all SQL application-related things that could be causing the delay. The one thing I did notice is that one database in particular is taking hours to complete. The database file (mdf) is around 177 GB. After the backup compression runs and the backup completes, the .bak file is down to 12 GB. Is that an expected amount of compression? That particular database is our ODS database for business intelligence. Any thoughts?

Aside from the good points that have already been mentioned, I just wanted to ask: is the ODS used for reporting, or is there a separate data warehouse with historical information of the same grain? If so, does the 177 GB come from keeping historical information you may not really need?

    ----------------------------------------------------

  • Hi all-

Thank you very much for all of the feedback. Thanks to your suggestions, our storage admin was able to find that we have a misconfigured host-to-SAN connection. We were able to move it off of the 'bad' host until we know the issue is resolved.

    Thanks again!

    Jesse
