Upgrading ~250 databases to SQL 2017.

  • Hello All,

    I am looking to upgrade our SQL 2012 HADR Availability Group (3 servers) to SQL 2017. On a test system all went well, but a concern came up. I had six 80 GB databases and one 750 GB database running. On failover after the upgrade, it took approximately 10 minutes to upgrade the databases, and the same amount of time was noticed on all three failovers. Our SaaS rows that will be upgraded contain ~250 databases, most of them 300 GB - 1.5 TB. I extrapolated that it would take 18 hours just to fail over three times. Does this seem right? An in-place upgrade is not my first choice, and I am being asked to determine if we can make it work.
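
    (Rough math, assuming the time scales linearly with database count: 250 databases / 7 databases is roughly 36x, so 10 minutes x 36 is about 6 hours per failover, and three failovers is about 18 hours.)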

    Charlie

  • Can you outline your upgrade process a bit more? Why do you need to failover 3 times?

    If you're performing in-place upgrades, you should be upgrading your secondary instances first; they will remain in sync data-wise, but the databases won't be upgraded until you recover one of them by failing over to it. Once all your secondaries are upgraded, fail over once to a synchronous secondary (which will upgrade the DBs), and then upgrade your primary.
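
    For reference, the planned failover itself is a single statement run on the synchronous secondary you are failing over to; [MyAG] below is a placeholder for your availability group name.

        -- Run on the synchronous secondary that will become the new primary.
        ALTER AVAILABILITY GROUP [MyAG] FAILOVER;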

    You can either leave it running on the sync secondary or fail back as needed; failing it back to the original primary won't require an additional upgrade step. Check this article for a thorough breakdown of the process.

    You can also use a variation on this process to do a rolling migration. Build two new SQL 2017 instances, join them to your AG, and sync your databases (evicting one of your 2012 replicas as well). Once you fail over to a 2017 replica, your DBs are upgraded and can no longer sync back to 2012. You should have at least two working replicas at this point and can evict your 2012 replicas and add a third 2017 replica. For rollback, simply remove the DBs from the AG, fail the AG back to a 2012 instance, recover your databases on that 2012 instance, and resynchronise your databases in the AG for another attempt later on.
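
    A minimal sketch of the remove/recover part of that rollback, with placeholder names:

        -- On the current primary: take the database out of the AG.
        ALTER AVAILABILITY GROUP [MyAG] REMOVE DATABASE [MyDatabase];

        -- On the 2012 replica, the removed copy is left in RESTORING state;
        -- bring it online there.
        RESTORE DATABASE [MyDatabase] WITH RECOVERY;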

    As to the time taken to upgrade, unfortunately it takes as long as it takes. I would be curious whether, in your tests, the time was spent upgrading or recovering. It could be that you have large log files or high VLF counts, which can contribute to long recovery times. That can make it appear as though the upgrade took excessive time when it is really recovery time.
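
    One way to eyeball VLF counts across databases is below; note that sys.dm_db_log_info is only available from SQL Server 2016 SP2 / 2017 onward, so on the 2012 instances you'd run DBCC LOGINFO per database instead.

        -- VLF count per database, highest first.
        SELECT d.name AS database_name,
               COUNT(*) AS vlf_count
        FROM sys.databases AS d
        CROSS APPLY sys.dm_db_log_info(d.database_id) AS li
        GROUP BY d.name
        ORDER BY vlf_count DESC;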

    How long does a non-upgrade failover take?

  • HandyD - Tuesday, March 5, 2019 9:54 PM


    I based my method on this Microsoft doc: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/upgrading-always-on-availability-group-replica-instances?view=sql-server-2017
    To leave the current primary node as primary, the third failover would be going back to it. We would want all of the nodes to be completely upgraded, so each has to be failed over to in order to apply the upgrades to all user databases (metadata).
    Building 2 additional servers and creating a mixed-version AG is one of my chosen methods.
    The upgrade itself was not terribly long. The issue comes from the fact that we have ~255 databases that will require metadata updates.
    Failover took about 10 minutes each (x 3) with 7 databases. This extrapolates to 18 hours for failovers alone.

  • It seems weird that the offline upgrade occurred on all three nodes. Once the database is upgraded, it's upgraded. The changes made to the database during the upgrade propagate to the secondaries as normal replicated transactions.

    I encourage you to test again with trace flag -T3600, which will write each of the upgrade steps with their starting times into the SQL error log. By comparing the starting time of each step with the starting time of the next step, you can see where the time is spent.
    Those steps shouldn't repeat on failover. 
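
    If it helps, one way to pull those entries back out of the error log afterwards (the 'upgrade' filter string is just an assumption about the message text; adjust as needed):

        -- Read the current SQL Server error log (0) and filter on a search string;
        -- compare the timestamps of consecutive steps to see where the time went.
        EXEC sp_readerrorlog 0, 1, N'upgrade';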

    Also, be sure you are using the latest patch for SQL 2017; improvements to the upgrade process have been rolling out with CUs.
    There are several things that can affect upgrade times, but the first thing I would check is the count of indexes. Each index gets examined during the upgrade. If you have tables and indexes that can be dropped prior to upgrade, you will speed up the process. 
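
    For example, a rough count of indexes in a given database (heaps and system objects excluded) looks like this; run it in each database you're sizing up:

        -- Rough index count for the current database.
        SELECT COUNT(*) AS index_count
        FROM sys.indexes AS i
        JOIN sys.objects AS o ON o.object_id = i.object_id
        WHERE i.index_id > 0
          AND o.is_ms_shipped = 0;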

     We would want all of the nodes to be completely upgraded, so each has to be failed over to in order to apply the upgrades to all user databases


    This is not necessary. AG replication will upgrade the secondary user databases. Upgrade steps change the database the same way user transactions do. You only need to fail back for infrastructure reasons - distributing workload/instances across nodes, moving workload to a preferred server, and so on.

    After the first failover to an upgraded instance, you should begin your post-upgrade steps (changing the compatibility level, resampling all stats, kicking off a full backup, etc.). Those changes will replicate to the secondaries as well and only need to be done once.
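
    A rough sketch of those post-upgrade steps; the database name and backup path are placeholders, and whether you use sp_updatestats or full-scan UPDATE STATISTICS is up to your standards:

        -- Run on the new primary after the first failover.
        ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 140;  -- SQL Server 2017
        USE [MyDatabase];
        EXEC sp_updatestats;
        BACKUP DATABASE [MyDatabase]
            TO DISK = N'X:\Backups\MyDatabase_post_upgrade.bak'
            WITH CHECKSUM, COMPRESSION;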

    -Eddie

    Eddie Wuerch
    MCM: SQL
