RESTORE VERIFYONLY

  • I have a backup regime that creates a backup on a remote server and then runs a verify. This solution has been working without issue for many months. Recently the backups have been reporting as failed even though they are actually succeeding; it's the verify step that is failing.

    A quick rundown of the system's architecture: it's a 3-node AOAG with a primary and two secondaries, one secondary in synchronous commit and the other in asynchronous commit. The backup runs on the primary.

    After some testing I determined that running the verify on anything other than the asynchronous secondary causes the CPU spike and the job to terminate. When it is run on the primary node I see the CPU spike and almost all transactions waiting on the HADR_SYNC_COMMIT wait type. I assume what's happening here is thread exhaustion: the primary node is waiting on acknowledgement from the synchronous-commit secondary that the transactions have been committed?
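To check the thread-exhaustion theory while the verify is running, the standard scheduler and waiting-task DMVs can be snapshotted. A minimal sketch (the DMVs and columns are standard; the interpretation notes in the comments are rules of thumb, not hard thresholds):

```sql
-- Snapshot worker-thread pressure while the verify runs (run on the primary).
SELECT
    scheduler_id,
    current_tasks_count,
    runnable_tasks_count,   -- tasks waiting for CPU; sustained values > 0 suggest CPU pressure
    active_workers_count,
    work_queue_count        -- tasks waiting for a free worker; > 0 suggests thread exhaustion
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

-- Sessions currently stuck waiting on the sync-commit acknowledgement.
SELECT session_id, wait_type, wait_duration_ms, blocking_session_id
FROM sys.dm_os_waiting_tasks
WHERE wait_type = N'HADR_SYNC_COMMIT';
```

If `work_queue_count` climbs and most waiters show HADR_SYNC_COMMIT during the spike, that would support the acknowledgement-starvation picture described above.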

    Can anyone shed any light on why running a RESTORE VERIFYONLY now causes the CPU to spike on the primary and the synchronous-commit secondary, but not on the asynchronous secondary, when this has never happened before?

    PS: I have checked configuration settings (sp_configure), processor affinity settings, etc.; all three instances have the same settings.
    PPS: All three nodes have the same hardware and the same SAN backend.

    Thanks in advance to anyone that may be able to assist me with this matter 🙂

  • There are quite a few articles out there stating that with synchronous Always On, the server is likely to take a performance hit due to the commit acknowledgement it requires, whereas asynchronous will not.  As for your issue, CPU spikes can come from a lack of memory, which is usually not the underlying cause, meaning something may have changed, possibly as low as the database level, to cause the spike.  So, is anything running from the Agent or the Windows scheduler, like a backup or any type of maintenance job, on the instance at the same time the verify is occurring?  Are the max memory instance settings on the primary and secondary exactly the same?  You could run a wait-stats query during the verify to see exactly what SQL Server is waiting on at the time of the spike.  You could also run a script like sp_WhoIsActive or sp_who2 to see what is running during the verify and how much CPU is being used.  As for why this just started to occur, you can ask what changed in your environment as a starting point.  Also, is anything being written to the SQL Server error log during the time of the spike?
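The wait-stats check suggested above could look something like this (a sketch using the standard `sys.dm_os_wait_stats` DMV; the excluded benign wait types are an illustrative, not exhaustive, list):

```sql
-- Top waits on the instance; capture once before and once during the verify
-- and compare the deltas.
SELECT TOP (10)
    wait_type,
    wait_time_ms,
    waiting_tasks_count,
    signal_wait_time_ms   -- time spent waiting for CPU after the resource was ready
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'XE_TIMER_EVENT', N'REQUEST_FOR_DEADLOCK_SEARCH')
ORDER BY wait_time_ms DESC;

-- Confirm max memory matches across the nodes.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';
```

A high `signal_wait_time_ms` relative to `wait_time_ms` during the spike would point at CPU scheduling pressure rather than the resource itself.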

  • ReamerXXVI - Tuesday, August 29, 2017 10:15 PM


    How about running a consistency check?
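For reference, the verify-and-check steps being discussed in this thread could be sketched as follows (the database name and backup path are hypothetical placeholders):

```sql
-- Back up with CHECKSUM so RESTORE VERIFYONLY can validate page checksums,
-- then verify the backup file without restoring it.
BACKUP DATABASE [YourDb]
TO DISK = N'\\backupserver\share\YourDb.bak'   -- hypothetical remote path
WITH CHECKSUM, COMPRESSION;

RESTORE VERIFYONLY
FROM DISK = N'\\backupserver\share\YourDb.bak'
WITH CHECKSUM;

-- Full consistency check; running it on the async secondary (or a restored
-- copy) keeps the load off the primary, which matches the behavior seen here.
DBCC CHECKDB (N'YourDb') WITH NO_INFOMSGS;
```

Note that RESTORE VERIFYONLY only confirms the backup file is readable and complete; DBCC CHECKDB is what actually validates logical consistency.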


  • FYI - We finally resolved this; the solution was an upgrade of VMware Tools. Go figure, hey.

