log is full due to Availability Replica

  • We had a log-full issue on the primary replica, and the log reuse wait was AVAILABILITY_REPLICA. When we checked, the asynchronous DR replica was not synchronising. A top-priority ticket had been raised, so we didn't have time to troubleshoot the full cause. Unfortunately, our monitoring tool generated the alert very late.

    To fix the issue, we removed the DR replica that was not synchronising and triggered a log backup on the primary. That resolved the issue, and the application was able to access the DB again.

    Please help us find the root cause: why is the DR node, which is asynchronous, affecting the primary replica? We also observed some errors on the primary replica related to the database mirroring endpoint for the DR replica.

    One difference we noticed from before is that we failed over the replica during the monthly patching window. Also, the owner of the hadr endpoint on the DR replica is different from the owner on the primary and secondary, and SQL Server on the DR node runs under an NT SERVICE account, whereas the primary runs under a service account that is also the owner of the hadr endpoint on the primary replica.

    Please help us find the root cause.

  • rkrpat (10/18/2016)


    We had a log-full issue on the primary replica, and the log reuse wait was AVAILABILITY_REPLICA. When we checked, the asynchronous DR replica was not synchronising. A top-priority ticket had been raised, so we didn't have time to troubleshoot the full cause. Unfortunately, our monitoring tool generated the alert very late.

    To fix the issue, we removed the DR replica that was not synchronising and triggered a log backup on the primary. That resolved the issue, and the application was able to access the DB again.

    Please help us find the root cause: why is the DR node, which is asynchronous, affecting the primary replica? We also observed some errors on the primary replica related to the database mirroring endpoint for the DR replica.

    One difference we noticed from before is that we failed over the replica during the monthly patching window. Also, the owner of the hadr endpoint on the DR replica is different from the owner on the primary and secondary, and SQL Server on the DR node runs under an NT SERVICE account, whereas the primary runs under a service account that is also the owner of the hadr endpoint on the primary replica.

    Please help us find the root cause.

    Take a step back and formulate the question as if you were talking to someone who doesn't know anything about your system. Alternatively, find a consultant in your area who can help you resolve the problem.

    😎

  • Thanks for your suggestion. I thought someone could give an idea of how and why an AG replica goes into the Not Synchronizing state in general. We have already engaged a consultant, but I was curious to know before that.

  • rkrpat (10/18/2016)


    Thanks for your suggestion. I thought someone could give an idea of how and why an AG replica goes into the Not Synchronizing state in general. We have already engaged a consultant, but I was curious to know before that.

    Possibly an unhealthy database replica, to name one cause.

    😎

  • rkrpat (10/18/2016)


    Please help us find the root cause: why is the DR node, which is asynchronous, affecting the primary replica?

    There are two things that could be causing your issue. The first is: Are you taking log backups of your primary AG database? If not, then expect it to fill up and not clear.

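    The second is: log records on the primary can't be cleared out of the log until they have been sent to every secondary. Commits on the primary don't wait for an asynchronous-commit secondary, but if that secondary stops synchronizing, the primary has to keep the unsent log and the log file keeps filling up.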

    Anything could cause the secondary to not be synchronized. Things like taking differential database backups of the primary, which will disrupt the flow of data from primary to secondary. (AG, for some reason, doesn't like differential backups. I think because it breaks the log chain, but I could be wrong on that reason.). Or the secondary being on a corrupted hard drive / SAN. Or a corrupted page / extent. Anything, really.

    But your log filled up because the secondary wasn't synchronizing on a timely basis or you weren't taking transaction log backups. You need to figure out which one it was. If the former, then you have further troubleshooting to do. If the latter, the solution is simply to start taking transaction log backups.
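
    One way to tell the two cases apart is to ask SQL Server what the log is currently waiting on, and then look at the recent log backup history. The queries below are only a sketch: `YourAgDatabase` is a placeholder name, and they assume you run them on the primary replica.

    ```sql
    -- Why can't the log be truncated right now? (run on the primary)
    SELECT name,
           recovery_model_desc,
           log_reuse_wait_desc   -- e.g. LOG_BACKUP or AVAILABILITY_REPLICA
    FROM sys.databases
    WHERE name = N'YourAgDatabase';   -- placeholder database name

    -- Most recent log backups recorded in msdb on this instance.
    SELECT TOP (5)
           database_name,
           backup_start_date,
           backup_finish_date
    FROM msdb.dbo.backupset
    WHERE database_name = N'YourAgDatabase'   -- placeholder database name
      AND type = 'L'                          -- 'L' = transaction log backup
    ORDER BY backup_finish_date DESC;
    ```

    Backup history in msdb is kept per instance, so if log backups are taken on a secondary, the second query has to be run there as well.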

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Anything could cause the secondary to not be synchronized. Things like taking differential database backups of the primary, which will disrupt the flow of data from primary to secondary. (AG, for some reason, doesn't like differential backups. I think because it breaks the log chain, but I could be wrong on that reason.). Or the secondary being on a corrupted hard drive / SAN. Or a corrupted page / extent. Anything, really.

    Can you offer more about the details on why differential backups don't work well on the primary or "disrupt the flow of data from primary to secondary?" I can't find much detail on that.

  • patrickmcginnis59 10839 (10/19/2016)


    Anything could cause the secondary to not be synchronized. Things like taking differential database backups of the primary, which will disrupt the flow of data from primary to secondary. (AG, for some reason, doesn't like differential backups. I think because it breaks the log chain, but I could be wrong on that reason.). Or the secondary being on a corrupted hard drive / SAN. Or a corrupted page / extent. Anything, really.

    Can you offer more about the details on why differential backups don't work well on the primary or "disrupt the flow of data from primary to secondary?" I can't find much detail on that.

    One of my vendors mentioned it in a conversation when we were getting set up. And then I found it on MS's website here.

    Given that we only have AG in Production, it's not something I can test right now to verify exactly what it was we found a few months ago.

    Back to the log drive filling up, though. Funny coincidence. I ran a purge of a log table on our primary this morning, only I cut too deep. It filled up tempdb on ALL servers hosting the AG and nearly brought everything to a standstill. So yes, activity on the primary will very much affect all secondary replicas. Whether it's affecting tempdb or the replica db log file is a different matter. But I have known a full tempdb to cause a user db query to come back with a message that the user db log file was full, instead of correctly identifying the issue as a tempdb problem.


  • errors on the primary replica related to the database mirroring endpoint for the DR replica

    service account on DR node is running on NT service

    Note from Microsoft: "If you run SQL Server under a non-domain account, you must use certificates"

    Are you sure that the endpoints are correctly configured? Is this the original configuration? Is there any possibility that the AG was set up using a domain account on all replica nodes, but at some point the DR site was changed to NT SERVICE?
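
    A rough way to compare the endpoint setup across nodes is to run the same catalog queries on each replica and diff the results. This is only a sketch using standard catalog views; nothing here is specific to your environment.

    ```sql
    -- Endpoint, its owner, and its state (run on every replica and compare).
    SELECT e.name            AS endpoint_name,
           sp.name           AS endpoint_owner,
           e.state_desc,
           e.role_desc,
           e.connection_auth_desc,   -- e.g. NEGOTIATE (Windows) or CERTIFICATE
           te.port
    FROM sys.database_mirroring_endpoints AS e
    JOIN sys.server_principals AS sp
      ON sp.principal_id = e.principal_id
    JOIN sys.tcp_endpoints AS te
      ON te.endpoint_id = e.endpoint_id;

    -- Account each SQL Server service runs under on this node.
    SELECT servicename, service_account
    FROM sys.dm_server_services;

    -- Which logins have been granted (or denied) permissions on the endpoint?
    SELECT e.name        AS endpoint_name,
           grantee.name  AS grantee,
           p.permission_name,
           p.state_desc
    FROM sys.server_permissions AS p
    JOIN sys.endpoints AS e
      ON e.endpoint_id = p.major_id
    JOIN sys.server_principals AS grantee
      ON grantee.principal_id = p.grantee_principal_id
    WHERE p.class_desc = N'ENDPOINT'
      AND e.type_desc = N'DATABASE_MIRRORING';
    ```

    If the service account on one node has no CONNECT permission on the other nodes' endpoints, that would be consistent with endpoint errors in the error log, though it doesn't prove that was the cause here.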

    primary replica, and the log reuse wait was AVAILABILITY_REPLICA

    Having an asynchronous node means that transactions can commit on the primary without waiting for the secondary. However, for the transaction log to "truncate" (that is, for its space to be reused), the log records still have to reach all of the secondary nodes, and the transaction log must be backed up.
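
    To see how much log a lagging secondary is holding back, the per-database replica state DMV can be queried on the primary. A minimal sketch follows; column meanings are as I understand them from the documentation.

    ```sql
    -- Run on the primary: one row per database per replica.
    SELECT ar.replica_server_name,
           DB_NAME(drs.database_id)        AS database_name,
           drs.synchronization_state_desc,
           drs.is_suspended,
           drs.suspend_reason_desc,
           drs.log_send_queue_size,        -- KB of log not yet sent to this secondary
           drs.redo_queue_size,            -- KB of log received but not yet redone
           drs.last_hardened_time
    FROM sys.dm_hadr_database_replica_states AS drs
    JOIN sys.availability_replicas AS ar
      ON ar.replica_id = drs.replica_id
    ORDER BY ar.replica_server_name, database_name;
    ```

    A steadily growing log_send_queue_size for the DR replica, together with a log reuse wait of AVAILABILITY_REPLICA on the primary, would fit the behaviour described above.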

    references:

    https://msdn.microsoft.com/en-us/library/ff878487(v=sql.110).aspx

    https://msdn.microsoft.com/en-us/library/ms366346(v=sql.110).aspx

    https://msdn.microsoft.com/en-us/library/ms191477(v=sql.110).aspx

  • Brandie Tarvin (10/19/2016)


    One of my vendors mentioned it in a conversation when we were getting set up. And then I found it on MS's website here.

    This page talks about differential backups taken on the secondary replica, not the primary.

    Given that we only have AG in Production, it's not something I can test right now to verify exactly what it was we found a few months ago.

    The reason I'm asking is that we DO run diffs on our primary and thus my concern for any problems that might arise. Don't mean to be stepping on any thread business tho, so if there aren't any problems with diffs on primary replicas then don't let me interrupt the thread!

  • Brandie Tarvin (10/19/2016)


    Are you taking log backups of your primary AG database? If not, then expect it to fill up and not clear.

    you can offload log backups to secondaries, they don't have to be done at the primary
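
    A common pattern for this, sketched below, is to schedule the same job on every replica and let each one check whether it is the preferred backup replica. The database name and backup path are placeholders, and the AG's backup preference settings are assumed to be configured already.

    ```sql
    -- Scheduled on every replica; only the preferred backup replica does the work.
    DECLARE @db sysname = N'YourAgDatabase';   -- placeholder database name

    IF sys.fn_hadr_backup_is_preferred_replica(@db) = 1
    BEGIN
        BACKUP LOG [YourAgDatabase]
        TO DISK = N'\\backupshare\YourAgDatabase_log.trn';   -- placeholder path
    END;
    ```

    Log backups taken on any replica form a single log chain, so a log backup taken on a secondary also allows the primary's log to be truncated, provided that secondary is synchronizing.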

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • rkrpat (10/18/2016)


    Thanks for your suggestion. I thought someone could give an idea of how and why an AG replica goes into the Not Synchronizing state in general. We have already engaged a consultant, but I was curious to know before that.

    What do your SQL Server error logs and application event logs show?
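
    Alongside the logs, the replica state DMV on the primary records the last connection error for each replica, which can narrow down where to look. A small sketch:

    ```sql
    -- Run on the primary: connection state and last connect error per replica.
    SELECT ar.replica_server_name,
           ars.role_desc,
           ars.connected_state_desc,
           ars.synchronization_health_desc,
           ars.last_connect_error_number,
           ars.last_connect_error_description,
           ars.last_connect_error_timestamp
    FROM sys.dm_hadr_availability_replica_states AS ars
    JOIN sys.availability_replicas AS ar
      ON ar.replica_id = ars.replica_id;
    ```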


  • patrickmcginnis59 10839 (10/19/2016)


    The reason I'm asking is that we DO run diffs on our primary and thus my concern for any problems that might arise. Don't mean to be stepping on any thread business tho, so if there aren't any problems with diffs on primary replicas then don't let me interrupt the thread!

    I'm sorry I can't help more on this. But it was a big deal according to our vendor, and ISTR some issue with trying to set something up when we were first installing the groups (before we went live). Unfortunately, as I said, I can't replicate the issue because our only AGs are in Production; I don't have any non-prod clusters.


  • Perry Whittle (10/19/2016)


    Brandie Tarvin (10/19/2016)


    Are you taking log backups of your primary AG database? If not, then expect it to fill up and not clear.

    you can offload log backups to secondaries, they don't have to be done at the primary

    But the question remains. Are log backups being done at all?

