Making sense of mirroring logs proving challenging

  • SQL Server 2008 mirror with witness running on Windows 2008 SE, all virtual servers on different physical hosts. Timeout is set to the default of 10 seconds. There have been no unplanned or unintended failovers. Review of the SQL Server logs finds infrequent error 1479 messages ('The mirroring connection to "<server>" has timed out for database "<database name>"...').

    Sometimes the failure is bidirectional and a message is logged on both servers. Sometimes the failure is only one way, and there is no matching message on the other server. Timeouts generally occur 0-3 times a day on a server. Most messages pertain to the witness, which fails to connect with either the principal or the mirror. Rarely is there a timeout between principal and mirror. Networking reports that they have no indications of network problems at the times the SQL Server logs show timeouts.

    My questions:

    1. Why would there be only a timeout in one direction?

    2. Other than network issues, is there any reason why a live virtual server would fail to respond to what I would think is a fairly important but small communication between servers (e.g., host too busy/overloaded)?

    3. How should logged timeouts in only one direction be interpreted?
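
    To separate the one-way timeouts from the bidirectional ones, the error 1479 timestamps from the two servers' logs can be paired up. The following is only a minimal sketch: the function name `classify_timeouts`, the 30-second matching window, and the sample timestamps are all made up for illustration, and it assumes the timestamps have already been pulled out of each log and that the server clocks are reasonably in sync.

```python
from datetime import datetime, timedelta


def classify_timeouts(events_a, events_b, window=timedelta(seconds=30)):
    """Pair timeout timestamps from two servers that lie within `window`.

    Returns (paired, only_a, only_b). Each timestamp is used at most once.
    The 30-second default window is an assumption, not a documented value.
    """
    remaining_b = sorted(events_b)
    paired, only_a = [], []
    for ts in sorted(events_a):
        # First unmatched event on server B that is close enough in time.
        match = next((other for other in remaining_b
                      if abs(other - ts) <= window), None)
        if match is None:
            only_a.append(ts)          # logged on A only: one-way timeout
        else:
            paired.append((ts, match)) # logged on both: bidirectional
            remaining_b.remove(match)
    return paired, only_a, remaining_b


# Made-up sample timestamps, not taken from any real log.
principal = [datetime(2000, 1, 1, 2, 14, 7), datetime(2000, 1, 1, 13, 40, 0)]
witness = [datetime(2000, 1, 1, 2, 14, 12)]
paired, only_principal, only_witness = classify_timeouts(principal, witness)
```

    With the sample data, the first principal entry pairs with the witness entry five seconds later, and the second principal entry is left as a one-way timeout.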

  • Which hypervisor are you using?

    ESX, Hyper-V, etc.?

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry,

    We're using Microsoft's VM.

    Steve

  • Go into the Virtual Network Manager and check the virtual switch configurations. These switches uplink to a physical NIC and then go out via a physical switch. You need to check every hop in between to rule out network latency.

    What response times do you see between the VMs and any external machines?
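
    One rough way to collect those response times is to time repeated TCP connections from one VM to another and look at the failures and the worst case. This is a minimal sketch, not a vetted tool: the function names are made up, and the host name and port 5022 in the usage note are assumptions (use whatever your mirroring endpoint actually listens on).

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # covers refused connections and timeouts
        return None


def probe(host, port, samples=5, interval=1.0, timeout=2.0):
    """Take `samples` connect timings, sleeping `interval` seconds between them."""
    results = []
    for i in range(samples):
        results.append(tcp_connect_ms(host, port, timeout))
        if i < samples - 1:
            time.sleep(interval)
    return results


def summarize(results):
    """Summarize a list of timings where None marks a failed attempt."""
    ok = [r for r in results if r is not None]
    return {
        "sent": len(results),
        "failed": len(results) - len(ok),
        "max_ms": max(ok) if ok else None,
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }
```

    For example, `summarize(probe("witness-host", 5022, samples=60))` run from the principal, and the same thing run from the witness toward the principal, would show whether one direction is noticeably slower or drops connections. A TCP connect only approximates what the mirroring ping sees, so treat the numbers as a hint rather than proof.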


  • Perry,

    Thank you! I am very interested in hearing how our engineering folk respond to this suggestion. But the bottom line is clear: the problem is in the network. This approach might also expose other issues unrelated to mirroring but very relevant to the data center operating environment. It will most likely be at least a few days before I have any results, since broken processes take precedence over 'limping' ones and the database logs show no unplanned failovers, so I cannot obtain an escalated priority for this test.

  • Remember, virtual machines bring a lot of flexibility, but they can also bring increased complexity.

    Any virtual machine connected to a virtual switch that requires access to the "outside world" will require that the virtual switch be uplinked to one or more physical NICs on the host.

    You now have a physical network and a virtual network to troubleshoot!

    There are various host counters for monitoring Hyper-V; have a look around MSDN for more info. This should get you started, though.

