Failover Clustering

  • I understand that there are two types of clustering: Server Cluster and Network Load Balancing. I would like to avoid having a single point of failure in the event of a hard disk failure.

    A "shared cluster disk array" is what I understand to be the crux of a failover. Do both types use one, and in either case, does each server have its own hard drives?

    I am aware of the term "Shared Nothing" with regard to this technology, but I'm not sure where it fits in. It would seem this allows for physically separate disks. True?

    If my goal is protection against both server and hard disk failure in a SAN environment (one server in the SAN and one server detached), what would my best option be?

  • Clustering of SQL Server instances really provides redundancy for the SQL Server instance itself, not the data. Both servers are physically able to access the disks, BUT the clustering software only permits one server to access the disks at a time. When clustered, you must have disks (SCSI or SAN) that both servers can be connected to at the same time. SQL Server and the clustering software look after the rest.

    If you need to provide data redundancy, then you can easily do this by configuring the disks using one of the RAID options.

    Shared Nothing means that one and only one server can access a resource (e.g. a disk) at any point in time. In the case of SQL Server, this means that one and only one node in the cluster will be able to access the databases for a SQL Server instance at any point in time (and typically this means until either the instance crashes or is manually failed over to another node in the cluster). This is a fairly simple view of Shared Nothing - have a look at BOL for more info.

    Network load balancing is a very different animal. One possible use for it is to provide some level of redundancy. It does not really care about the disks; it is really only concerned with the servers. If you use this to provide redundancy of data between servers, you will need to implement a strategy to ensure that updates (which may be applied to any of the servers the WLBS is managing) are applied to all servers. This may require some changes to your applications depending on how they manage connections to the server.

    Clustering is probably easier to manage because there is no need to ensure that updates to one server are correctly applied to other servers. Two or more servers must be configured to be able to connect to the SAN.
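
    If you want to confirm what the instance itself reports, SQL Server 2000 exposes this from T-SQL. A quick check (nothing here is specific to your setup):

        -- Returns 1 if this SQL Server instance is installed on a cluster
        SELECT SERVERPROPERTY('IsClustered') AS IsClustered

        -- Lists the nodes this virtual server can run on (and fail over to)
        SELECT * FROM ::fn_virtualservernodes()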

  • Thank you for your reply.

    To clarify: in a cluster, two servers would access the same disks, just not at the same time. When I read about Shared Nothing, it is described as "each server having its own disk". Does this just mean its own disk within the shared cluster disk array, which is actually shared among the servers in the cluster?

    Is it true that Network Load Balancing shares disks as well? If that is the case, why would somebody not want to do Network Load Balancing over clustering? It would seem like the second server would get more use. Does clustering fail over quicker?

    To have redundancy in the disks, something like Log Shipping would need to be done, right?

  • SQL Server supports fail-over clustering; it doesn't support network load balancing (imagine trying to write data pages to disk from two nodes at the same time). Hence you wouldn't use it for SQL Server.

    If you're using a SAN, the storage should be constructed in such a way that redundancy is built in. For instance, you should be able to lose an entire shelf and still be okay.

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1


  • Thanks bkelley. Cool website too http://www.warpdrivedesign.org.

    Can log shipping be implemented with clustering? How does this work?

    Are there any documents out there that outline how this works?

  • Log shipping can be implemented with or without clustering. The requirement is SQL Server 2000 Enterprise Edition (if you want it handled by SQL Server), which is the same requirement for clustering SQL Server.

    However, people were doing log shipping in SQL Server 7.0, so you can build your own solution using SQL Server 2000 Standard Edition, though this isn't recommended. If you're using Standard, I think a couple of the replication experts would point you to transactional or merge replication. This might be better answered by Andy Warren, as I know how to do replication but I don't use it robustly where I work.
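
    To give you an idea of what rolling your own involves, the core is just a scheduled transaction log backup on the primary and a restore in standby mode on the secondary (after the secondary has first been initialized from a full backup restored WITH STANDBY or NORECOVERY). The database, share, and file names below are made up purely for illustration:

        -- On the primary server: back up the transaction log to a share the secondary can reach
        BACKUP LOG MyDB
            TO DISK = '\\SecondaryServer\LogShip\MyDB_log.trn'

        -- On the secondary server: apply the log, leaving the database read-only (warm standby)
        RESTORE LOG MyDB
            FROM DISK = '\\SecondaryServer\LogShip\MyDB_log.trn'
            WITH STANDBY = 'E:\MSSQL\Backup\MyDB_undo.dat'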

    The SQL Server 2000 Resource Kit outlines log shipping:

    http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/reskit/sql2000/part4/c1361.asp

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1


  • OK, so Log Shipping can be implemented with or without clustering. So my question is: why couldn't a person implement clustering alone, with the db files not being shared in the MSCS Cluster Group, but instead each server in the cluster having its own local db files? I understand this would defeat the purpose of failover, since the files on the second server would be out of date if Log Shipping were not used. But it would stand to reason that, if it could be done like this, then when using Log Shipping with clustering each server in the cluster could be configured to have its own local db files and not share them in the Cluster Group. Is this correct?

  • That is correct, but here are some of the cons. This isn't to say tran log shipping isn't a good high-reliability solution. If it wasn't, no one would have figured out how to put things together in SQL Server 7.0 (and Microsoft wouldn't have implemented it in SQL Server 2000 EE).

    With log shipping, one database will always be slightly behind the other, by up to the amount of time between transaction log backups.

    A second catch is that you have a read-only solution on the second server; it has to be that way in order to accept the transaction log backups (and you certainly don't want anything written to it, because then there would be no way of getting it back to the first server).

    With tran log shipping, you rely on the reliability of the network. In a pure fail-over clustering solution, you don't. The media is shared, so if the hardware on the first server fails, you come up on the second server. If you have a network outage, your transaction log backups don't make it across. Of course, I'm considering a case where a network outage affects connectivity to the second server but not clients (which does happen).

    Finally, you've got the whole process of bringing the warm standby up as the primary and switching roles if log-shipping is in effect, something you don't worry about in fail-over clustering.
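
    To give a rough idea, that role switch boils down to applying the last available log backup WITH RECOVERY instead of STANDBY, which brings the standby database online for read/write (the database and file names here are placeholders):

        -- Apply the final log backup and recover the standby database
        RESTORE LOG MyDB
            FROM DISK = '\\SecondaryServer\LogShip\MyDB_log_last.trn'
            WITH RECOVERY

        -- Or, if there is nothing left to apply, just recover it
        RESTORE DATABASE MyDB WITH RECOVERY

    After that you still have to repoint clients and sort out logins on the standby, which is part of why it's more work than a cluster failover.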

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1


  • One more question-

    In setting up a two-node failover cluster, how similar in hardware do the two servers need to be? Is it sufficient for the two servers to be capable of handling the same level of transactions and to have the same drive layout and folder structure, or do they need to be identical servers (manufacturer and everything)?

  • The hardware can be dissimilar. You tend to want to get the same servers, but this isn't a requirement (we have a 4/700 and a 4/900 clustered, for instance). As long as the lowest end server is capable of handling the load, that's what you're looking for.

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1


  • Only the databases (not the server) that you log ship are in standby.

    I have a situation where my production databases are standby databases for another production server, and then the reverse. The chances of catastrophic failure are so slim that I hate to have duplicate boxes sitting there doing nothing. So, like clustering, I guess you can say I have a failover solution, not 500 feet away but oceans away. But as Brian mentioned, you're only as current as your last successful transaction log backup that made it to the standby server.
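
    If you want to check which databases are sitting in standby, something like this works (the database name is just an example):

        -- Returns 1 if the database is in read-only standby mode
        SELECT DATABASEPROPERTYEX('MyDB', 'IsInStandBy') AS IsInStandBy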

    quote:


    A second catch is that you have a read-only solution on the second server; it has to be that way in order to accept the transaction log backups (and you certainly don't want anything written to it, because then there would be no way of getting it back to the first server).


    John Zacharkan



  • Good catch, John. Database, not server, is in standby. My goodness it has been a LONG week.

    We don't like idle boxes, either. All of our clusters are Active/Active. We figure if we have the hardware, we might as well make use of it. Otherwise, we'd just be buying additional hardware anyway.

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1


  • Oracle 9i Real Application Clusters will allow for this. It's one feature I wish Microsoft provided, as we have requests for NLB too. The user should decide whether to cluster as shared-resource or shared-nothing. Understand that the latter is more scalable, while the former has bottlenecks: a distributed lock manager has to be implemented across the servers!

  • We're hoping Yukon answers a lot of the high-scalability requests. Keep in mind that you have some method of load balancing using partitioned views (see Vyas' article at http://vyaskn.tripod.com/) and that's faster than traditional queries against one server.
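
    As a rough sketch of the idea (the server, database, table, and column names here are made up): each member server holds one range of the data, enforced by a CHECK constraint on the partitioning column, and a view created on each server unions the members across linked servers so the application sees a single table.

        -- On one member server: a table holding one range of the partitioning column
        CREATE TABLE Customers_1_49999 (
            CustomerID int NOT NULL PRIMARY KEY
                CHECK (CustomerID BETWEEN 1 AND 49999),
            CustomerName varchar(100) NOT NULL
        )

        -- The distributed partitioned view, created on each server, spans the members
        CREATE VIEW Customers
        AS
        SELECT CustomerID, CustomerName FROM Server1.Sales.dbo.Customers_1_49999
        UNION ALL
        SELECT CustomerID, CustomerName FROM Server2.Sales.dbo.Customers_50000_99999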

    K. Brian Kelley

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1


  • quote:


    We're hoping Yukon answers a lot of the high-scalability requests. Keep in mind that you have some method of load balancing using partitioned views (see Vyas' article at http://vyaskn.tripod.com/) and that's faster than traditional queries against one server.



    Yes, I am aware of distributed partitioned views (DPV), and unfortunately I am no fan of them, for the following reasons:

    1) DB schemas are not always partitionable. We are just starting out and have 50 tables, with some tables having 10-15 relationships between them. The TPC-C benchmark that MSFT so proudly touts is inherently partitionable. I would like to see a real application like SAP or PeopleSoft or Siebel partitioned using DPV. Why is it that these vendors have not done it? Our app is very similar to an ERP app.

    2) If I am querying the DPV-built federation and I need rows 50-100 in the 2nd partition, and the first partition containing rows 1-49 fails (server crashes), the entire query will fail. This happens even though the server (partition) containing rows 50-100 is up and available.

    RAC has its drawbacks too:

    http://www.microsoft.com/sql/evaluation/compare/oracle-rac.asp
