SQL High Availability Options

  • Hello everyone...

    I am trying to find out how to best configure a new SQL database for an application we are deploying. The vendor does not have a lot of SQL resources to help assist in setting up the SQL side for a client of our size.

    We are projected to eventually have a 2 TB SQL database once all locations are online and fully utilizing the new system. We are in the planning stages and I am trying to find out how we can best prepare for the setup.

    Proposed Infrastructure:

    2 VMWare virtualized Windows 2008 guests on two separate hosts servers with SQL Standard 2008 installed. My research has shown that an Active/Passive cluster is a good choice because if node 1 goes down a duplicate of the database is ready and node 2 takes over. The thing with that is I was told that there is a boot tme or delay for node 2 to take over and it is not instant and possible transaction loss.

    I have not come up with the virtual specs yet as far as how much RAM per host, but I think we will have two HBA's per host connected to a SAN. So I don't think File IO will be an issue. We will have two physical NIC's for each guest on the separate hosts.

    My goal is to setup the most efficient and available SQL system possible. I am looking for suggestions and possible web resources that may help me figure out the best setup method. Is it SQL mirroring, replication, Active/Active, Active/Passive, Failover, cluster, etc. There are so many things and I cannot figure out where to start.

  • First, if you haven't already spent your money on VMWare, may I suggest looking at MS Hyper-V? It is free with Windows Server 2008 and will reduce your costs. Second, if your SAN as the disk capacity, you may want to look at database mirroring. The failover will be faster, but it requires twice the disk space. A third virtual server would be needed to serve as a witness server between the other virtual servers.

  • Firstly, I think that your information on clustering is not quite right. There is no loss of transactions that have been committed. Transactions that are in progress when a failure occurs are definitely lost - this is no different to the normal operation of a non-clustered server (e.g. if power is lost to the server). When a failure occurs, the disks used by the database server are "moved" to the standby node in for the server and SQL Server is start. The normal recovery process then run. This is where the delays are. First, the cluster needs to notice that the failure has ocurred (by default this takes 30 seconds but can be changed) and second, the instance of SQL Server needs to be started on another node. So depending on how long it takes for recovery to run in each database, the time required to return to normal service will vary. When a failover occurs, as far as your applications are concerned, all that happened is the server stopped and then started again since everything for the instance of SQL Server (right down to the IP address associated with it) move to the standby server.

    Database mirroring can be quicker when a failure occurs because the mirrored database is on an instance of SQL Server which is running and is constantly applying (restoring) the transaction logs from the active database. From a SQL Server perspective, the failover time is likely to be a few seconds. From the perspective of your application, things are a little trickier than for clustering. All that happens is that the mirrored database is now on another server (instance of SQL Server actually). Your application needs to know about the other server. If the application is a current .Net application or is using SNAC, there are additional properties on the connection string that deal with the name of the mirror server. The application will see a broken connection, any active tranction will be lost and it can then immediately connect to the standby server.

    If database mirroring is configured with a witness server (which needs to run any edition of SQL from Express upwards), the failover can be automatic (depending on the mirroring model you use)

    Replication is not a good option for high availability because the transition to the replica is not automatic. A DBA would need to intervene to make sure that everything works smoothly (and that also applies when the failed server is restarted). With replication, there is definitely a risk that transactions may be lost because it is a store and forward architecture. The storing of the transactionsi (in the distribution database) may work, but there is the potential that they won't be forwarded to the replica.

    As for Active/Active cluster, Active/Passive cluster - this is really up to you. Part of the decision is cost - MS does not charge for software on a standby server IF it is purely used for that purpose. Hence, the passive server in an Active/Passive cluster does not attract addition license fees. From a performance perspective, there are advantanges in using more than one node of the cluster since you, in theory, have a higher maximum capacity. However, you still need to make sure that each of the servers can handle the load without the other operating. When a failover occurs on an Active/Passive cluster, you only need to restart the service from the failed server - this may be quicker than restarting all (plus the database recovery time). You need to weigh this up when deciding how you wish to configure the cluster. If you are using a single database, then you really are committed to Active/PAssive.

    I have no real information to add wrt virtual machines (VMWare or MS) Vs physical although I tend to prefer physical since there is no additional abstraction between the physical disks and SQL Server.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply