Can replication supply a hot failover?

  • A customer is requesting automatic failover from one SQL Server 2000 to another. The servers also run an application server, whose clients will detect the loss of the primary server and access the secondary server instead.

    The requirements are: the databases on both servers should be maintained the same, with a latency of no more than 5 minutes. The secondary must be immediately available for updates after the primary fails. The secondary is only used in case of failure. This is a manufacturing environment: the servers run 24-7, the network is closed and there are no IT personnel on-site.

    The words "Database mirroring" are echoing in my head repeatedly. I have been told that upgrading both servers is not in the cards.

    I've been digging through the details of replication. I can't even figure out if I want merge or transactional replication. Log shipping was already ruled out--it requires that the secondary be manually brought online. But the documentation keeps mentioning log shipping used with replication as a failover solution.

    Could someone please point me in the right direction? Once I know what will and won't work, I can work out the details of how to do it. If upgrading to 2005 is the only solution, I need ammunition to begin that fight.

  • Replication provides a warm standby with no automatic failover, though I seem to remember a product from CA that might provide automatic failover.  Clustering and Database Mirroring are the only Microsoft provided solutions that have automatic failover with Database Mirroring only available in SQL Server 2005.  I've never heard of a log shipping solution that provides automatic failover.

    Greg

    Greg

  • I still need to provide a solution. I'm unable to upgrade the server version or purchase any products. Technically, the failover is accomplished by the application server. Maybe I do need just a warm failover. The database is simple: one rarely modified reference table and two tables that receive constant streams of inserts and daily scheduled deletes.

    I think that transactional replication will work, but I'm not sure which type to use. I'm digging through the documentation looking for what happens when a publisher or subscriber is offline and for what happens when a publisher or subscriber must be restored. That information is buried in BOL, with tantalizing bits and pieces scattered about.

    The biggest issue appears to be downtime. In order to quiesce the system, a manufacturing line must be stopped.

  • Go with replication. Transactional or Merge will be fine. Create either a push subscription from the main server to your backup server or a pull subscription the other way. Set the schedule to every 5 minutes. Allow the app server to switch clients over in the case of failure. Set up some sort of email warning if either of the servers fail. Make sure both servers are on separate power sources.

    What happens if the app server fails?

  • Arethe servers connected to a SAN or NAS ? If yes clustering will be the best solution. How are the clients switching from one server to the other ? In my opinion the customer is requiering an high avability system. This can be effectively provided through clustering or db mirroring only.

  • I would love to be able to implement a better solution. I wish I could say this project was in design stage. However, it's in PRODUCTION already and the customer has paid for high availability but most of the budget is gone. I was not consulted on this until a few days ago. I think the engineers were planning on just writing a few triggers and hoping that would do it.

    There are two servers and each one is a combined app/database server. The app server has its own method of providing failover. The app clients are aware of both servers and will switch when one server can't be reached.

    I've decided to use transactional replication because the potential for conflicts is almost non-existent. Maybe peer-to-peer? There is no separate distribution server. If each server has its own distribution database, then if the primary goes down, the secondary can operate and no recovery is needed when the primary comes back up (the assumption here is that the primary only went down due to someone unplugging a needed cable, which is surprisingly likely).

    Does that sound reasonable?

  • Replication does not provide automatic provide automatic failover. Look into Mirroring. Theres synchronous vs asynchronous modes. Depending on your application/requirements you can go with either. Synchronous provides automatic failover.

    Also peer-peer is only available in SQL 2005. I believe you mentioned you are using SQL 2000 in your first post. In replication, the publisher pushes the data (or the subscriber pulls the data from publisher depending on how you set it up) but the delay could be anywhere between a few seconds to few minutes. If your primary DB goes down during peak hours, the distribution db can have several hundred/thousand transactions sitting in queue waiting to be applied at the subscriber. Until they are all applied your subscriber db is not ready to be available online. Also, understand your subscriber db should not have any foreign keys, tables withour Primary Keys cannot be replicated. So your subscriber db is not exactly full size production db. Replication is useful in a differenr scenario where you can use the subscriber db for reporting, querying etc. not as a warm standby.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Thanks, Dinakar, for shooting holes in my plans. No, really, thanks. I needed to know that before I went any further in the wrong direction.

    My problem now is not technical but organizational. Doesn't it always end up that way? The customer was promised this feature. The system is in production already. No one bothered to design the database, just threw in some tables with no keys, no indexes and every column nullable, except the gratuitous identity column. Grrr... I've just now been given the requirements and I have to provide a script to be applied on the database that implements the high availability.

    I have to go rain on someone's parade now and start the move to 2005...

    Thanks for all the help.

  • I would be curious to know how this ends up.  Please keep us informed!

    -- Cory

  • Me too.  Good luck with this, Stephanie.  It sounds like you're suffering for someone else's screwup.

    Greg

    Greg

  • I understand. Its better you not get into the mess than get into it and take the blame officially.

    Peer-Peer replication in SQL 2005 sounds promising. but again, you can only replicate tables with Primary Keys so if you have tables without PKs replication cannot help you.

    So, it depends on how much flexibility you have in fixing the mess.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • It sounds like the design issues with the database are so serious that the ususal source of downtime will be bugs in the application, not hardware, so automatic failover would be of little benefit.

    More than likely, the application server code is a mess too, so even if you had database mirroring in place, it wouldn't work.

     

     

  • What a mess...Again, if you move to sql 2005, try to push mirroring. In my opinion more safe way you can go with such a db.

    Good luck

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply