architectural advice

  • We are an online retailer looking to expand our business to international market. Currently we have one single database that takes orders and process them. We are looking to make some architectural changes to introduce redundancy as well as to allow parallel order processing at different data centers. Not exactly sure where our second data center will be, but its either other area of the country or maybe in Europe. Does anyone have experience setting up a similar environment? The technologies that come to my mind are active-active clustering (which is pricy, sort of, for enterprise edition, and hardwares, etc, we are on standard edition now), merge replication (is it tricky to maintain), federated servers (not exactly sure how this works), database mirroring? The tricky is how do you maintain identity OrderID if two servers are taking orders at the same time?

    Sorry if this sounds confusing. Let me know if you have any questions. Thanks much for your advice.

  • KATHLEEN Y ZHANG (3/28/2008)


    We are an online retailer looking to expand our business to international market. Currently we have one single database that takes orders and process them. We are looking to make some architectural changes to introduce redundancy as well as to allow parallel order processing at different data centers. Not exactly sure where our second data center will be, but its either other area of the country or maybe in Europe. Does anyone have experience setting up a similar environment? The technologies that come to my mind are active-active clustering (which is pricy, sort of, for enterprise edition, and hardwares, etc, we are on standard edition now), merge replication (is it tricky to maintain), federated servers (not exactly sure how this works), database mirroring? The tricky is how do you maintain identity OrderID if two servers are taking orders at the same time?

    Sorry if this sounds confusing. Let me know if you have any questions. Thanks much for your advice.

    Hi Kathleen,

    This sounds like a replication scenario to me. First of all, an active-active cluster won't give you the parallel order processing. Or rather, you can process orders in parallel but you'll still need to worry about merging the data somehow. Secondly, a geographically dispersed cluster is even more stringent on the the hardware requirements, and as you already mentioned, the cost can be prohibitive.

    Mirroring is just a redundancy solution (although you can extend that to support some reporting services) so you can't achieve parallel processing in this respect.

    Replication was designed for this kind of thing. Particularly if you decide to expand in two/three years. There are different ways in which you can design the replication topology, which depends on the needs of the business. Given that it's only two sites involved it seems relatively simple, but you will need to carefully thing about the design and implementation.

    For example, will the second site need to have access to orders processed by the first site? From a business point of view the answer might be "no" but from a redundancy point of view it is a "yes". So it looks like this would be a publisher/subscriber to publisher/subscriber topology.

    As you already noted, you need to think about things like identities. To work around this you would set up a range at your main site that allows identies from 1 to 1,000,000 (for example). And your second site identities could range from 1,000,001 to 2,000,000. Or another neat trick, which people forget is that you can use negative integers for identies. So at 1 site the identity values are positive, with an increasing increment. At the second site, you could have negative numbers with a decreasing increment. This way the identities will never overlap.

    It is true that replication will require the most amount of continuous support on the DBA team. So this needs to be a consideration.

    Now, if you don't need to merge the data (i.e. orders processed at site 2 don't need to be merged with orders at site 1) then I would consider having two databases at each site, and each one of them mirrors to the other site. This will allow parallel processing, although the data at each site is somewhat independant, and in case of failure, one site will manage the load of both databases.

    The only thing to be concerned about this is that you only have standard edition SQL Server, which only supports synchronous mirroring, meaning that transactions won't commit on a database until it has been committed at the second site. If the link between the sites is poor this could affect performance.

    Anyway, hope this helps and kicks off a few other replies.

  • Karl has a pretty good summary of what to worry about and what advice I'd give.

  • Thanks Karl and Steve for the input!

    If I would use merge replication, how tricky it is to set up and maintain and how robust it is? Say, e.g, if replication breaks, is it painful to do snapshot? Because I am using transactional replication right now, when doing snapshots, the tables are locked, luckily, the publisher is for internal use only, so I can schedule snapshot to happen after people leave for the day.

    Other than using different number ranges order number generating, I am thinking maybe use something like 'US100', 'US101',... and 'EU100', 'EU101'..., then I have to use a process to assign order numbers. Not sure if you have any thoughts on this. Karl, I like your idea of using negative numbers. If a customer calls in and asks about his order, what would he say if CSR asks for his order number... No offense, just kidding. 🙂

    Thank you so much!

  • KATHLEEN Y ZHANG (3/28/2008)


    Thanks Karl and Steve for the input!

    If I would use merge replication, how tricky it is to set up and maintain and how robust it is? Say, e.g, if replication breaks, is it painful to do snapshot? Because I am using transactional replication right now, when doing snapshots, the tables are locked, luckily, the publisher is for internal use only, so I can schedule snapshot to happen after people leave for the day.

    Other than using different number ranges order number generating, I am thinking maybe use something like 'US100', 'US101',... and 'EU100', 'EU101'..., then I have to use a process to assign order numbers. Not sure if you have any thoughts on this. Karl, I like your idea of using negative numbers. If a customer calls in and asks about his order, what would he say if CSR asks for his order number... No offense, just kidding. 🙂

    Thank you so much!

    Merge replication is a little more tricky than the other types of replication and you have to take into account a few things: Here's a good overview of the thing you need to look out for.... http://msdn2.microsoft.com/en-us/library/aa237486.aspx

    In your situation, without having all the details, I'd say it sounds like it would be reasonably straight forward. You only have two sites, and you don't need to worry about conflict resolution because each sites handles its own orders. So that reduces the complexity a little.

    Unfortunately, the snapshot has to be done. Obviously it's best to take the snapshot whilst offline or during off-peak hours but you're going to have to do it at some stage. You could create a few seperate publications and snapshot them at different times so that you spread the impact. For example, you could put all the small tables in one publication (this will snapshot quickly). Then split the regularly used tables across two publications.

    Snapshot is the most painful part - so you'll need to test this aspect on a database that is representative of your production environment. However, once it's up and running it's pretty robust. The key thing is to keep on top of replication 24*7. If you let tiny problems go unattended it can quickly spiral out of control. I've seen replication that stopped for 8 hours overnight and by the time the problem was resolved it couldn't replicate the backlog quick enough and deal with current demand. That was an extreme example but if it had been dealt with immediately it wouldn't have resulted in as big a problem.

    Note trying to scare you by the way. Just warning you of the pitfalls. 🙂

    Like I said though, once it's been setup things should run pretty smoothly I suspect - particularly in this situation which is a relatively straight forward setup.

  • Here's a decent read on scaling out your application:

    http://msdn2.microsoft.com/en-us/library/aa479364.aspx

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks Karl and John. I will do some reading and see what I find out.

  • Great tips karl. I'm currently administering a pull replication. To load the Snapshot it takes up to 30 hours. I am thinking of divide the publication following your advice.

  • I would do this.

    If you are going to loss a million dollars a day Then

    Billing System and Catalog

    Cluster in Europe, Active / Passive

    Or

    Mirroring with client redirect (ADO 2.0)

    Billing System and Catalog

    Cluster in USA, Active / Passive

    Or

    Mirroring with client redirect (ADO 2.0)

    Then the Nexus in the USA

    Cluster in USA, Active / Passive

    Nexus is a cool name to replicate data from USA and Europe Billing system and Catalog to a center database. And if you update the catalog from the Nexus it will push it out to the db servers. Transaction replication with Queue updates. And you can use a guid datatype as Primary key or as stated above partition the integer Primary key into ranges. I have not used the peer to peer reblication but I read that might be a design pattern.

    I found the data model needs to be designed from the beginning.

  • Thanks for the reply!

    This is what I found about updatable subscriptions:

    To make schema changes on a table in a publication that supports updating subscriptions, all activity on the table must be stopped at the Publisher and Subscribers, and pending data changes must be propagated to all nodes before making any schema changes. This ensures that outstanding transactions do not conflict with the pending schema change. After the schema changes have propagated to all nodes, activity can resume on the published tables.

    We are making schema changes all the time. Stopping all activities on the published tables are not quite possible. Is there a way around?

  • Kathleen,

    I meant to mention this in one of my other posts. If you're making a lot of schema changes then that could be a problem as far as the replication scenario is concerned as replication typically works better in a stable environment.

    Surely though, when making schema changes you do this as a managed application update and therefore take down the application for maintenance?

  • It has been a while. I am still working on finding the perfect solution to our new architecture.

    I am looking into peer to peer replication as a possible solution. One consideration is data stewardship, i.e. only the owner of the data is allowed to update it. Supposely, I can partition the orders so that each peer in the topology gets a range of order, say by location. For example, only orders from the U.S. is inserted, updated on server1. My question is, once an order is placed on Server1, if mutiple CSRs are updating the same entry at the same time on the same server (not thru other peers in the topology), is this considered conflict resolution that peer to peer replication does not address (only merge replication can take care of), or it can be resolved by locks?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply