[jboss-dev-forums] [Design of JBossCache] - Re: JBCACHE-816 vs JGRP-105
galder.zamarreno@jboss.com
do-not-reply at jboss.com
Wed Nov 7 07:11:35 EST 2007
A month ago, Bela, Vladimir, Jimmy and I discussed the suggestions I proposed in this forum thread. Apologies for not having been able to post the notes earlier.
First of all, we discussed the possible use cases, with catastrophe recovery as the main one. In such a scenario, a specific data centre might get hit by a tornado, earthquake, etc., and users then need to be directed to another data centre so that they can carry on with their work. Data correctness between different data centres is not crucial. We have already had customers asking for such functionality.
The second use case would be the creation of G2G structures, which could be interesting for VoIP solutions.
Both use cases could be resolved at the GR level via JGRP-105, with the addition of filtering for inter group/data centre communication and a configurable communications layer for intra and inter group/data centre comms. The main problem right now is that the GR needs a lot of refactoring. Vladimir is going to work on this, but chances are we won't be able to get it done until JGroups 2.7/2.8.
The first use case (catastrophic failover) is higher priority than the G2G use case, so we discussed a solution based around JBCACHE-816 so that we can cater for those customers while JGRP-105 gets resolved. To sum up, here's the use case we want to resolve:
Use Case:
- 1 primary centre and 1 secondary centre
- unidirectional from primary to secondary at any one moment in time, with the assumption that all clients go to the primary
- asynchronous communication between centres; 100% data reliability is not the goal
- the aim is for the business to be able to resume operations on the secondary in case of failover or upgrade.
- upon manual failover (upgrade of the primary centre) or catastrophic failover, manual intervention is required to switch a potential load balancer to the secondary centre, where clients will be directed.
Solution:
Interceptor vs listener vs HA-Singleton:
- based on a configuration option, an interceptor (preferred) or cache listener could be created that encapsulates the solution; this is preferred to a standalone solution based on HA-Singleton.
- an interceptor allows for greater flexibility and simplicity in terms of relaying data between dcs (data centres), filtering, transactional work and state transfer propagation than a cache listener solution would.
- this interceptor is only active on the coordinator of the local data centre, using a technique similar to the one the Singleton Store Cache Loader uses (see the sketch after this list).
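Here's a minimal sketch of that coordinator check, assuming a plain JGroups channel for the local data centre; the class name and the activation flag are illustrative, not actual JBoss Cache API:

import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class InterDcActivator extends ReceiverAdapter {
    private final JChannel localDcChannel; // intra data centre channel
    private volatile boolean relayActive;

    public InterDcActivator(JChannel localDcChannel) {
        this.localDcChannel = localDcChannel;
    }

    public void viewAccepted(View view) {
        // In JGroups, the first member of the view is the coordinator.
        Address coord = (Address) view.getMembers().get(0);
        boolean isCoordinator = coord.equals(localDcChannel.getLocalAddress());
        if (isCoordinator && !relayActive) {
            relayActive = true;   // join the inter dc channel, start relaying
        } else if (!isCoordinator && relayActive) {
            relayActive = false;  // no longer coordinator, stop relaying
        }
    }
}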
Mux Channel:
- the interceptor starts a mux channel, which the coordinator of each data centre joins (the inter data centre mux channel).
- if a split occurred, coordinators could attempt to join the inter dc cluster.
- the mux channel would most likely be configured with TUNNEL so that it can talk to an intermediate GossipRouter; use the TUNNEL definition from stacks.xml (see the configuration sketch after this list).
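A minimal sketch of what joining the inter dc channel over TUNNEL might look like; the protocol property string below is purely illustrative (host names, ports and timeouts are made up) and would in practice come from the TUNNEL stack definition in stacks.xml:

import org.jgroups.JChannel;

public class InterDcChannel {
    public static JChannel connect() throws Exception {
        // Illustrative TUNNEL-based stack: all traffic goes through an
        // intermediate GossipRouter, so it works across firewalls.
        String props =
            "TUNNEL(router_host=gossip.example.com;router_port=12001):" +
            "PING(gossip_host=gossip.example.com;gossip_port=12001):" +
            "pbcast.NAKACK(retransmit_timeout=600,1200,2400,4800):" +
            "pbcast.GMS(join_timeout=5000;print_local_addr=true)";
        JChannel ch = new JChannel(props);
        ch.connect("inter-dc-cluster"); // one member per data centre joins
        return ch;
    }
}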
Relaying:
- in the context of this use case, relaying refers to inter dc communication.
- relaying could be done periodically, or per transaction/operation (put, remove, clear, etc.).
- relaying would be asynchronous, as data correctness is not paramount and asynchrony gives better performance.
- we're assuming that all clients go to one data centre and that, upon catastrophe/upgrade, all clients are manually redirected to a different data centre; so when a dc (data centre) channel member intercepts a relevant local data centre replication event, it relays it. That means any node in the inter dc cluster can relay information, which, given the assumption above, simplifies the design.
- enabling any dc cluster member to relay also simplifies the solution for situations where the inter dc link fails rather than nodes failing. In that case, clients are moved manually to a different dc, but relaying would continue to happen from any surviving dc without any further action being necessary.
- a dc cluster member reads what it receives via the mux channel (i.e. what the primary data centre sends), converts it into an invocation and passes it up.
- the inter dc relay ping-pong effect should be avoided, so that inter dc changes that are applied locally do not bounce back to other dcs. How to avoid it requires further thought; one possible approach is sketched after this list.
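One possible (unconfirmed) approach to the ping-pong problem is to tag each relayed modification with its origin dc and drop anything that would bounce back; the Modification type and class names below are hypothetical placeholders:

import java.io.Serializable;

public class DcRelay {
    static class Modification implements Serializable {
        final String originDc;
        final Object payload;
        Modification(String originDc, Object payload) {
            this.originDc = originDc;
            this.payload = payload;
        }
    }

    private final String localDc; // e.g. "dc-primary"
    DcRelay(String localDc) { this.localDc = localDc; }

    // Called when a local replication event is intercepted.
    Modification wrapForRelay(Object payload) {
        return new Modification(localDc, payload);
    }

    // Called when a modification arrives on the inter dc channel.
    boolean shouldApplyLocally(Modification mod) {
        // Never re-apply or re-relay a change that originated here;
        // this breaks the inter dc relay ping-pong loop.
        return !localDc.equals(mod.originDc);
    }
}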
Transactions:
- inter dc communication is asynchronous, so one-phase commit (1PC) always.
- intra dc will likely be synchronous but could be asynchronous.
- upon commit (if sync intra dc) or prepare (if async intra dc), take the list of modifications and relay them (see the hook sketch after this list).
- on rollback, do nothing.
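A sketch of that transactional hook, with illustrative types (this is not actual JBoss Cache interceptor API): collect the transaction's modifications and relay them asynchronously at commit or prepare depending on the intra dc mode, and relay nothing on rollback:

import java.util.List;

public class TxRelayHook {
    interface Relayer { void relayAsync(List<Object> modifications); }

    private final Relayer relayer;
    private final boolean intraDcSync;

    TxRelayHook(Relayer relayer, boolean intraDcSync) {
        this.relayer = relayer;
        this.intraDcSync = intraDcSync;
    }

    void onPrepare(List<Object> mods) {
        // async intra dc: relay at prepare, the only point we see the mods
        if (!intraDcSync) relayer.relayAsync(mods);
    }

    void onCommit(List<Object> mods) {
        // sync intra dc: relay only once the local commit has succeeded
        if (intraDcSync) relayer.relayAsync(mods);
    }

    void onRollback() {
        // nothing to relay; the transaction never took effect locally
    }
}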
2nd in line:
- within a data centre, the second in line could maintain a list of modifications within the interceptor.
- if the second in line becomes master, it would join the inter dc cluster and could replay the modifications after receiving the state. That would guarantee that any modifications that were not delivered because the dc coordinator failed are still applied (see the buffer sketch after this list).
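An illustrative sketch of such a second-in-line buffer; the promotion trigger and the Relayer interface are assumptions, not an agreed design:

import java.util.ArrayList;
import java.util.List;

public class SecondInLineBuffer {
    interface Relayer { void relayAsync(List<Object> modifications); }

    private final List<Object> pending = new ArrayList<Object>();

    synchronized void record(Object modification) {
        pending.add(modification); // mirror what the coordinator relays
    }

    synchronized void onPromotedToCoordinator(Relayer relayer) {
        // After joining the inter dc cluster and receiving state, replay
        // the buffered modifications to cover the failover window.
        relayer.relayAsync(new ArrayList<Object>(pending));
        pending.clear();
    }
}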
State Transfer:
- if a new node starts and it's the first in the local data centre, the inter data centre interceptor is active and joins the mux channel, potentially requesting a state transfer from the coordinator. The coordinator of the inter dc cluster does not necessarily have to be the primary relayer, but all inter dc cluster members should have the same data.
- if a new node starts and it's not the first in the local data centre, standard local state transfer rules apply.
- streaming state transfer at the inter data centre level would require further thought, as potential intermediate firewalls would come into play (a sketch of the plain state transfer step follows this list).
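For the non-streaming case, the inter dc state transfer step could look roughly like the following, using the plain JGroups byte[] state API (the cluster name and timeout are made up; streaming state transfer is deliberately left out because of the firewall concern above):

import org.jgroups.JChannel;

public class InterDcJoin {
    public static void joinAndFetchState(JChannel interDcChannel) throws Exception {
        interDcChannel.connect("inter-dc-cluster");
        // Ask the inter dc coordinator (null target) for the current state;
        // any member should hold the same data, per the notes above.
        boolean ok = interDcChannel.getState(null, 10000);
        if (!ok) {
            // No state was returned: we may be the first data centre online.
        }
    }
}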
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4102499#4102499
Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4102499