[infinispan-dev] X-datacentre replication: design suggestions

Fri Feb 10 10:47:07 EST 2012

I'm going to comment via this mailing list, and we can later summarize 
and append to the wiki (I don't like discussions on the wiki... :-))...

#1 Virtual cluster

Rehashing:

I believe we'll have to come up with a custom consistent hash that knows 
about the 2 sites, and places all of the data owners into the local 
site. E.g. it would be bad if we created a key K in LON and made T (in 
SFO) the primary owner and B (in LON) the backup owner!
This should also minimize rehashing across sites, and prefer rehashing 
within sites.
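
To make the idea concrete, here is a minimal Java sketch of site-aware 
owner selection. It is not Infinispan's ConsistentHash interface: the 
class SiteAwareHashSketch, the method locateOwners, the node-to-site map 
and the node names are all hypothetical, and a plain modulo stands in 
for a real hash wheel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an Infinispan API.
final class SiteAwareHashSketch {

    // Picks up to numOwners owners for a key, considering only members
    // that live in localSite. Nodes in other sites are never candidates.
    static List<String> locateOwners(Object key, List<String> members,
                                     Map<String, String> siteOf,
                                     String localSite, int numOwners) {
        List<String> local = new ArrayList<>();
        for (String node : members) {
            if (localSite.equals(siteOf.get(node))) {
                local.add(node);
            }
        }
        List<String> owners = new ArrayList<>();
        if (local.isEmpty()) {
            return owners;
        }
        // Simple modulo placement; a real implementation would use a hash wheel.
        int start = Math.floorMod(key.hashCode(), local.size());
        int count = Math.min(numOwners, local.size());
        for (int i = 0; i < count; i++) {
            owners.add(local.get((start + i) % local.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        // Illustrative membership: three nodes in LON, two in SFO.
        List<String> members = List.of("A", "B", "C", "T", "X");
        Map<String, String> siteOf = Map.of(
            "A", "LON", "B", "LON", "C", "LON", "T", "SFO", "X", "SFO");
        System.out.println(locateOwners("K", members, siteOf, "LON", 2));
    }
}
```

Because only local members are considered, a node joining or leaving 
SFO does not change the owners of keys created in LON in this sketch, 
which is the "prefer rehashing within sites" property in its simplest 
form.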

In terms of data access, the idea is that all writes would only go 
through 1 site, e.g. LON being active and SFO being a backup (where 
reads could still happen); my design didn't assume concurrent writes in 
multiple sites (at least not to the same data).

Locking:

Same as above

Configuration:

numOwners would have to be extended, e.g. we could introduce the 
properties numPrimarySiteOwners=2 and numBackupSiteOwners=1, TBD
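
As a rough illustration of how such a split could be carried around in 
code, here is a small Java holder for the two values. The property names 
come from the suggestion above and do not exist in Infinispan's 
configuration; the class SiteOwnersConfigSketch, its methods and the 
assumption of exactly one backup site are hypothetical.

```java
// Hypothetical sketch, not an Infinispan configuration class.
final class SiteOwnersConfigSketch {
    private final String primarySite;
    private final int numPrimarySiteOwners;
    private final int numBackupSiteOwners;

    SiteOwnersConfigSketch(String primarySite,
                           int numPrimarySiteOwners,
                           int numBackupSiteOwners) {
        if (primarySite == null) {
            throw new IllegalArgumentException("primarySite must be set");
        }
        if (numPrimarySiteOwners < 1) {
            throw new IllegalArgumentException("numPrimarySiteOwners must be >= 1");
        }
        if (numBackupSiteOwners < 0) {
            throw new IllegalArgumentException("numBackupSiteOwners must be >= 0");
        }
        this.primarySite = primarySite;
        this.numPrimarySiteOwners = numPrimarySiteOwners;
        this.numBackupSiteOwners = numBackupSiteOwners;
    }

    // Number of copies of each entry that the given site should hold.
    int numOwnersFor(String site) {
        return primarySite.equals(site) ? numPrimarySiteOwners : numBackupSiteOwners;
    }

    // Total copies across both sites, assuming a single backup site.
    int totalOwners() {
        return numPrimarySiteOwners + numBackupSiteOwners;
    }
}
```

With new SiteOwnersConfigSketch("LON", 2, 1), numOwnersFor("LON") is 2 
and numOwnersFor("SFO") is 1, so each entry has 3 copies overall.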

#2 Hot Rod based

Having to connect to potentially all nodes is bad, as customers possibly 
only want to open 1 port for cross-datacenter traffic.

I assume the RemoteCacheStore would have to be enabled in all nodes in a 
given site for this to work ?

How would you handle requests sent during the crash of a Hot Rod 
endpoint? Would they get queued, similar to 
https://issues.jboss.org/browse/JGRP-1401 ?

How would initial state transfer be done between sites ? E.g. LON has 
been up for 2 days and now we start SFO. Does that mean we will 
effectively have to transfer *all* of the data in LON --> SFO ?

#3 Custom bridge

I like the fact that both sites are configured with possibly different 
numOwners, e.g. LON=2 and SFO=1

This will not work if you need to invoke blocking RPCs between sites: 
the copying of traffic to A is always assumed to be asynchronous. Plus, 
the recipient in SFO wouldn't know the original sender; in the example, 
the sender would always be X.

How do you handle the case where the relay (A or X) crashes and messages 
are sent during that time, before a new relay is elected ?
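
One possible answer, purely as a sketch and not something the design 
document specifies, is to buffer outgoing cross-site messages while no 
relay is available and flush them in order once a new relay is elected. 
The class RelayBufferSketch, its methods and the use of a plain List to 
stand in for the link to the remote site are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not JGroups or Infinispan code.
final class RelayBufferSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final int capacity;
    // Stands in for the connection to the remote site; null while no relay exists.
    private List<String> relay;

    RelayBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    // Forwards the message if a relay is up, otherwise buffers it.
    // Returns false if the buffer is full and the message was dropped.
    synchronized boolean send(String msg) {
        if (relay != null) {
            relay.add(msg);
            return true;
        }
        if (pending.size() >= capacity) {
            return false;
        }
        pending.addLast(msg);
        return true;
    }

    // Called when the current relay is detected as crashed.
    synchronized void relayCrashed() {
        relay = null;
    }

    // Called when a new relay is elected; flushes buffered messages in FIFO order.
    synchronized void relayElected(List<String> newRelay) {
        relay = newRelay;
        while (!pending.isEmpty()) {
            relay.add(pending.pollFirst());
        }
    }
}
```

This only covers messages sent after the crash is detected. Messages 
that the crashed relay had accepted but not yet forwarded would still be 
lost unless they are retransmitted, and a bounded buffer means senders 
must cope with drops during a long outage.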

How do you do state transfer, e.g. bootstrap SFO from LON ? I guess you 
transfer the entire state from LON --> SFO, right ?

SUMMARY:

I would definitely *not* do #2.
I do like #3 and #1. Perhaps we need to focus a bit on #3, to see if 
there are other deficiencies we haven't seen yet, as I'm already 
familiar with #1.

Cheers,

On 2/10/12 4:02 PM, Mircea Markus wrote:
> Hi,
>
> I've started a document[1] that contains 3 possible approaches for implementing the X-datacentre replication functionality.
> This is a highly requested bit of functionality for 5.2 and involves interaction between several components: e.g. state transfer, transactions, hotrod, jgroups etc. Can you please take a look and comment?
>
> [1] https://community.jboss.org/wiki/CrossDatacenterReplication-Design

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat