[infinispan-dev] X-datacentre replication: design suggestions

Mircea Markus mmarkus at redhat.com
Fri Feb 10 14:18:31 EST 2012


> #1 Virtual cluster
> 
> Rehashing:
> 
> I believe we'll have to come up with a custom consistent hash that
> knows
> about the 2 sites, and places all of the data owners into the local
> site. E.g. it would be bad if we create a key K in LON and make the
> primary owner T (in SFO) and the backup owner B in LON !

I think this approach works very well if you use SFO as an hot-standby for LON: the purpose of SFO bein is to pick up the load in the case LON goes down for whatever reason.  
In an follow the sun approach, in which LON is up for 12h and then SFO takes over the next 12h there's this won't work.  

> This should also minimize rehashing across sites, and prefer
> rehashing
> within sites.

Not sure about that: AFAIK the way rehashing is implemented, is always the primary owner that generates the state. So if the joiner is in SFO then the burden of join would be much higher.  

> 
> In terms of data access, the idea is that all writes would only go
> through 1 site, e.g. LON being active and SFO being a backup (in
> which
> reads could happen); my design didn't assume concurrent writes in
> multiple sites (at least not to the same data).

Yes indeed this would work for the master-slave (or hot standby scenario). IMO we should also consider the master-master (follow the sun) approach as well.   
 
> Locking:
> 
> Same as above
> 
> Configuration:
> 
> numOwners would have to be extended, e.g. we could introduce a
> property
> numPrimarySiteOwners=2 and numBackupSiteOwners=1, TDB
yes indeed these attributes would be specific to the new CH function you added.
> 
> 
> #2 Hot Rod based
> 
> Having to connect to potentially all nodes is bad, as customers
> possibly
> only want to open 1 port for cross-datacenter traffic.
+1.
> 
> I assume the RemoteCacheStore would have to be enabled in all nodes
> in a
> given site for this to work ?

yes. that's a cons for this approach.

> 
> How would you handle requests sent during the crash of a HotRod
> endpoint
> ? Would they get queued, similar to
> https://issues.jboss.org/browse/JGRP-1401 ?

that would need to be re-implemented indeed. It's the first in the list of cons for this approach. 

> 
> How would initial state transfer be done between sites ? E.g. LON has
> been up for 2 days and now we start SFO. Does that mean we will
> effectively have to transfer *all* of the data in LON --> SFO ?

Transfering all the data is required as the data must be mirrored between the sites.
Of course we'll onloy transfer every entry once, and not numOwners times.

> 
> 
> #3 Custom bridge
> 
> I like the fact that both sites are configured with possibly
> different
> numOwners, e.g. LON=2 and SFO=1
> 
> This will not work if you need to invoke blocking RPCs between sites:
> the copying of traffic to A is always assumed to be asynchronous.
Why that? async replication is intended as a default, but sync replication should be possible as well. Just that the bridge would make the invocation in a synchronious way.

> Plus,
> the recipient in SFO wouldn't know the original sender; in the
> example,
> the sender would always be X.

Not sure why this is bad for a functional perspective. As long as the sites end up being in sync. 

> 
> How do you handle the case where the relay (A or X) crashes and
> messages
> are sent during that time, before a new relay is elected ?
A subset of RELAY can be used for this, together with JGRP-1401. 

> How do you do state transfer, e.g. bootstrap SFO from LON ? I guess
> you
> transfer the entire state from LON --> SFO, right ?

Indeed state transfer seems to be the biggest challange with this approach. I have some ideas, I'd also like to get Dan's input on this.

> SUMMARY:
> 
> I would definitely *not* do #2.
+1

> I do like #3 and #1, perhaps we need to focus a bit on #3, to see if
> there are other deficiencies we haven't seen yet, as I'm already
> familiar with #1.

thanks bela!


More information about the infinispan-dev mailing list