[infinispan-dev] X-datacentre replication: design suggestions

Dan Berindei dan.berindei at gmail.com
Fri Feb 10 14:17:27 EST 2012


Hi Mircea,

I think you're missing an intro section with the use cases we want to
handle in 5.2.

The requirements for having a backup datacenter that isn't accessed by
any clients are quite different from the requirements for multiple
"primary" datacenters that all handle requests at the same time, so we
should have a clear picture of what we are trying to achieve.


On Fri, Feb 10, 2012 at 5:47 PM, Bela Ban <bban at redhat.com> wrote:
> I'm going to comment via this mailing list, and we can later summarize
> and append to the wiki (I don't like discussions on the wiki... :-))...
>
> #1 Virtual cluster
>
> Rehashing:
>
> I believe we'll have to come up with a custom consistent hash that knows
> about the 2 sites and places all of the data owners into the local
> site. E.g. it would be bad if we created a key K in LON and made the
> primary owner T (in SFO) and the backup owner B in LON!
> This should also minimize rehashing across sites and prefer rehashing
> within sites.
>

For some use cases I would think it's essential that the data is
replicated to both sites (async, of course).
But I second the idea that there must be at least one backup in the
same site as the primary owner, or rehashes become really expensive.
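
To make that placement rule concrete, here's a minimal sketch of a
site-aware owner selection (all names are hypothetical, this is not
the actual ConsistentHash SPI): the primary owner and the first backup
always come from the key's home site, and any remaining owners spill
over to the other site(s).

   import java.util.ArrayList;
   import java.util.List;

   public class SiteAwareOwners {

      /** Hypothetical node descriptor: an address plus the site it runs in. */
      static class Node {
         final String address;
         final String site;
         Node(String address, String site) { this.address = address; this.site = site; }
      }

      /**
       * Picks numOwners owners for a key. The primary owner and the first
       * backup come from the key's home site, so a single node crash only
       * causes a rehash within that site; any remaining owners go to the
       * other site(s). Purely illustrative -- a real consistent hash would
       * also have to balance segments evenly across the nodes.
       */
      static List<Node> locateOwners(List<Node> members, String homeSite,
                                     int keyHash, int numOwners) {
         List<Node> local = new ArrayList<Node>();
         List<Node> remote = new ArrayList<Node>();
         for (Node n : members)
            (n.site.equals(homeSite) ? local : remote).add(n);
         if (local.isEmpty()) // degenerate case: no members in the home site
            return new ArrayList<Node>(remote.subList(0, Math.min(numOwners, remote.size())));

         List<Node> owners = new ArrayList<Node>(numOwners);
         int start = (keyHash & Integer.MAX_VALUE) % local.size();
         // at most two owners (primary + first backup) from the home site
         for (int i = 0; i < local.size() && owners.size() < Math.min(2, numOwners); i++)
            owners.add(local.get((start + i) % local.size()));
         // the rest of the owners are the async backups in the remote site(s)
         for (int i = 0; i < remote.size() && owners.size() < numOwners; i++)
            owners.add(remote.get((start + i) % remote.size()));
         return owners;
      }
   }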

> In terms of data access, the idea is that all writes would only go
> through 1 site, e.g. LON being active and SFO being a backup (in which
> reads could happen); my design didn't assume concurrent writes in
> multiple sites (at least not to the same data).
>

I like the idea of designating one site as the "master", but I'm
pretty sure we want to handle use cases where there is more than one
master.
Perhaps we could incorporate Erik's suggestion on the wiki and allow
the master site to be selected at both the cache level and the key
level.
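
Roughly, I imagine that combination looking like the sketch below: a
cache-wide default master site with an optional per-key override. None
of these names exist anywhere yet; it's just to illustrate the shape:

   /** Hypothetical per-key override; return null to use the cache default. */
   interface MasterSitePolicy {
      String masterSiteFor(Object key);
   }

   class MasterSiteResolver {
      private final String cacheDefaultSite;         // e.g. "LON"
      private final MasterSitePolicy keyLevelPolicy; // may be null

      MasterSiteResolver(String cacheDefaultSite, MasterSitePolicy keyLevelPolicy) {
         this.cacheDefaultSite = cacheDefaultSite;
         this.keyLevelPolicy = keyLevelPolicy;
      }

      /** The key-level override wins; otherwise fall back to the cache default. */
      String masterSiteFor(Object key) {
         if (keyLevelPolicy != null) {
            String site = keyLevelPolicy.masterSiteFor(key);
            if (site != null) return site;
         }
         return cacheDefaultSite;
      }
   }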

> Locking:
>
> Same as above
>

Having only one master site would certainly make locking easier.
In the other scenarios I'm not sure whether we should always designate
a master site for each key and always lock there, or only lock on a
"local primary owner" that's in the same site as the transaction
originator.
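
In code, that choice boils down to something like this (hypothetical
names again; the master site would come from a resolver like the one
sketched above):

   public class LockPlacement {

      enum Strategy { MASTER_SITE, LOCAL_PRIMARY_OWNER }

      /**
       * Returns the site whose primary owner should acquire the lock for
       * a key. MASTER_SITE gives a single global lock owner per key;
       * LOCAL_PRIMARY_OWNER avoids a cross-site RPC per lock acquisition,
       * but lets two sites update the same key concurrently.
       */
      static String lockSiteFor(Strategy strategy, String masterSite,
                                String originatorSite) {
         return strategy == Strategy.MASTER_SITE ? masterSite : originatorSite;
      }
   }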

> Configuration:
>
> numOwners would have to be extended, e.g. we could introduce a property
> numPrimarySiteOwners=2 and numBackupSiteOwners=1, TBD
>

Should we allow the user to specify the number of backup sites as
well? That's assuming we are going to support more than two sites...
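
For reference, I could see the configuration ending up something like
this (the owner attribute names come straight from Bela's proposal,
the rest is invented, and none of it exists yet):

   <clustering mode="distribution">
      <!-- hypothetical syntax: 2 owners in the primary site, 1 per backup site -->
      <hash numPrimarySiteOwners="2" numBackupSiteOwners="1" />
      <!-- and, if we end up supporting more than two sites, perhaps also: -->
      <sites local="LON" numBackupSites="1" />
   </clustering>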

>
> #2 Hot Rod based
>
> Having to connect to potentially all nodes is bad, as customers possibly
> only want to open 1 port for cross-datacenter traffic.
>

I'm not sure I agree. It would be bad if we expected to connect over
the internet to the remote site, but I assume everyone would use a VPN
between sites anyway.


> I assume the RemoteCacheStore would have to be enabled on all nodes in
> a given site for this to work?
>
> How would you handle requests sent during the crash of a HotRod
> endpoint? Would they get queued, similar to
> https://issues.jboss.org/browse/JGRP-1401?
>
> How would initial state transfer be done between sites? E.g. LON has
> been up for 2 days and now we start SFO. Does that mean we will
> effectively have to transfer *all* of the data in LON --> SFO?
>
>
> #3 Custom bridge
>
> I like the fact that both sites are configured with possibly different
> numOwners, e.g. LON=2 and SFO=1
>

+1

> This will not work if you need to invoke blocking RPCs between sites:
> the copying of traffic to A is always assumed to be asynchronous. Plus,
> the recipient in SFO wouldn't know the original sender; in the example,
> the sender would always be X.
>

Mircea was saying that we'd only send over committed modifications, so
there would be no locking RPCs that need to be synchronous.
I'm not sure whether the original sender is still relevant in this
case.
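
For what it's worth, here is roughly how I picture the sending side of
the bridge (all names hypothetical): the relay node queues each
committed transaction's modification list, and a separate thread ships
it to the remote site, so no inter-site call ever blocks a
transaction.

   import java.util.List;
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.LinkedBlockingQueue;

   public class CommitForwarder implements Runnable {

      /** One committed transaction's writes, already applied locally. */
      static class CommittedTx {
         final List<Object> modifications;
         CommittedTx(List<Object> modifications) { this.modifications = modifications; }
      }

      private final BlockingQueue<CommittedTx> queue =
            new LinkedBlockingQueue<CommittedTx>();

      /** Called on the relay node after the local commit; never blocks. */
      public void onCommit(CommittedTx tx) {
         queue.add(tx);
      }

      public void run() {
         try {
            while (true) {
               CommittedTx tx = queue.take();
               // async bridge send; what happens when the relay crashes
               // with queued messages is exactly the open question below
               sendToRemoteSite(tx);
            }
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
         }
      }

      private void sendToRemoteSite(CommittedTx tx) {
         // placeholder for the actual cross-site send (e.g. over JGroups)
      }
   }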

> How do you handle the case where the relay (A or X) crashes and
> messages are sent during that time, before a new relay is elected?
>

I suspect this one is the tricky bit... Mircea did mention that he'd
like to reuse the retransmission logic from RELAY, but I'm not sure
how easy that would be.

> How do you do state transfer, e.g. bootstrap SFO from LON? I guess you
> transfer the entire state from LON --> SFO, right?
>

For 5.2 I think this would be acceptable.

>
>
>
> SUMMARY:
>
> I would definitely *not* do #2.
> I do like #3 and #1; perhaps we need to focus a bit on #3 to see if
> there are other deficiencies we haven't seen yet, as I'm already
> familiar with #1.
>
> Cheers,
>
>
> On 2/10/12 4:02 PM, Mircea Markus wrote:
>> Hi,
>>
>> I've started a document [1] that contains 3 possible approaches for implementing the X-datacentre replication functionality.
>> This is a highly requested bit of functionality for 5.2 and involves interaction between several components, e.g. state transfer, transactions, HotRod, JGroups, etc. Can you please take a look and comment?
>>
>> [1] https://community.jboss.org/wiki/CrossDatacenterReplication-Design
>
>
> --
> Bela Ban
> Lead JGroups (http://www.jgroups.org)
> JBoss / Red Hat

