Hi Mircea,

I think you're missing an intro section with the use cases we want to
handle in 5.2.
The requirements for having a backup datacenter that isn't accessed by
any clients are pretty different from the requirements for multiple
"primary" datacenters that are all handling requests at the same time,
so we should have a clear picture of what we are trying to achieve.
On Fri, Feb 10, 2012 at 5:47 PM, Bela Ban <bban@redhat.com> wrote:
> I'm going to comment via this mailing list, and we can later summarize
> and append to the wiki (I don't like discussions on the wiki... :-))...
> #1 Virtual cluster
> Rehashing:
> I believe we'll have to come up with a custom consistent hash that knows
> about the 2 sites, and places all of the data owners into the local
> site. E.g. it would be bad if we create a key K in LON and make the
> primary owner T (in SFO) and the backup owner B in LON!
> This should also minimize rehashing across sites, and prefer rehashing
> within sites.
For some use cases I would think it's essential that the data is
replicated in both sites (async, of course).
But I second the idea that there must be at least one backup in the
same site as the primary owner, or rehashes become really expensive.
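To make the owner-placement rule concrete, here's a rough sketch of the
kind of hash I have in mind (the interface and all the names are made up
for illustration, this is not the real ConsistentHash SPI):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch: all the "real" owners of a key come from its
    // primary site, plus a few extra nodes in the backup site for the
    // async copy. A node joining or leaving SFO then never rehashes
    // LON-owned keys.
    public class SiteAwareHash {
        private final Map<String, List<String>> membersBySite; // site -> nodes
        private final int numOwners;           // owners in the primary site
        private final int numBackupSiteOwners; // owners in the backup site

        public SiteAwareHash(Map<String, List<String>> membersBySite,
                             int numOwners, int numBackupSiteOwners) {
            this.membersBySite = membersBySite;
            this.numOwners = numOwners;
            this.numBackupSiteOwners = numBackupSiteOwners;
        }

        public List<String> locate(Object key, String primarySite,
                                   String backupSite) {
            List<String> owners = new ArrayList<String>();
            owners.addAll(pick(membersBySite.get(primarySite), key, numOwners));
            owners.addAll(pick(membersBySite.get(backupSite), key,
                               numBackupSiteOwners));
            return owners;
        }

        private List<String> pick(List<String> nodes, Object key, int count) {
            // plain modular hashing stands in for the real hash function
            int start = Math.floorMod(key.hashCode(), nodes.size());
            List<String> picked = new ArrayList<String>();
            for (int i = 0; i < count && i < nodes.size(); i++)
                picked.add(nodes.get((start + i) % nodes.size()));
            return picked;
        }
    }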
> In terms of data access, the idea is that all writes would only go
> through 1 site, e.g. LON being active and SFO being a backup (in which
> reads could happen); my design didn't assume concurrent writes in
> multiple sites (at least not to the same data).
I like the idea of designating one site as the "master", but I'm
pretty sure we want to handle use cases where there is more than one
master.
Perhaps we could incorporate Erik's suggestion on the wiki and allow
the selection of a master site both at a cache level and at a key
level.
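Something along these lines, maybe (all the names here are made up, just
to show a cache-level default plus a key-level override):

    // Hypothetical: the cache config names a default master site, and a
    // key class can override it, e.g. by implementing an interface.
    public interface MasterSiteAware {
        String masterSite(); // e.g. "LON" or "SFO"
    }

    public class MasterSiteResolver {
        private final String cacheDefaultMasterSite; // from the cache config

        public MasterSiteResolver(String cacheDefaultMasterSite) {
            this.cacheDefaultMasterSite = cacheDefaultMasterSite;
        }

        // key-level override wins, otherwise fall back to the cache default
        public String masterSiteFor(Object key) {
            if (key instanceof MasterSiteAware)
                return ((MasterSiteAware) key).masterSite();
            return cacheDefaultMasterSite;
        }
    }

All writes (and possibly locks, see below) for a key would then be routed
to its master site.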
> Locking:
> Same as above
Having only one master site would certainly make locking easier.
In the other scenarios I'm not sure if we should always designate a
master site for a key and always lock there, or if we should only lock
on a "local primary owner" that's in the same site as the transaction
originator.
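Spelling the two options out (building on the SiteAwareHash and
MasterSiteResolver sketches above, so still entirely hypothetical):

    // Option 1: always lock on the primary owner in the key's master
    // site. Correct with concurrent writers in both sites, but every
    // lock on a remotely-mastered key pays a cross-site round trip.
    public String globalLockOwner(Object key) {
        String master = resolver.masterSiteFor(key);
        String backup = master.equals(localSite) ? remoteSite : localSite;
        return hash.locate(key, master, backup).get(0);
    }

    // Option 2: lock only on the primary owner in the originator's own
    // site. No cross-site round trip, but transactions in different
    // sites won't see each other's locks, so conflicting writes would
    // have to be reconciled after the fact.
    public String localLockOwner(Object key) {
        return hash.locate(key, localSite, remoteSite).get(0);
    }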
> Configuration:
> numOwners would have to be extended, e.g. we could introduce properties
> numPrimarySiteOwners=2 and numBackupSiteOwners=1, TBD
Should we allow the user to specify the number of backup sites as
well? That's assuming we are going to support more than two sites...
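I'd hope this can slot into the existing fluent configuration API,
something like the following (only numOwners() exists today; everything
under sites() is a hypothetical extension for the sake of discussion):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    Configuration c = new ConfigurationBuilder()
       .clustering()
          .cacheMode(CacheMode.DIST_SYNC)
          .hash()
             .numOwners(2)               // exists today
          // .sites()                    // proposed, does not exist yet:
          //    .localSite("LON")
          //    .numPrimarySiteOwners(2) // owners in the key's master site
          //    .numBackupSiteOwners(1)  // async copies per backup site
          //    .maxBackupSites(1)       // if we support more than 2 sites
       .build();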
> #2 Hot Rod based
> Having to connect to potentially all nodes is bad, as customers possibly
> only want to open 1 port for cross-datacenter traffic.
I'm not sure I agree. It would be bad if we expect to connect over the
internet to the remote site, but I assume that everyone would use a
VPN to communicate between sites anyway.
> I assume the RemoteCacheStore would have to be enabled in all nodes in a
> given site for this to work?
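On each node the forwarding would boil down to roughly this (the
RemoteCacheManager client and the listener annotations are the real
client/notification APIs; the wiring around them is just a sketch of
what RemoteCacheStore would do for us):

    import org.infinispan.Cache;
    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.RemoteCacheManager;
    import org.infinispan.notifications.Listener;
    import org.infinispan.notifications.cachelistener.annotation.CacheEntryModified;
    import org.infinispan.notifications.cachelistener.event.CacheEntryModifiedEvent;

    // Sketch: every node pushes the writes that originate on it to the
    // backup site over the normal Hot Rod client.
    @Listener
    public class BackupSiteForwarder {
        private final RemoteCache<Object, Object> backupSite;

        public BackupSiteForwarder(String sfoHost, int sfoPort) {
            RemoteCacheManager rcm = new RemoteCacheManager(sfoHost, sfoPort);
            this.backupSite = rcm.getCache();
        }

        @CacheEntryModified
        public void entryModified(CacheEntryModifiedEvent e) {
            if (e.isPre() || !e.isOriginLocal())
                return; // forward each write once, from its originator
            backupSite.put(e.getKey(), e.getValue());
        }

        public static void register(Cache<Object, Object> cache,
                                    String sfoHost, int sfoPort) {
            cache.addListener(new BackupSiteForwarder(sfoHost, sfoPort));
        }
    }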
> How would you handle requests sent during the crash of a HotRod
> endpoint? Would they get queued, similar to
> https://issues.jboss.org/browse/JGRP-1401 ?
> How would initial state transfer be done between sites? E.g. LON has
> been up for 2 days and now we start SFO. Does that mean we will
> effectively have to transfer *all* of the data in LON --> SFO?
> #3 Custom bridge
> I like the fact that both sites are configured with possibly different
> numOwners, e.g. LON=2 and SFO=1
+1
> This will not work if you need to invoke blocking RPCs between sites:
> the copying of traffic to A is always assumed to be asynchronous. Plus,
> the recipient in SFO wouldn't know the original sender; in the example,
> the sender would always be X.
Mircea was saying that we'd only send over committed modifications, so
no locking RPCs that need to be synchronous.
I'm not sure if the original sender is still relevant in this case.
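To make sure I understand the bridge end of #3, it would be roughly this
(plain JGroups 3.x API; the bridge stack file, the cluster name and the
payload format are all placeholders):

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    // Sketch of the relay node: only the current site coordinator runs
    // this. It forwards batches of committed modifications over a
    // separate bridge channel and applies incoming batches locally.
    public class SiteBridge extends ReceiverAdapter {
        private final JChannel bridge;

        public SiteBridge(String bridgeProps, String bridgeCluster)
                throws Exception {
            bridge = new JChannel(bridgeProps); // e.g. a TCP stack for the WAN
            bridge.setReceiver(this);
            bridge.connect(bridgeCluster);      // e.g. "LON-SFO-bridge"
        }

        // async send: a crash between commit and send is exactly where
        // messages can get lost, hence the retransmission question below
        public void forward(byte[] committedMods) throws Exception {
            bridge.send(new Message(null, committedMods)); // null = everyone
        }

        @Override
        public void receive(Message msg) {
            applyLocally(msg.getBuffer());
        }

        private void applyLocally(byte[] committedMods) {
            // placeholder: deserialize and put() into the local cache,
            // flagged as remote-site updates so they don't bounce back
        }
    }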
> How do you handle the case where the relay (A or X) crashes and messages
> are sent during that time, before a new relay is elected?
I suspect this one is the tricky bit... Mircea did mention that he'd
like to reuse the retransmission logic from RELAY, but I'm not sure
how easy that would be.
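At a minimum I think we need a seqno per forwarded batch and acks from
the other site, so a newly elected relay can resend whatever is still
unacked. A bare-bones sketch of that bookkeeping (names made up, nothing
JGroups-specific in it):

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Each batch keeps its seqno until the remote site acks it. This
    // only survives a relay crash if the pending map is itself kept on
    // the relay's successor, e.g. in a replicated cache.
    public class PendingBatches {
        private final AtomicLong seqno = new AtomicLong();
        private final ConcurrentSkipListMap<Long, byte[]> pending =
                new ConcurrentSkipListMap<Long, byte[]>();

        public long add(byte[] batch) {
            long s = seqno.incrementAndGet();
            pending.put(s, batch);
            return s; // send the batch tagged with s over the bridge
        }

        public void ack(long s) {
            pending.headMap(s, true).clear(); // remote got everything <= s
        }

        // a newly elected relay resends these
        public Iterable<Map.Entry<Long, byte[]>> unacked() {
            return pending.entrySet();
        }
    }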
> How do you do state transfer, e.g. bootstrap SFO from LON? I guess you
> transfer the entire state from LON --> SFO, right?
For 5.2 I think this would be acceptable.
> SUMMARY:
> I would definitely *not* do #2.
> I do like #3 and #1, perhaps we need to focus a bit on #3, to see if
> there are other deficiencies we haven't seen yet, as I'm already
> familiar with #1.
> Cheers,
> On 2/10/12 4:02 PM, Mircea Markus wrote:
>> Hi,
>>
>> I've started a document[1] that contains 3 possible approaches for
>> implementing the X-datacentre replication functionality.
>> This is a highly requested bit of functionality for 5.2 and involves
>> interaction between several components: e.g. state transfer,
>> transactions, hotrod, jgroups etc. Can you please take a look and
>> comment?
>>
>> [1] https://community.jboss.org/wiki/CrossDatacenterReplication-Design
> --
> Bela Ban
> Lead JGroups (http://www.jgroups.org)
> JBoss / Red Hat
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev