Hi Mircea
I think you're missing an intro section with the use cases we want to
handle in 5.2.
The requirements for having a backup datacenter that isn't accessed by
any clients are pretty different from the requirements for multiple
"primary" datacenters that are all handling requests at the same time,
so we should have a clear image of what we are trying to achieve.
+1. That became very clear after reading Bela's email :-)
On Fri, Feb 10, 2012 at 5:47 PM, Bela Ban <bban(a)redhat.com> wrote:
> I'm going to comment via this mailing list, and we can later summarize
> and append to the wiki (I don't like discussions on the wiki... :-))...
>
> #1 Virtual cluster
>
> Rehashing:
>
> I believe we'll have to come up with a custom consistent hash that knows
> about the 2 sites, and places all of the data owners into the local
> site. E.g. it would be bad if we create a key K in LON and make the
> primary owner T (in SFO) and the backup owner B in LON!
> This should also minimize rehashing across sites, and prefer rehashing
> within sites.
>
For some use cases I would think it's essential that the data is
replicated in both sites (async, of course).
But I second the idea that there must be at least one backup in the
same site as the primary owner, or rehashes become really expensive.
+1. Or the rehash can be improved to send state only if the joiner is
colocated (vs. sending state if the node is the primary owner).
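
To make the placement rule above concrete, here is a rough sketch of picking
all owners of a key from the site where the key was created, plus an optional
node in another site as the target for the async copy. Everything in it is an
assumption for illustration: SiteAwareOwners, Node, localOwners and
remoteBackup are made-up names, not the Infinispan ConsistentHash API, and it
uses plain modulo placement instead of a real hash wheel, so it does not
minimize rehashing on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of site-aware owner selection (not Infinispan's API). */
final class SiteAwareOwners {

    /** A cluster member tagged with the site it runs in (illustrative type). */
    static final class Node {
        final String name;
        final String site;

        Node(String name, String site) {
            this.name = name;
            this.site = site;
        }

        @Override
        public String toString() {
            return name + "@" + site;
        }
    }

    /**
     * Returns up to numOwners owners for the key, all taken from originSite.
     * The first element plays the role of the primary owner.
     */
    static List<Node> localOwners(Object key, String originSite,
                                  List<Node> members, int numOwners) {
        List<Node> local = new ArrayList<>();
        for (Node n : members) {
            if (n.site.equals(originSite)) {
                local.add(n);
            }
        }
        List<Node> owners = new ArrayList<>();
        if (local.isEmpty()) {
            return owners; // no member in that site, nothing to place
        }
        // Simple modulo placement; a real implementation would use a hash wheel.
        int start = Math.floorMod(key.hashCode(), local.size());
        int count = Math.min(numOwners, local.size());
        for (int i = 0; i < count; i++) {
            owners.add(local.get((start + i) % local.size()));
        }
        return owners;
    }

    /** Picks one node outside originSite as the target of an async copy, if any. */
    static Optional<Node> remoteBackup(Object key, String originSite,
                                       List<Node> members) {
        List<Node> remote = new ArrayList<>();
        for (Node n : members) {
            if (!n.site.equals(originSite)) {
                remote.add(n);
            }
        }
        if (remote.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(remote.get(Math.floorMod(key.hashCode(), remote.size())));
    }
}
```

With members A and B in LON and T in SFO, a key created in LON gets A and B
as owners and T only as the optional remote backup, so a join or leave in LON
never has to pull state from SFO.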
> In terms of data access, the idea is that all writes would only go
> through 1 site, e.g. LON being active and SFO being a backup (in which
> reads could happen); my design didn't assume concurrent writes in
> multiple sites (at least not to the same data).
+1. I'll make that clearer in the doc as well.
>
I like the idea of designating one site as the "master", but I'm
pretty sure we want to handle use cases where there is more than one
master.
+1
Perhaps we could incorporate Erik's suggestion on the wiki and allow
the selection of a master site both at a cache level and at a key
level.
+1
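
One way the cache-level plus key-level master selection could look, purely as
a sketch: a cache-wide default master site with optional per-key overrides.
MasterSiteResolver and its methods are hypothetical names made up for this
mail, not something that exists in Infinispan or in the design doc.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: cache-level master site with per-key overrides. */
final class MasterSiteResolver {
    private final String cacheMaster;                        // default for the whole cache
    private final Map<Object, String> keyMasters = new HashMap<>();

    MasterSiteResolver(String cacheMaster) {
        this.cacheMaster = cacheMaster;
    }

    /** Overrides the master site for a single key. */
    void setKeyMaster(Object key, String site) {
        keyMasters.put(key, site);
    }

    /** Key-level setting wins; otherwise fall back to the cache-level master. */
    String masterFor(Object key) {
        return keyMasters.getOrDefault(key, cacheMaster);
    }

    /** True if a write to the key may be handled in localSite. */
    boolean mayWriteLocally(Object key, String localSite) {
        return masterFor(key).equals(localSite);
    }
}
```

With LON as the cache-level master and one key overridden to SFO, writes to
that key would be accepted in SFO while every other key still goes through
LON, which covers the "more than one master" case without concurrent writes
to the same data.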
Thanks Dan!