[infinispan-dev] Distribution, take 2
Manik Surtani
manik at jboss.org
Mon Jul 20 10:38:36 EDT 2009
On 20 Jul 2009, at 14:51, Bela Ban wrote:
> General comments:
> - It seems that this is a mechanism where you replicate to the
> repl_count instances 'next' to you, ie. similar to Buddy Replication ?
Yes. So if you consider the hash space on a clockwise-incrementing
wheel [1], nodes {A, B, C, D} each have fixed positions on this wheel
(perhaps derived from the hash codes of their Addresses). So this
becomes deterministic. Any key (K1 ~ K4 in the diagram in [1]) can be
located by ascertaining its position on the hash wheel, moving
clockwise, and taking the first "repl_count" nodes encountered.
So yes, this is "similar" to BR in that it uses a notion of fixed
number of "owner" nodes, but different in several other ways, such as:
* Each node is an "equal" owner
* Owner nodes are not determined by the order they appear in a cluster
view. Rather, this is based on their positions in the hash space.
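The clockwise walk described above can be sketched as follows. This is an illustrative sketch only, not the Infinispan implementation: the wheel size, the name-based positioning, and the class/method names are all assumptions, and Addresses are simplified to Strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: locate the first repl_count owners of a key
// by walking clockwise around a fixed-position hash wheel.
public class HashWheelSketch {
    static final int WHEEL_SIZE = 10240; // assumed size of the hash space

    // Map a node or key name to a fixed position on the wheel.
    static int position(String name) {
        return Math.abs(name.hashCode() % WHEEL_SIZE);
    }

    // Walk clockwise from the key's position and collect the first
    // replCount nodes encountered, wrapping around the wheel if needed.
    static List<String> locate(String key, List<String> nodes, int replCount) {
        TreeMap<Integer, String> wheel = new TreeMap<>();
        for (String n : nodes) wheel.put(position(n), n);
        List<String> owners = new ArrayList<>();
        // Nodes clockwise from the key's position...
        for (String n : wheel.tailMap(position(key)).values()) {
            if (owners.size() == replCount) return owners;
            owners.add(n);
        }
        // ...then wrap around to the start of the wheel.
        for (String n : wheel.values()) {
            if (owners.size() == replCount) break;
            if (!owners.contains(n)) owners.add(n);
        }
        return owners;
    }
}
```

Because node positions are fixed rather than view-relative, any node running this walk over the same node list arrives at the same owners, which is what makes location deterministic.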
> - Are you tying your rebalancing mechanism to the consistent hash
> implementation ? This would IMO be bad because it doesn't allow for
> plugging in of a different CH algorithm !
Unfortunately the steps I have outlined below do imply some
knowledge of the hash algorithm, to determine which nodes are affected
by a LEAVE event, to minimise rebalancing. Perhaps this can be
provided by the ConsistentHash interface, so that any new
implementation would need to provide a list of affected nodes when a
given node leaves.
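The hook suggested above might look something like this. This is a hypothetical shape, not the actual interface in [2]: the method names are illustrative and Addresses are simplified to Strings.

```java
import java.util.List;

// Hypothetical sketch of the affected-nodes hook on the ConsistentHash
// interface [2]. Each implementation reports which nodes must take part
// in rebalancing on a leave, keeping the rebalance logic CH-agnostic.
interface ConsistentHashSketch {
    // Locate the replCount owners of a key, as in [2].
    List<String> locate(Object key, int replCount);

    // Assumed addition: the nodes whose ownership changes when
    // 'leaver' departs, i.e. those that must push or receive state.
    List<String> getAffectedNodes(String leaver, int replCount);
}
```

With this in place a plugged-in CH algorithm could use any ownership scheme it likes, as long as it can answer "who is affected by this leave".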
> More comments inline
>
>
> Manik Surtani wrote:
>> DIST, take 2!
>>
>> Previous design failed due to two flaws:
>> 1. Transaction logs maintained on sender, for each recipient.
>> Plenty of scope for races, or heavily synchronized.
>> 2. Consistent hash attempted to be overly fair in evenly
>> dispersing nodes across a hash space. Meant that there was an
>> often large and unnecessary amount of rehashing to do, which
>> exacerbated the problem in 1.
>>
>> So we have a new approach, based on the following premises.
>>
>> 1. Consistent hash (CH) based on fixed positions in the hash space
>> rather than relative ones.
>
> Do you have a description of how the fixed-positions CH works ? From
> the bullets below it seems this is like Buddy Replication, where you
> store the data on the N buddies next to you.
See comments above about this.
>> 1.1. Pros: limited and finite rehashing, particularly when
>> there is a leave.
>> 1.1.1. If the leaver is L, only node (L - 1) and node (L +
>> 1) will have to push state, and only (L + 1) and (L + replCount)
>> will have to receive state.
>> 1.2. Cons: uneven spread of load (mitigated with grid size)
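The leave rule in 1.1.1 can be sketched over a fixed clockwise node ordering. This is an illustrative sketch under the assumptions above (helper names are made up, Addresses are simplified to Strings, and the ring list is assumed to be sorted by wheel position).

```java
import java.util.List;

// Sketch of the 1.1.1 leave rule on a ring sorted by wheel position.
public class LeaveRehashSketch {
    // Nodes that must push state when 'leaver' departs: its
    // predecessor (L - 1) and successor (L + 1) on the wheel.
    static List<String> pushers(List<String> ring, String leaver) {
        int i = ring.indexOf(leaver), n = ring.size();
        return List.of(ring.get((i - 1 + n) % n), ring.get((i + 1) % n));
    }

    // Nodes that must receive state: the successor (L + 1), which
    // gains primary ownership of the leaver's keys, and the node
    // replCount positions clockwise (L + replCount), which becomes
    // the new last replica for keys the leaver co-owned.
    static List<String> receivers(List<String> ring, String leaver, int replCount) {
        int i = ring.indexOf(leaver), n = ring.size();
        return List.of(ring.get((i + 1) % n), ring.get((i + replCount) % n));
    }
}
```

The point of 1.1 is visible here: however large the cluster, a leave touches a fixed, small set of nodes rather than triggering a cluster-wide rehash.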
>
>> 4. Implementation notes:
>> 4.4. InstallConsistentHashCommand - an RPC command that
>> "installs" a consistent hash instance on remote nodes.
>
> What does 'installing a consistent hash instance' mean ?
Literally, it means swapping out the CH instance used on a given node
with an alternate one. A CH instance encapsulates a "node list" and
the logic to locate a key within the node list, using hash functions
to determine which nodes would hold a given key. See this interface
here [2], and a default (if currently imperfect) implementation [3].
CHs are immutable so when a new view is available, a new CH instance
with a new node list should be created.
In addition, aggregate CHs are allowed, to consider the union of two
different views. This is represented by a delegating CH impl, and is
used in step 5.2 when a JoinTask is kicked off on a new joiner.
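A delegating union CH of the kind described above might be sketched like this. The class and method names are illustrative, not the Infinispan API, and Addresses are again simplified to Strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Minimal CH abstraction for the sketch: locate the owners of a key.
interface CH {
    List<String> locate(Object key, int replCount);
}

// Sketch of a delegating "union" CH: it wraps two immutable CH
// instances (one per view) and merges their answers, so that during a
// rehash no owner under either the old or the new view misses a write.
class UnionCH implements CH {
    private final CH oldCH, newCH;

    UnionCH(CH oldCH, CH newCH) {
        this.oldCH = oldCH;
        this.newCH = newCH;
    }

    public List<String> locate(Object key, int replCount) {
        LinkedHashSet<String> union = new LinkedHashSet<>(oldCH.locate(key, replCount));
        union.addAll(newCH.locate(key, replCount));
        return new ArrayList<>(union);
    }
}
```

Keeping each CH immutable makes this cheap: the union instance just holds references to the two view-specific instances and never mutates either.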
>> 4.5. GetConsistentHashCommand - an RPC command that "asks" a
>> node to serialize and transmit across its current CH impl.
>
> Serialize what ? The state (I assume) ? Or the consistent hash ? How
> can you serialize a CH in the latter case ?
This refers to the CH impl. The CH impl that needs to be serialized
would typically be an instance of [3]. Since the CH impl just
contains a List of Addresses, this should be possible. This is
necessary so that the joiner knows the state of the cluster before it
joined, and is able to ask specific members for state.
An alternative is to assume that the previous state is the
same as the new state, minus the new joiner. That saves on RPC calls.
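To illustrate why shipping the CH impl is feasible: since a CH like [3] is essentially a node list plus stateless hashing logic, marshalling it reduces to marshalling the list of Addresses. A sketch using plain Java serialization (Infinispan's actual marshalling differs; Addresses are simplified to Strings):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.List;

// Sketch: a CH impl whose only state is its node list can be shipped
// to a joiner by serializing that list and rebuilding the CH remotely.
public class ChTransferSketch {
    static byte[] marshall(List<String> addresses) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new java.util.ArrayList<>(addresses));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static List<String> unmarshall(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (List<String>) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The joiner can then instantiate its own CH around the received node list, giving it the pre-join view of the cluster so it knows which members to ask for state.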
Cheers
Manik
[1] http://tinyurl.com/n9lycn
[2] http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main/java/org/infinispan/distribution/ConsistentHash.java?r=144
[3] http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main/java/org/infinispan/distribution/DefaultConsistentHash.java?r=207
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org