Manik Surtani wrote:
On 20 Jul 2009, at 14:51, Bela Ban wrote:
> General comments:
> - It seems that this is a mechanism where you replicate to the
> repl_count instances 'next' to you, ie. similar to Buddy Replication ?
Yes. So if you consider the hash space on a clockwise-incrementing
wheel [1], nodes {A, B, C, D} each have fixed positions on this wheel
(perhaps by making use of the hash codes on their Addresses). So this
becomes deterministic. Any key (K1 ~ K4 in the diagram in [1]) can be
located by ascertaining their place on the hash wheel, moving
clockwise, and considering the first "repl_count" nodes encountered.
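The clockwise-walk described above can be sketched roughly as follows. This is an illustrative toy, not Infinispan's actual implementation: the class and method names (`HashWheel`, `locate`, `addNode`) are made up for this example, and real node positions would come from `Address` hash codes rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of locating the "repl_count" owners of a key on a
// clockwise hash wheel. Nodes take fixed positions derived from the hash
// codes of their addresses; a key is owned by the first repl_count nodes
// encountered moving clockwise from the key's own position.
class HashWheel {
   private final SortedMap<Integer, String> wheel = new TreeMap<>();

   void addNode(String address) {
      wheel.put(normalize(address.hashCode()), address);
   }

   List<String> locate(Object key, int replCount) {
      List<String> owners = new ArrayList<>();
      int pos = normalize(key.hashCode());
      // Walk clockwise from the key's position...
      for (String node : wheel.tailMap(pos).values()) {
         if (owners.size() == replCount) return owners;
         owners.add(node);
      }
      // ...wrapping around the wheel if we reach the end.
      for (String node : wheel.headMap(pos).values()) {
         if (owners.size() == replCount) return owners;
         owners.add(node);
      }
      return owners;
   }

   private static int normalize(int hash) {
      return hash & 0x7FFFFFFF; // keep wheel positions non-negative
   }
}
```

Because positions are a pure function of the addresses, every node computes the same owner list for a given key, which is what makes the scheme deterministic.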
OK
> - Are you tying your rebalancing mechanism to the consistent hash
> implementation ? This would IMO be bad because it doesn't allow for
> plugging in of a different CH algorithm !
Unfortunately the steps I have outlined below do imply some
knowledge of the hash algorithm, in order to determine which nodes are
affected by a LEAVE event and so minimise rebalancing. Perhaps this
can be provided by the ConsistentHash interface, so that any new
implementation would need to provide a list of affected nodes when a
given node leaves.
If this is the case, then I'd make sure that both the CH implementation
*and* the rebalancing policy are pluggable, so you could write an
instance of both and use it.
For example, I can think of CH implementations that have 'geographical'
knowledge, e.g. they know which hosts the nodes are running on and try
to store keys on nodes which are as 'far' apart from each other as
possible, similar to what you did with Buddy Replication, where you
make sure the buddies are on different hosts, or not on blades within
the same rack.
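A topology-aware owner selection of the kind Bela describes could look roughly like the sketch below. Everything here is hypothetical (the `pickOwners` helper and the address-to-host map are invented for illustration); the point is only that a pluggable CH could consult placement metadata when choosing owners.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: given candidate nodes in clockwise wheel order, prefer owners on
// hosts not already used, so replicas spread across physical machines.
class TopologyAwarePicker {
   static List<String> pickOwners(List<String> candidates,
                                  Map<String, String> hostOf,
                                  int replCount) {
      List<String> owners = new ArrayList<>();
      Set<String> usedHosts = new HashSet<>();
      // First pass: only take nodes whose host has not been used yet.
      for (String node : candidates) {
         if (owners.size() == replCount) break;
         if (usedHosts.add(hostOf.get(node))) owners.add(node);
      }
      // Second pass: relax the constraint if the cluster has fewer
      // distinct hosts than repl_count demands.
      for (String node : candidates) {
         if (owners.size() == replCount) break;
         if (!owners.contains(node)) owners.add(node);
      }
      return owners;
   }
}
```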
> What does 'installing a consistent hash instance' mean ?
Literally, it means swapping out the CH instance used on a given node
with an alternate one. A CH instance encapsulates a "node list" and
the logic to locate a key within the node list, using hash functions
to determine which nodes would hold a given key.
Ah, so I assume you're sending a new node list, but don't swap out the
CH *logic* which is always the same, right ?
See this interface here [2], and a default (if currently imperfect)
implementation [3]. CHs are immutable so when a new view is
available, a new CH instance with a new node list should be created.
In addition, aggregate CHs are allowed, to consider the union of two
different views. This is represented by a delegating CH impl, and is
used in step 5.2 when a JoinTask is kicked off on a new joiner.
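The delegating/union idea can be sketched as below. The interface shown is a simplification of the real one at [2], and `UnionConsistentHash` is an illustrative name: during a join, a key is considered owned by anything either the old or the new CH locates, so no owner is missed while rebalancing is in flight.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for the ConsistentHash interface referenced at [2].
interface ConsistentHash {
   List<String> locate(Object key, int replCount);
}

// Delegating CH that considers the union of two views (old and new),
// as used while a JoinTask runs on a new joiner.
class UnionConsistentHash implements ConsistentHash {
   private final ConsistentHash oldCH, newCH;

   UnionConsistentHash(ConsistentHash oldCH, ConsistentHash newCH) {
      this.oldCH = oldCH;
      this.newCH = newCH;
   }

   @Override
   public List<String> locate(Object key, int replCount) {
      // LinkedHashSet preserves encounter order and drops duplicates.
      Set<String> union = new LinkedHashSet<>(oldCH.locate(key, replCount));
      union.addAll(newCH.locate(key, replCount));
      return new ArrayList<>(union);
   }
}
```

Since each CH is immutable, swapping in a new one (or a union of two) on a view change is just a reference assignment, with no locking of the hash logic itself.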
>> 4.5. GetConsistentHashCommand - an RPC command that "asks" a
>> node to serialize and transmit across its current CH impl.
>
> Serialize what ? The state (I assume) ? Or the consistent hash ? How
> can you serialize a CH in the latter case ?
This refers to the CH impl. The CH impl that needs to be serialized
would typically be an instance of [3]. Since the CH impl just
contains a List of Addresses, this should be possible. This is
necessary so that the joiner knows the state of the cluster before it
joined, and is able to ask specific members for state.
Another option is to assume that the previous state is the same as the
new state, minus the new joiner. This saves on RPC calls.
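That shortcut amounts to something like the following one-liner; the class and method names are invented for illustration, and real code would operate on JGroups `Address` objects rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "no extra RPC" option: instead of asking a node to
// serialize and send its CH, the joiner reconstructs the pre-join node
// list locally by removing itself from the current view.
class PreJoinView {
   static List<String> previousNodeList(List<String> currentView, String joiner) {
      List<String> previous = new ArrayList<>(currentView);
      previous.remove(joiner);
      return previous;
   }
}
```

The trade-off is that this only holds if exactly one member (the joiner) changed between the two views; overlapping joins or leaves would still need the explicit GetConsistentHashCommand.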
OK, so why don't you simply use the View ? Or do you actually, on a view
change, call setCaches() with the list of addresses you got from the View ?
--
Bela Ban
Lead JGroups / Clustering Team
JBoss