On 20 Jul 2009, at 16:01, Bela Ban wrote:
Manik Surtani wrote:
>
> On 20 Jul 2009, at 14:51, Bela Ban wrote:
>
>> General comments:
>> - It seems that this is a mechanism where you replicate to the
>> repl_count instances 'next' to you, ie. similar to Buddy
>> Replication ?
>
> Yes. So if you consider the hash space on a clockwise-incrementing
> wheel [1], nodes {A, B, C, D} each have fixed positions on this
> wheel (perhaps by making use of the hash codes on their
> Addresses). So this becomes deterministic. Any key (K1 ~ K4 in
> the diagram in [1]) can be located by ascertaining its position on
> the hash wheel, moving clockwise, and considering the first
> "repl_count" nodes encountered.
OK
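The clockwise lookup described above can be sketched as follows. This is a minimal illustration, not Infinispan's actual implementation: node positions are hard-coded here, whereas a real CH would derive them from the hash codes of the members' Addresses.

```java
import java.util.*;

// Illustrative sketch of clockwise owner lookup on a hash wheel.
class HashWheel {
    private final TreeMap<Integer, String> wheel = new TreeMap<>();

    void addNode(String name, int position) {
        wheel.put(position, name);
    }

    // Locate the first replCount nodes clockwise from the key's position.
    List<String> locate(int keyPosition, int replCount) {
        replCount = Math.min(replCount, wheel.size());
        List<String> owners = new ArrayList<>(replCount);
        // tailMap walks clockwise from the key's position to the end of the wheel.
        for (String node : wheel.tailMap(keyPosition).values()) {
            if (owners.size() == replCount) return owners;
            owners.add(node);
        }
        // Wrap around to the start of the wheel if needed.
        for (String node : wheel.values()) {
            if (owners.size() == replCount) return owners;
            owners.add(node);
        }
        return owners;
    }

    public static void main(String[] args) {
        HashWheel wheel = new HashWheel();
        wheel.addNode("A", 100);
        wheel.addNode("B", 200);
        wheel.addNode("C", 300);
        wheel.addNode("D", 400);
        // A key hashing to 250 falls between B and C; with repl_count = 2
        // the clockwise owners are C and D.
        System.out.println(wheel.locate(250, 2)); // prints [C, D]
    }
}
```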
>
>> - Are you tying your rebalancing mechanism to the consistent hash
>> implementation ? This would IMO be bad because it doesn't allow
>> for plugging in of a different CH algorithm !
>
> Unfortunately the steps I have outlined below do imply some
> knowledge of the hash algorithm, to determine which nodes are
> affected by a LEAVE event, to minimise rebalancing. Perhaps this
> can be provided by the ConsistentHash interface, so that any new
> implementation would need to provide a list of affected nodes when
> a given node leaves.
If this is the case, then I'd make sure that both the CH
implementation *and* the rebalancing policy are pluggable, so you
could write an instance of both and use it.
For example, I can think of CH implementations that have
'geographical' knowledge, e.g. of the hosts on which nodes are
running, and that try to store keys on nodes which are as 'far'
apart from each other as possible. This is similar to what you did
with Buddy Replication, where you make sure the buddies are on
different hosts, or not on blades within the same rack.
Yes, good point.
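One way such a 'geographical' CH could pick owners is to walk the wheel clockwise but skip nodes on a host it has already picked. The sketch below is hypothetical; the node-to-host mapping and helper names are illustrative only.

```java
import java.util.*;

// Hypothetical host-aware owner selection: prefer owners on distinct hosts,
// falling back to plain clockwise order if there aren't enough hosts.
class TopologyAwarePick {
    static List<String> pickOwners(List<String> clockwise,
                                   Map<String, String> hostOf,
                                   int replCount) {
        List<String> owners = new ArrayList<>(replCount);
        Set<String> hostsUsed = new HashSet<>();
        // First pass: one owner per distinct host, in clockwise order.
        for (String node : clockwise) {
            if (owners.size() == replCount) break;
            if (hostsUsed.add(hostOf.get(node))) owners.add(node);
        }
        // Fallback: fewer distinct hosts than repl_count, fill with the rest.
        for (String node : clockwise) {
            if (owners.size() == replCount) break;
            if (!owners.contains(node)) owners.add(node);
        }
        return owners;
    }
}
```

With nodes A and B on the same host, A's co-located neighbour B is skipped in favour of a node on a different host.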
>> What does 'installing a consistent hash instance' mean?
>
> Literally, it means swapping out the CH instance used on a given
> node with an alternate one. A CH instance encapsulates a "node
> list" and the logic to locate a key within the node list, using
> hash functions to determine which nodes would hold a given key.
Ah, so I assume you're sending a new node list, but don't swap out
the CH *logic* which is always the same, right ?
Yes. The DefaultCH would just serialize the member list. Other CH
impls may contain additional data.
See this interface here [2], and a default (if currently imperfect)
implementation [3]. CHs are immutable so when a new view is
available, a new CH instance with a new node list should be created.
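To illustrate the immutability point, here is a minimal sketch of what an implementation along the lines of [3] might look like. Class and method names are illustrative, not Infinispan's actual API; a real impl would hold a List of Addresses rather than Strings.

```java
import java.util.*;

// Sketch of an immutable CH: the node list is fixed at construction,
// so a view change means building a new instance, never mutating this one.
final class DefaultConsistentHash {
    private final List<String> nodes; // stand-in for a List<Address>

    DefaultConsistentHash(Collection<String> view) {
        List<String> sorted = new ArrayList<>(view);
        // Order nodes by their (non-negative) hash so every member
        // derives the same wheel independently from the same view.
        sorted.sort(Comparator.comparingInt(n -> spread(n.hashCode())));
        this.nodes = Collections.unmodifiableList(sorted);
    }

    // The repl_count owners of a key, clockwise from its hash position.
    List<String> locate(Object key, int replCount) {
        int pos = findPosition(spread(key.hashCode()));
        int count = Math.min(replCount, nodes.size());
        List<String> owners = new ArrayList<>(count);
        for (int i = 0; i < count; i++)
            owners.add(nodes.get((pos + i) % nodes.size()));
        return owners;
    }

    private int findPosition(int hash) {
        for (int i = 0; i < nodes.size(); i++)
            if (spread(nodes.get(i).hashCode()) >= hash) return i;
        return 0; // wrapped around the wheel
    }

    private static int spread(int h) { return h & 0x7fffffff; }
}
```

Since only the node list varies, serializing such an instance reduces to serializing the member list, as noted above.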
>
> In addition, aggregate CHs are allowed, to consider the union of
> two different views. This is represented by a delegating CH impl,
> and is used in step 5.2 when a JoinTask is kicked off on a new
> joiner.
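A delegating union CH of the kind described could look roughly like this. The interface and names are a hypothetical minimal stand-in for [2], just to show the delegation pattern:

```java
import java.util.*;

// Hypothetical minimal CH contract, standing in for the interface in [2].
interface ConsistentHash {
    List<String> locate(Object key, int replCount);
}

// Delegating CH representing the union of two views, e.g. the views
// before and after a join, as used while a JoinTask is in flight.
final class UnionConsistentHash implements ConsistentHash {
    private final ConsistentHash oldCH;
    private final ConsistentHash newCH;

    UnionConsistentHash(ConsistentHash oldCH, ConsistentHash newCH) {
        this.oldCH = oldCH;
        this.newCH = newCH;
    }

    @Override
    public List<String> locate(Object key, int replCount) {
        // Owners under either view, old view first, duplicates removed,
        // so reads and state requests reach the right members mid-rehash.
        LinkedHashSet<String> union =
            new LinkedHashSet<>(oldCH.locate(key, replCount));
        union.addAll(newCH.locate(key, replCount));
        return new ArrayList<>(union);
    }
}
```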
>
>>> 4.5. GetConsistentHashCommand - an RPC command that "asks" a
>>> node to serialize and transmit across its current CH impl.
>>
>> Serialize what ? The state (I assume) ? Or the consistent hash ?
>> How can you serialize a CH in the latter case ?
>
> This refers to the CH impl. The CH impl that needs to be
> serialized would typically be an instance of [3]. Since the CH
> impl just contains a List of Addresses, this should be possible.
> This is necessary so that the joiner knows the state of the cluster
> before it joined, and is able to ask specific members for state.
>
> Another option to this is to assume that the previous state is the
> same as the new state, minus the new joiner. Saves on RPC calls.
OK, so why don't you simply use the View ? Or do you actually, on a
view change, call setCaches() with the list of addresses you got
from the View ?
I can do that on the existing nodes. On the new node (the joiner) I
need to get a hold of the previous view before the joiner came up.
But like I said above, perhaps this can be deduced.
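The deduction mentioned here is trivial but worth pinning down: the joiner assumes the pre-join membership is simply the current view minus itself. A one-method sketch (names illustrative):

```java
import java.util.*;

// Sketch of the "deduce the old view" shortcut: previous view =
// current view minus the joiner. Saves the GetConsistentHashCommand RPC,
// at the cost of assuming no other membership change happened concurrently.
final class ViewDeduction {
    static List<String> previousView(List<String> currentView, String joiner) {
        List<String> old = new ArrayList<>(currentView);
        old.remove(joiner);
        return old;
    }
}
```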
Cheers
--
Manik Surtani
manik(a)jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org