[infinispan-dev] Distribution, take 2
Manik Surtani
manik at jboss.org
Mon Jul 20 11:32:19 EDT 2009
On 20 Jul 2009, at 16:01, Bela Ban wrote:
>
>
> Manik Surtani wrote:
>>
>> On 20 Jul 2009, at 14:51, Bela Ban wrote:
>>
>>> General comments:
>>> - It seems that this is a mechanism where you replicate to the
>>> repl_count instances 'next' to you, ie. similar to Buddy
>>> Replication ?
>>
>> Yes. So if you consider the hash space on a clockwise-incrementing
>> wheel [1], nodes {A, B, C, D} each have fixed positions on this
>> wheel (perhaps by making use of the hash codes on their
>> Addresses). So this becomes deterministic. Any key (K1 ~ K4 in
>> the diagram in [1]) can be located by ascertaining their place on
>> the hash wheel, moving clockwise, and considering the first
>> "repl_count" nodes encountered.
>
> OK
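
As a rough illustration of the lookup described above (find the key's place on the wheel, move clockwise, take the first repl_count nodes), here is a minimal Java sketch. HashWheel, addNode, positionOf, locate and the wheel size of 1000 are hypothetical names and values picked for this example; they are not taken from the Infinispan code base. The sketch also assumes that a key sitting exactly on a node's position belongs to that node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration of the hash wheel; not Infinispan's implementation.
final class HashWheel {
    static final int WHEEL_SIZE = 1000; // assumed size of the hash space

    // Wheel position -> node name, kept sorted so "clockwise" is ascending order.
    private final TreeMap<Integer, String> wheel = new TreeMap<>();

    // Places a node at a fixed position on the wheel.
    void addNode(String node, int position) {
        wheel.put(Math.floorMod(position, WHEEL_SIZE), node);
    }

    // One possible way to derive a position from an address or key (assumption).
    static int positionOf(Object o) {
        return Math.floorMod(o.hashCode(), WHEEL_SIZE);
    }

    // Returns the first replCount nodes met when moving clockwise from keyPosition.
    List<String> locate(int keyPosition, int replCount) {
        List<String> owners = new ArrayList<>();
        int wanted = Math.max(0, Math.min(replCount, wheel.size()));
        int start = Math.floorMod(keyPosition, WHEEL_SIZE);
        // From the key's position to the end of the wheel...
        for (String node : wheel.tailMap(start, true).values()) {
            if (owners.size() == wanted) break;
            owners.add(node);
        }
        // ...then wrap around to the start of the wheel.
        for (String node : wheel.headMap(start, false).values()) {
            if (owners.size() == wanted) break;
            owners.add(node);
        }
        return owners;
    }
}
```

With nodes A, B, C and D at positions 100, 300, 600 and 900, a key at position 950 with repl_count = 2 wraps around the wheel and lands on A and B.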
>
>
>>
>>> - Are you tying your rebalancing mechanism to the consistent hash
>>> implementation ? This would IMO be bad because it doesn't allow
>>> for plugging in of a different CH algorithm !
>>
>> Unfortunately the steps I have outlined below do imply some
>> knowledge of the hash algorithm, to determine which nodes are
>> affected by a LEAVE event, to minimise rebalancing. Perhaps this
>> can be provided by the ConsistentHash interface, so that any new
>> implementation would need to provide a list of affected nodes when
>> a given node leaves.
>
> If this is the case, then I'd make sure that both the CH
> implementation *and* the rebalancing policy are pluggable, so you
> could write an instance of both and use it.
>
> For example, I can think of CH implementations that have
> 'geographical' knowledge, e.g. hosts on which nodes are running and
> try to store keys on nodes which are as 'far' apart from each other
> as possible, similar to what you did with Buddy Replication where
> you make sure the buddies are on different hosts, or not on blades
> within the same rack.
Yes, good point.
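
To make the "list of affected nodes" idea a bit more concrete, here is a small Java sketch of what such a hook might look like for the clockwise wheel. LeaveAwareHash, WheelLeaveAwareHash and nodesGainingState are hypothetical names, not part of the ConsistentHash interface discussed in this thread. The sketch only reports the nodes that would need to receive extra copies, and it assumes the wheel scheme described earlier, where each key lives on the first repl_count nodes clockwise. A different CH implementation (for example a topology-aware one) would plug in its own answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook; not the ConsistentHash interface from the thread.
interface LeaveAwareHash {
    // Nodes that must receive extra copies when the given node departs.
    List<String> nodesGainingState(String leaver, int replCount);
}

// Wheel-specific answer, assuming owners are the first replCount nodes clockwise.
final class WheelLeaveAwareHash implements LeaveAwareHash {
    private final List<String> clockwise; // nodes in clockwise wheel order

    WheelLeaveAwareHash(List<String> clockwise) {
        this.clockwise = new ArrayList<>(clockwise);
    }

    @Override
    public List<String> nodesGainingState(String leaver, int replCount) {
        List<String> gaining = new ArrayList<>();
        int n = clockwise.size();
        int at = clockwise.indexOf(leaver);
        // Unknown leaver, or every node already holds every key: nothing moves.
        if (at < 0 || replCount <= 0 || replCount >= n) {
            return gaining;
        }
        // Each key the leaver owned needs one new owner, which is always
        // one of the replCount nodes clockwise after the leaver.
        for (int step = 1; step <= replCount; step++) {
            gaining.add(clockwise.get((at + step) % n));
        }
        return gaining;
    }
}
```

For a wheel of A, B, C, D with repl_count = 2, a LEAVE of B would touch only C and D as receivers, so the rest of the cluster need not rebalance.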
>>> What does 'installing a consistent hash instance' mean ?
>>
>> Literally, it means swapping out the CH instance used on a given
>> node with an alternate one. A CH instance encapsulates a "node
>> list" and the logic to locate a key within the node list, using
>> hash functions to determine which nodes would hold a given key.
>
> Ah, so I assume you're sending a new node list, but don't swap out
> the CH *logic* which is always the same, right ?
Yes. The DefaultCH would just serialize the member list. Other CH
impls may contain additional data.
>> See this interface here [2], and a default (if currently imperfect)
>> implementation [3]. CHs are immutable so when a new view is
>> available, a new CH instance with a new node list should be created.
>>
>> In addition, aggregate CHs are allowed, to consider the union of
>> two different views. This is represented by a delegating CH impl,
>> and is used in step 5.2 when a JoinTask is kicked off on a new
>> joiner.
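
The two ideas above (immutable CH instances that are replaced on a view change, and a delegating CH that considers the union of two views) could be sketched in Java roughly as follows. KeyLocator, NodeListHash and UnionHash are hypothetical names, not the classes referenced in [2] and [3]. The modulo-based placement inside NodeListHash is only a stand-in to keep the example short; it is not a real consistent hash.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal contract for locating the owners of a key.
interface KeyLocator {
    List<String> locate(Object key, int replCount);
}

// Immutable: the node list is copied once and never changed afterwards.
// A new view means building a new NodeListHash, not mutating this one.
final class NodeListHash implements KeyLocator {
    private final List<String> nodes;

    NodeListHash(List<String> nodes) {
        this.nodes = Collections.unmodifiableList(new ArrayList<>(nodes));
    }

    @Override
    public List<String> locate(Object key, int replCount) {
        List<String> owners = new ArrayList<>();
        if (nodes.isEmpty()) {
            return owners;
        }
        // Placeholder placement (assumption): start index from the key's hash code.
        int start = Math.floorMod(key.hashCode(), nodes.size());
        int wanted = Math.max(0, Math.min(replCount, nodes.size()));
        for (int i = 0; i < wanted; i++) {
            owners.add(nodes.get((start + i) % nodes.size()));
        }
        return owners;
    }
}

// Delegates to two hashes and returns the union of their owners,
// listing the old view's owners first and skipping duplicates.
final class UnionHash implements KeyLocator {
    private final KeyLocator oldHash;
    private final KeyLocator newHash;

    UnionHash(KeyLocator oldHash, KeyLocator newHash) {
        this.oldHash = oldHash;
        this.newHash = newHash;
    }

    @Override
    public List<String> locate(Object key, int replCount) {
        Set<String> owners = new LinkedHashSet<>(oldHash.locate(key, replCount));
        owners.addAll(newHash.locate(key, replCount));
        return new ArrayList<>(owners);
    }
}
```

While a join is in progress, a key may therefore resolve to more than repl_count nodes, since owners under either view are considered.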
>>
>>>> 4.5. GetConsistentHashCommand - an RPC command that "asks" a
>>>> node to serialize and transmit across its current CH impl.
>>>
>>> Serialize what ? The state (I assume) ? Or the consistent hash ?
>>> How can you serialize a CH in the latter case ?
>>
>> This refers to the CH impl. The CH impl that needs to be
>> serialized would typically be an instance of [3]. Since the CH
>> impl just contains a List of Addresses, this should be possible.
>> This is necessary so that the joiner knows the state of the cluster
>> before it joined, and is able to ask specific members for state.
>>
>> Another option to this is to assume that the previous state is the
>> same as the new state, minus the new joiner. Saves on RPC calls.
>
> OK, so why don't you simply use the View ? Or do you actually, on a
> view change, call setCaches() with the list of addresses you got
> from the View ?
I can do that on the existing nodes. On the new node (the joiner) I
need to get a hold of the previous view before the joiner came up.
But like I said above, perhaps this can be deduced.
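
A minimal Java sketch of that deduction, assuming the only difference between the two views is the single new joiner (no other node joined or left in the same view change). PreviousView and beforeJoin are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; assumes exactly one joiner and no concurrent leaves.
final class PreviousView {
    private PreviousView() {
    }

    // Rebuilds the member list as it was before the joiner came up.
    static List<String> beforeJoin(List<String> currentView, String joiner) {
        List<String> previous = new ArrayList<>(currentView);
        previous.remove(joiner); // removes the joiner, keeps the original order
        return Collections.unmodifiableList(previous);
    }
}
```

This would avoid the extra RPC, at the cost of being wrong if the view changed in any other way at the same time.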
Cheers
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org