[infinispan-dev] Distribution, take 2

Manik Surtani manik at jboss.org
Mon Jul 20 11:32:19 EDT 2009


On 20 Jul 2009, at 16:01, Bela Ban wrote:

>
>
> Manik Surtani wrote:
>>
>> On 20 Jul 2009, at 14:51, Bela Ban wrote:
>>
>>> General comments:
>>> - It seems that this is a mechanism where you replicate to the
>>> repl_count instances 'next' to you, i.e. similar to Buddy
>>> Replication ?
>>
>> Yes.  So if you consider the hash space on a clockwise-incrementing  
>> wheel [1], nodes {A, B, C, D} each have fixed positions on this  
>> wheel (perhaps by making use of the hash codes on their  
>> Addresses).  So this becomes deterministic.  Any key (K1 ~ K4 in  
>> the diagram in [1]) can be located by ascertaining its place on
>> the hash wheel, moving clockwise, and considering the first  
>> "repl_count" nodes encountered.
>
> OK
>
>
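
To pin down the clockwise lookup described above, here is a rough
Java-ish sketch (illustrative only, not the actual code; the SortedMap
"wheel" of positions and normalizedHash() are placeholders):

// Rough sketch: find a key's owners by walking clockwise on the wheel.
// Each node occupies one position, keyed by the hash of its Address.
List<Address> locate(Object key, SortedMap<Integer, Address> wheel, int replCount) {
   if (wheel.isEmpty()) return Collections.emptyList();
   int keyPos = normalizedHash(key);            // the key's position on the wheel
   List<Address> owners = new ArrayList<Address>(replCount);
   // start at the first node clockwise from the key, wrapping at "12 o'clock"
   Iterator<Address> it = wheel.tailMap(keyPos).values().iterator();
   int seen = 0;
   while (owners.size() < replCount && seen < wheel.size()) {
      if (!it.hasNext()) it = wheel.values().iterator();  // wrap around
      Address a = it.next();
      seen++;
      if (!owners.contains(a)) owners.add(a);
   }
   return owners;
}
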
>>
>>> - Are you tying your rebalancing mechanism to the consistent hash  
>>> implementation ? This would IMO be bad because it doesn't allow  
>>> for plugging in of a different CH algorithm !
>>
>> Unfortunately the steps I have outlined below do imply some
>> knowledge of the hash algorithm, to determine which nodes are  
>> affected by a LEAVE event, to minimise rebalancing.  Perhaps this  
>> can be provided by the ConsistentHash interface, so that any new  
>> implementation would need to provide a list of affected nodes when  
>> a given node leaves.
>
> If this is the case, then I'd make sure that both the CH  
> implementation *and* the rebalancing policy are pluggable, so you  
> could write an instance of both and use it.
>
> For example, I can think of CH implementations that have
> 'geographical' knowledge, e.g. knowledge of the hosts on which nodes
> are running, and try to store keys on nodes which are as 'far' apart
> from each other as possible, similar to what you did with Buddy
> Replication, where you make sure the buddies are on different hosts,
> or not on blades within the same rack.

Yes, good point.
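
As a sketch of what those two plug points might look like (interface
and method names below are purely illustrative, not necessarily what
will end up in the code):

// Hypothetical plug points - names are made up for illustration.
public interface ConsistentHash {
   void setCaches(List<Address> caches);              // install the node list
   List<Address> locate(Object key, int replCount);   // owners of a key
   List<Address> getAffectedNodes(Address leaver);    // who must rehash on a LEAVE
}

public interface RebalancePolicy {
   // decides which keys move where when the hash (i.e. the view) changes
   void rebalance(ConsistentHash oldCH, ConsistentHash newCH);
}

// A topology-aware impl could then spread owners across hosts or racks,
// much like Buddy Replication picks buddies on different machines:
// public class TopologyAwareConsistentHash implements ConsistentHash { ... }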

>>> What does 'installing a consistent hash instance' mean ?
>>
>> Literally, it means swapping out the CH instance used on a given  
>> node with an alternate one.  A CH instance encapsulates a "node  
>> list" and the logic to locate a key within the node list, using  
>> hash functions to determine which nodes would hold a given key.
>
> Ah, so I assume you're sending a new node list, but don't swap out  
> the CH *logic* which is always the same, right ?

Yes.  The DefaultCH would just serialize the member list.  Other CH  
impls may contain additional data.
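
In other words, something along these lines (a sketch only - the real
class in [3] may differ), where "installing" a CH just means building a
new immutable instance from the member list and swapping the reference:

// Sketch: an immutable CH whose serialized form is just the member list.
public final class DefaultConsistentHash implements ConsistentHash {
   private final List<Address> caches;   // immutable node list

   public DefaultConsistentHash(List<Address> caches) {
      this.caches = Collections.unmodifiableList(new ArrayList<Address>(caches));
   }
   // locate(...) hashes keys onto the wheel formed by 'caches';
   // (remaining ConsistentHash methods omitted for brevity)
}

// e.g. in a MembershipListener: on a view change, don't mutate -
// build a new instance and swap it in.
void viewAccepted(View newView) {
   currentCH = new DefaultConsistentHash(newView.getMembers());
}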

>> See this interface here [2], and a default (if currently imperfect)
>> implementation [3].  CHs are immutable so when a new view is
>> available, a new CH instance with a new node list should be created.
>>
>> In addition, aggregate CHs are allowed, to consider the union of  
>> two different views.  This is represented by a delegating CH impl,  
>> and is used in step 5.2 when a JoinTask is kicked off on a new  
>> joiner.
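
(The aggregate CH would essentially just delegate - roughly like the
sketch below, where UnionConsistentHash is an illustrative name, not
the actual impl: during a JOIN a key is considered to live wherever
either the old or the new hash says it does.)

// Illustrative delegating CH over the union of the old and new views.
public final class UnionConsistentHash implements ConsistentHash {
   private final ConsistentHash oldCH, newCH;

   public UnionConsistentHash(ConsistentHash oldCH, ConsistentHash newCH) {
      this.oldCH = oldCH;
      this.newCH = newCH;
   }

   public List<Address> locate(Object key, int replCount) {
      Set<Address> union = new LinkedHashSet<Address>(oldCH.locate(key, replCount));
      union.addAll(newCH.locate(key, replCount));
      return new ArrayList<Address>(union);
   }
   // (remaining ConsistentHash methods omitted)
}
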
>>
>>>>    4.5. GetConsistentHashCommand - an RPC command that "asks" a
>>>> node to serialize its current CH impl and transmit it across.
>>>
>>> Serialize what ? The state (I assume) ? Or the consistent hash ?  
>>> How can you serialize a CH in the latter case ?
>>
>> This refers to the CH impl.  The CH impl that needs to be  
>> serialized would typically be an instance of [3].  Since the CH  
>> impl just contains a List of Addresses, this should be possible.   
>> This is necessary so that the joiner knows the state of the cluster  
>> before it joined, and is able to ask specific members for state.
>>
>> Another option to this is to assume that the previous state is the  
>> same as the new state, minus the new joiner.  Saves on RPC calls.
>
> OK, so why don't you simply use the View ? Or do you actually, on a  
> view change, call setCaches() with the list of addresses you got  
> from the View ?

I can do that on the existing nodes.  On the new node (the joiner) I  
need to get a hold of the previous view before the joiner came up.   
But like I said above, perhaps this can be deduced.
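
For completeness, the "deduce it" option would look roughly like this
(sketch only; it assumes a single joiner per view change and reuses the
DefaultConsistentHash sketch from above):

// Sketch: the joiner reconstructs the pre-join CH without the RPC.
ConsistentHash deducePreviousCH(View currentView, Address self) {
   List<Address> before = new ArrayList<Address>(currentView.getMembers());
   before.remove(self);   // old view = current members minus the joiner
   return new DefaultConsistentHash(before);
}
// State can then be requested only from the members the old CH says
// owned the keys this joiner is about to take over.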

Cheers
--
Manik Surtani
manik at jboss.org
Lead, Infinispan
Lead, JBoss Cache
http://www.infinispan.org
http://www.jbosscache.org






