[infinispan-dev] Let me understand DIST

Bela Ban bban at redhat.com
Fri Mar 16 03:27:46 EDT 2012



On 3/15/12 2:36 PM, Dan Berindei wrote:

>> So the section that issues the (say) 2 unicast PUTs isn't protected by a
>> lock? I'm not suggesting that the waiting for the results from the
>> 2 futures should be synchronized, only the *sending* of the 2 unicasts.
>> Then, if thread 1 acquired that lock first and thread 2 second,
>> thread 1's modifications would be delivered *before* thread 2's
>> modifications.
>> Of course, this would order only messages from the same sender.
>>
>
> We do acquire the key lock on the originator as well, I just missed it
> when I read the code.


OK
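
(Just to make that concrete: a minimal sketch of holding the key lock on the
originator only around the *sending* of the two unicast PUTs, and waiting for
the replies outside of it. owners() and sendPut() are made-up stand-ins for
illustration, not the actual Infinispan API.)

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: serialize the *sending* of the unicast PUTs per key on the
// originator, but wait for the replies outside the lock. With numOwners=2
// this sends exactly 2 unicasts, always in a fixed order for a given key.
class OriginatorOrderingSketch {
    private final Lock keyLock = new ReentrantLock(); // real code would use a per-key (striped) lock

    void put(String key, String value) {
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        keyLock.lock();
        try {
            for (String owner : owners(key))          // hypothetical lookup of the 2 owners
                acks.add(sendPut(owner, key, value)); // hypothetical async unicast
        } finally {
            keyLock.unlock();                         // don't hold the lock while blocking on replies
        }
        for (CompletableFuture<Void> ack : acks)
            ack.join();                               // wait for both owners
    }

    // --- made-up stand-ins, not the real API ---
    private List<String> owners(String key) { return List.of("A", "B"); }
    private CompletableFuture<Void> sendPut(String owner, String key, String value) {
        return CompletableFuture.completedFuture(null);
    }
}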



>> If we touch a lot of keys, then sending *all* of the keys to all owners
>> may be sub-optimal; as an optimization, we may want to send only the
>> keys to the nodes which need to store them. This would make the PREPARES
>> potentially much smaller.
>>
>
> Agree, but it's a non-trivial optimization. For instance if there is a
> view change between the prepare and the commit, the recipient of the
> commit may not have all the modifications in the list.


Can't we treat this as 2 state transfers?

The first sends the key set to all affected nodes. The key set may not
be accurate: some keys may be missing, and others may not be needed.
But it is the *recipient* of the state transfer that discards the
unneeded keys (the ones it shouldn't own).

The missing keys will then be sent in the second state transfer.

This scheme would batch state transfers in a queue: if we have more than
one state transfer pending, we combine them. E.g. if we have views V5, V6
and V7 in the queue, then we initiate a single state transfer that
transfers keys based on the diff between V5 and V7. If a V8 is later
inserted into the queue, we batch the diffs between V7 and V8.
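
(A rough sketch of the kind of batching I mean -- purely illustrative, the
view ids and transferKeys() below are stand-ins, not real Infinispan code:
pending state transfers are queued, and whenever more than one is queued we
collapse them into a single transfer covering the diff between the last
transferred view and the newest queued one.)

import java.util.ArrayDeque;
import java.util.Deque;

// Illustration only: views V6 and V7 queued on top of an installed V5 result
// in a single transfer of diff(V5, V7); a V8 arriving later produces the
// next batch, diff(V7, V8).
class StateTransferBatcher {
    private final Deque<Integer> pendingViews = new ArrayDeque<>();
    private int lastTransferredView;

    StateTransferBatcher(int initialView) {
        this.lastTransferredView = initialView;
    }

    synchronized void onViewChange(int newViewId) {
        pendingViews.add(newViewId);
    }

    // Called by the state-transfer thread: collapse everything queued so far
    // into one transfer, based on the diff between the last transferred view
    // and the most recent queued one.
    synchronized void runNextBatch() {
        if (pendingViews.isEmpty())
            return;
        int target = pendingViews.peekLast();
        pendingViews.clear();
        transferKeys(lastTransferredView, target);
        lastTransferredView = target;
    }

    private void transferKeys(int fromView, int toView) {
        // The recipient, not the sender, drops keys it shouldn't own under
        // toView; keys still missing are picked up by the following transfer.
        System.out.printf("state transfer for diff(V%d, V%d)%n", fromView, toView);
    }
}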


> With recovery enabled the originator would also have to get the list
> of modifications from all the targets and combine it into a single
> PREPARE command, much more complex than what happens now.


Is this really necessary? Wouldn't the above scheme work as well, and
still be correct?


>> So we have to touch roughly clusterSize * 2 keys to send modification
>> requests to all nodes.
>>
>
> Actually, the number of required keys grows a little faster than that.
> The chances of 2 * clusterSize keys touching all the nodes are:
>   2: 1.0,
>   4: 0.985,
>   8: 0.91,
>   16: 0.796,
>   32: 0.585,
>   64: 0.327,
>   128: 0.092


OK
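
(Side note: those figures are easy to sanity-check with a quick Monte Carlo
run. The sketch below assumes numOwners=2 and that each key's two owners are
picked uniformly at random -- both assumptions of mine, not necessarily what
the consistent hash actually does.)

import java.util.Random;

// Estimate P(2 * clusterSize random keys touch every node), each key being
// stored on numOwners=2 distinct nodes chosen uniformly at random.
public class CoverageProbability {
    public static void main(String[] args) {
        Random rnd = new Random();
        int trials = 100_000;
        for (int clusterSize : new int[] {2, 4, 8, 16, 32, 64, 128}) {
            int keys = 2 * clusterSize;
            int covered = 0;
            for (int t = 0; t < trials; t++) {
                boolean[] touched = new boolean[clusterSize];
                for (int k = 0; k < keys; k++) {
                    int first = rnd.nextInt(clusterSize);
                    int second = rnd.nextInt(clusterSize - 1);
                    if (second >= first)                 // pick a second, distinct owner
                        second++;
                    touched[first] = true;
                    touched[second] = true;
                }
                boolean all = true;
                for (boolean b : touched)
                    all &= b;
                if (all)
                    covered++;
            }
            System.out.printf("%3d nodes: %.3f%n", clusterSize, (double) covered / trials);
        }
    }
}

Under those assumptions, 4 nodes come out at roughly 0.985, in line with the
list above.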




>> I meant TX rollbacks due to overlapping locks at different nodes, the
>> stuff Pedro wrote about in his paper on total order.
>>
>
> Hmm, I thought that because we sort the keys before locking, it
> shouldn't be possible to have deadlocks between prepare commands. I was
> assuming that the Tx aborts in Pedro's tests were due to write skew
> check failures, but I just read his message again and he mentions that
> the write skew check is disabled.
> I must be missing something...


Even if we sort the keys, we can still have deadlocks, caused by
PREPAREs arriving at different times at different nodes: the sorting
only fixes the order in which a single PREPARE acquires its locks, not
the order in which different PREPAREs arrive at different nodes. So we
can still get the classic deadlock of TX1 acquiring the lock on A (say
on one node), TX2 acquiring the lock on B (on another node), then TX1
waiting for B and TX2 waiting for A.
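
(A toy illustration of that scenario, not Infinispan code: key A is locked on
node N1, key B on node N2, and each "node" grabs its lock in whatever order
the PREPAREs happen to reach it, so the per-PREPARE key sorting never comes
into play. The lock timeout stands in for deadlock detection / rollback.)

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// TX1's prepare reaches N1 first and locks A; TX2's prepare reaches N2 first
// and locks B. Each then waits for the other key's lock on the other node:
// the classic A/B - B/A deadlock, here resolved by a timeout (rollback).
public class CrossNodeDeadlock {
    static final ReentrantLock keyAonN1 = new ReentrantLock();
    static final ReentrantLock keyBonN2 = new ReentrantLock();

    public static void main(String[] args) throws InterruptedException {
        Thread tx1 = new Thread(() -> prepare("TX1", keyAonN1, keyBonN2));
        Thread tx2 = new Thread(() -> prepare("TX2", keyBonN2, keyAonN1));
        tx1.start();
        tx2.start();
        tx1.join();
        tx2.join();
    }

    // Acquire the lock on the node the prepare reaches first, then the other.
    static void prepare(String tx, ReentrantLock first, ReentrantLock second) {
        first.lock();
        try {
            Thread.sleep(100);                       // let the other prepare land
            if (second.tryLock(1, TimeUnit.SECONDS)) {
                try {
                    System.out.println(tx + " prepared on both nodes");
                } finally {
                    second.unlock();
                }
            } else {
                System.out.println(tx + " timed out -> rollback");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}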


-- 
Bela Ban, JGroups lead (http://www.jgroups.org)

