[infinispan-dev] Issue about propagation of the RollbackCommand in Infinispan 5.2.0

Sebastiano Peluso peluso at gsd.inesc-id.pt
Fri Aug 17 11:59:02 EDT 2012


Hi Mircea,

On 8/17/12 4:59 PM, Mircea Markus wrote:
> On 17 Aug 2012, at 13:16, Sebastiano Peluso wrote:
>
>> Hi all,
>>
>> I have a question about the propagation of the RollbackCommand in
>> Infinispan 5.2.0 when I use the Optimistic locking scheme and the
>> Distribution clustering mode.
>>
>> In particular, I have noticed that a RollbackCommand for a
>> transaction T is propagated to a set of nodes S even if T's coordinator
>> has never sent, and will never send, a PrepareCommand to the nodes
>> in S.
>>
>> Let me illustrate the issue with the following example.
>>   Suppose a transaction T executes on node N0 and writes to
>> keys k0, k1, k2, ..., km (m+1 keys) before it reaches the prepare phase.
>> In addition, node Ni, with i=0,...,m, is the primary owner of ki. If at
>> prepare time, during the lock acquisition on the local node N0 (see the
>> visitPrepareCommand method in the OptimisticLockingInterceptor class), T
>> fails to acquire the lock on k0, an exception is thrown (e.g.
>> TimeoutException) and T is rolled back. In this case, when T starts
>> the rollback phase, it seems to me that a RollbackCommand is
>> multicast to all nodes Nj, with j=1,...,d, where each kj is sorted before
>> k0 during the local lock acquisition (see the acquireAllLocks method in
>> OptimisticLockingInterceptor), because:
>>
>>   - shouldInvokeRemoteCommand method on the TxInvocationContext returns
>> true (see BaseRpcInterceptor class);
>>   - getAffectedKeys on the TxInvocationContext returns the set {k1,...,
>> kd} (see visitRollbackCommand in DistributionInterceptor class).
>>
>> Is this correct?
>>
>> If I'm not wrong, what is the design choice behind this implementation?
> This is indeed sub-optimal, but not incorrect.
Yes, you're right: this is correct but it can be sub-optimal.
> Does this break the TOA/TOB stuff? Mind creating a JIRA for it?
I think this does not affect TOA/TOB, but the problem here is the
relationship between this issue and the solution for the bug described
in the JIRA [1]. In particular, a possible solution for [1] would be to
register an "out-of-order" rollback entry when a rollback message R
is delivered on a remote node N that has not seen the related prepare
message P, and to discard that entry when P arrives. Unfortunately,
P may never be delivered to N, precisely because R is
a rollback generated by a locally failed lock acquisition (as in the
previous example). In that case no prepare ever arrives to clear the
entry, which makes garbage collecting those "out-of-order" entries
problematic and the solution inefficient and so (in my opinion) unfeasible.

Therefore, keeping that sub-optimal implementation of rollback
propagation makes solving the problem in [1] harder.
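
To make the garbage-collection problem concrete, here is a minimal Java
sketch of the "out-of-order" rollback entry idea. It is not Infinispan
code: the class OutOfOrderRollbackRegistry, its methods, and the use of
plain String transaction ids are all hypothetical names chosen for
illustration, and it only models the bookkeeping on a single remote node.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not part of Infinispan): bookkeeping a remote node
 * could do to cope with a rollback R delivered before its prepare P.
 */
final class OutOfOrderRollbackRegistry {
    // Transactions whose prepare has been applied on this node.
    private final Set<String> prepared = new HashSet<>();
    // Transactions whose rollback arrived before any prepare was seen.
    private final Set<String> outOfOrderRollbacks = new HashSet<>();

    /**
     * Prepare P delivered. Returns false if P must be discarded because
     * the rollback for the same transaction was already seen.
     */
    synchronized boolean onPrepare(String txId) {
        if (outOfOrderRollbacks.remove(txId)) {
            return false; // P cancels the pending out-of-order entry
        }
        prepared.add(txId);
        return true;
    }

    /**
     * Rollback R delivered. Returns true if the prepare had been seen
     * (normal order); otherwise records an out-of-order entry.
     */
    synchronized boolean onRollback(String txId) {
        if (prepared.remove(txId)) {
            return true;
        }
        outOfOrderRollbacks.add(txId);
        return false;
    }

    /** Number of out-of-order entries still waiting for a prepare. */
    synchronized int pendingCount() {
        return outOfOrderRollbacks.size();
    }
}
```

If the rollback for a transaction arrives first and the prepare follows,
onPrepare clears the entry and pendingCount() goes back to 0. If the
rollback was caused by a locally failed lock acquisition, no prepare is
ever sent, so the entry stays in outOfOrderRollbacks and only some other
mechanism (not shown here) could remove it.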

Just a note on the possible solution you reported in [1]: waiting for 
all the ack/nack messages is not enough because a replication timeout 
exception can be thrown before a prepare message reaches all the 
participants.


Thank you for the reply.

Cheers,

     Sebastiano

[1] https://issues.jboss.org/browse/ISPN-2081


