Hi Mircea,
On 8/17/12 4:59 PM, Mircea Markus wrote:
On 17 Aug 2012, at 13:16, Sebastiano Peluso wrote:
> Hi all,
>
> I have a question about the propagation of the RollbackCommand in
> Infinispan 5.2.0 when I use the Optimistic locking scheme and the
> Distribution clustering mode.
>
> In particular I have noticed that a RollbackCommand command for a
> transaction T is propagated on a set of nodes S even if T's coordinator
> has never sent and it will never send a PrepareCommand command to nodes
> in S.
>
> Let me clarify the issue with the following example.
> Suppose you have a transaction T executing on node N0 and T writes on
> keys k0, k1, k2,...., km (m+1 keys) until it reaches the prepare phase.
> In addition, node Ni, with i=0,...,m, is ki's primary owner. If at
> prepare time, during the lock acquisition on the local node N0 (see
> visitPrepareCommand method in OptimisticLockingInterceptor class) T
> fails to acquire the lock on k0, an exception is thrown (e.g.
> TimeoutException) and T will be rolled back. In this case, when T starts
> the rollback phase, it seems to me that a RollbackCommand is
> multicast to all nodes Nj, with j=1,...,d, where k1,...,kd are the keys
> sorted before k0 during the local lock acquisition (see acquireAllLocks
> method in OptimisticLockingInterceptor), because:
>
> - shouldInvokeRemoteCommand method on the TxInvocationContext returns
> true (see BaseRpcInterceptor class);
> - getAffectedKeys on the TxInvocationContext returns the set {k1,...,
> kd} (see visitRollbackCommand in DistributionInterceptor class).
>
> Is it correct?
>
> If I'm not wrong, which is the design choice behind this implementation?
> Yes, you're right: this is correct but it can be sub-optimal.
This is indeed sub-optimal, but not incorrect.
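To make the scenario concrete, here is a minimal, self-contained sketch (not Infinispan's actual code; the lock table, the key names, and the failing key are stand-ins, with k2 playing the role of the key whose lock acquisition times out) of how the sorted local lock acquisition leaves already-locked keys in the affected-key set, so that a rollback triggered by a purely local lock failure is still multicast to those keys' primary owners:

```java
import java.util.*;

public class RollbackPropagationSketch {

    // Stand-in for the node's lock table; "k2" is assumed to be held
    // by a conflicting transaction, so locking it times out.
    static boolean tryLock(String key) {
        return !key.equals("k2");
    }

    // Keys successfully locked locally; these end up in the transaction
    // context's affected-key set.
    static final Set<String> affectedKeys = new LinkedHashSet<>();

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(List.of("k1", "k2", "k0"));
        // Locks are acquired in a deterministic (sorted) order to avoid
        // deadlocks, as acquireAllLocks does in OptimisticLockingInterceptor.
        Collections.sort(keys); // [k0, k1, k2]
        try {
            for (String k : keys) {
                if (!tryLock(k)) {
                    throw new RuntimeException("TimeoutException on " + k);
                }
                affectedKeys.add(k);
            }
        } catch (RuntimeException e) {
            // No PrepareCommand was ever sent to a remote node, yet the
            // rollback is multicast to the owners of every affected key.
            System.out.println("RollbackCommand multicast for: " + affectedKeys);
        }
    }
}
```

Here the rollback reaches the owners of k0 and k1 even though neither node ever saw a prepare for this transaction.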
> Does this break the TOA/TOB stuff? Mind creating a JIRA for it?
I think this does not affect TOA/TOB, but the problem here is the
relationship between this issue and the solution for the bug described
in JIRA [1]. In particular, a possible solution for [1] is to register
an "out-of-order" rollback entry when a rollback message R is delivered
on a remote node N that has not yet seen the related prepare message P,
and to annihilate that entry when P arrives. Unfortunately, P may never
be delivered to N at all, precisely because R is a rollback generated by
a locally failed lock acquisition (as in the example above). This
behavior makes it hard to garbage-collect those "out-of-order" entries,
rendering that solution inefficient and therefore (in my opinion)
unfeasible.
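A minimal sketch of the bookkeeping I have in mind (all names here are hypothetical illustrations, not Infinispan APIs) shows why garbage collection breaks down: a late prepare can annihilate the entry, but a rollback caused by a local lock failure on the originator has no matching prepare, so its entry is never reclaimed:

```java
import java.util.*;
import java.util.concurrent.*;

public class OutOfOrderRollbackSketch {
    // Hypothetical bookkeeping: tx id -> timestamp of a rollback that
    // arrived before its prepare ("out-of-order" rollback entry).
    static final Map<String, Long> outOfOrderRollbacks = new ConcurrentHashMap<>();
    static final Set<String> preparedTxs = ConcurrentHashMap.newKeySet();

    static void onRollback(String txId) {
        if (!preparedTxs.contains(txId)) {
            // Prepare not seen yet: remember the rollback so a late
            // prepare can be discarded when/if it arrives.
            outOfOrderRollbacks.put(txId, System.currentTimeMillis());
        }
    }

    static void onPrepare(String txId) {
        if (outOfOrderRollbacks.remove(txId) != null) {
            return; // late prepare annihilates the pending rollback entry
        }
        preparedTxs.add(txId);
    }

    public static void main(String[] args) {
        // Case 1: rollback overtakes prepare; the late prepare cleans up.
        onRollback("tx1");
        onPrepare("tx1");

        // Case 2: rollback caused by a locally failed lock acquisition on
        // the originator -- no prepare will ever be sent to this node, so
        // the entry can never be reclaimed by a matching prepare.
        onRollback("tx2");

        System.out.println("leaked entries: " + outOfOrderRollbacks.keySet());
    }
}
```

Without an extra mechanism (e.g. timeouts, with their own false-positive risks), the tx2-style entries accumulate forever.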
Therefore, keeping this sub-optimal implementation of rollback
propagation makes the problem in [1] harder to solve.
Just a note on the possible solution you reported in [1]: waiting for
all the ack/nack messages is not enough, because a replication timeout
exception can be thrown before the prepare message reaches all the
participants.
Thank you for the reply.
Cheers,
Sebastiano
[1] https://issues.jboss.org/browse/ISPN-2081