[infinispan-issues] [JBoss JIRA] (ISPN-4137) Transaction executed multiple times due to forwarded CommitCommand

Wed Mar 26 11:09:13 EDT 2014

    [ https://issues.jboss.org/browse/ISPN-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12956650#comment-12956650 ] 

Radim Vansa commented on ISPN-4137:
-----------------------------------

{quote}
That's an interesting approach, but I don't see any way of implementing that right now - when we get a timeout from an RPC, we can't spawn another thread to wait for the "real" response. So we have to either wait forever, or set a timeout and somehow release the locks when a timeout occurs without causing more inconsistencies than necessary.
{quote}
My idea is that the originator *must* send only the prepare command. The originator *can* send rollback if and only if it got a negative ack from some of the nodes. It *can* send commit if it got positive acks from all nodes.
An automatic reconciliation is executed periodically (on any node participating in the transaction) to find out the state of each ongoing transaction (after some random timeout, so as not to create too much load when everything goes smoothly). If any of the nodes reports that prepare failed, rollback is sent to all nodes. Otherwise, commit is sent to all non-committed nodes.
When a node is committed, it sends a confirmation to the originator. After calling prepare, the originator waits for all confirmations - if it gets them, TX is successful, otherwise an exception is thrown.
With this algorithm, the transaction result is determined by the success of prepare, and commit/rollback are a matter for the whole cluster, not just the originator - the originator is only notified.
Also, rollback is faster, as it is async.
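
The reconciliation rule above can be sketched roughly as follows. This is only an illustration: {{ReconciliationSketch}}, {{NodeState}}, {{Command}} and {{reconcile}} are hypothetical names, not Infinispan classes, and the sketch assumes the per-node states have already been collected.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustration only; none of these names exist in Infinispan. */
final class ReconciliationSketch {
    enum NodeState { PREPARED, PREPARE_FAILED, COMMITTED }
    enum Command { COMMIT, ROLLBACK }

    /**
     * Decides which command to send to which node, given the state each
     * participant reported for one transaction.
     */
    static Map<String, Command> reconcile(Map<String, NodeState> reported) {
        Map<String, Command> toSend = new TreeMap<>();
        // A single failed prepare decides the outcome for the whole cluster.
        boolean anyFailed = reported.containsValue(NodeState.PREPARE_FAILED);
        for (Map.Entry<String, NodeState> e : reported.entrySet()) {
            if (anyFailed) {
                // Rollback goes to all nodes. Under the proposed rules no node
                // should be COMMITTED here, since commit requires every
                // prepare to have succeeded.
                toSend.put(e.getKey(), Command.ROLLBACK);
            } else if (e.getValue() != NodeState.COMMITTED) {
                // Otherwise commit goes only to nodes not yet committed.
                toSend.put(e.getKey(), Command.COMMIT);
            }
        }
        return toSend;
    }
}
```

Because the decision depends only on the reported states, any participant could run it and reach the same result, which is what lets the originator step back to being merely notified.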

If the primary owner crashes during a transaction, the new primary owner should trigger the reconciliation. It has to find out whether there's any committed node in this transaction - if there is, it has to mark itself prepared (as such a node could not be committed unless the previous owner got itself prepared as well). If there was no committed node, it gets complicated - should we get prepared or should we declare that prepare has failed for this node?
There is a race condition - some node could receive an ack from the old primary owner and become committed just after it responded to the new primary that it's prepared. That way, we could end up with the new node failed and an old one committed. Therefore, any query with a new topology has to invalidate all responses from lower topologies received after that - so after being queried, it can't make a decision until it has sent a synchronous query to all other nodes in the new topology and received their responses (including this node, which has not responded yet to anyone). Then, as we know nobody would become committed before we answer them, we may mark ourselves failed. (The reason why we have to fail is that there may be another transaction on the same key, which has some nodes committed.)
Regarding multiple new primaries, these will always query only the old nodes - all new primaries should fail unless there is some committed old node.
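
One possible reading of the two rules above (stale responses are invalidated by a newer-topology query, and a new primary fails unless some old node is committed) is sketched below. All names ({{NewPrimarySketch}}, {{Participant}}, {{onQuery}}, {{onPrepareAck}}, {{decide}}) are hypothetical, and modelling the old primary's ack as the event that lets a node commit is an assumption made for the sketch.

```java
import java.util.Collection;

/** Illustration only; none of these names exist in Infinispan. */
final class NewPrimarySketch {
    enum NodeState { PREPARED, PREPARE_FAILED, COMMITTED }

    /** Per-transaction state kept by one old owner. */
    static final class Participant {
        private NodeState state = NodeState.PREPARED;
        private int highestQueriedTopology = -1;

        /** Status query from a primary; remembers the topology it came from. */
        synchronized NodeState onQuery(int topologyId) {
            highestQueriedTopology = Math.max(highestQueriedTopology, topologyId);
            return state;
        }

        /**
         * Ack from a primary that would let this node commit. An ack from a
         * topology lower than one we were already queried with is ignored,
         * so the answer given to the newer primary stays valid.
         * Returns true if the node is committed afterwards.
         */
        synchronized boolean onPrepareAck(int topologyId) {
            if (topologyId < highestQueriedTopology) {
                return false; // stale response, invalidated by the newer query
            }
            if (state == NodeState.PREPARED) {
                state = NodeState.COMMITTED;
            }
            return state == NodeState.COMMITTED;
        }
    }

    /** New primary: prepared only if some old owner already committed. */
    static NodeState decide(Collection<NodeState> oldOwnerStates) {
        return oldOwnerStates.contains(NodeState.COMMITTED)
                ? NodeState.PREPARED
                : NodeState.PREPARE_FAILED;
    }
}
```

With this ordering, a node that answered "prepared" to a query from topology 2 can no longer be moved to committed by a late ack from topology 1, which is the race described above.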

Another problem I see is when the prepare is delayed. Then, the reconciliation has to mark the tx prepare as failed on the node, thereby rolling back the TX, and make sure that when the prepare arrives, the node won't execute it. When the transaction is completed on one node, we can't forget it immediately. We have to keep the information until the transaction is committed on all nodes. This is where the TxCompletionNotification should be involved. The way to detect how long we have to keep the information about the tx is a matter of ISPN-4131. Forwarding makes this a bit complicated.
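
A minimal sketch of the bookkeeping this implies, assuming a per-node table of completed transactions keyed by a transaction id. {{CompletedTxSketch}} and its methods are hypothetical names; when {{forget}} may safely be called is exactly the open question deferred to ISPN-4131.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only; none of these names exist in Infinispan. */
final class CompletedTxSketch {
    enum Outcome { COMMITTED, ROLLED_BACK }

    // Outcomes of transactions already completed on this node, kept so that
    // a late prepare for the same transaction can be recognised.
    private final Map<String, Outcome> completed = new ConcurrentHashMap<>();

    /** Called when commit or rollback (e.g. via reconciliation) finishes locally. */
    void markCompleted(String txId, Outcome outcome) {
        completed.put(txId, outcome);
    }

    /** A delayed prepare must not run if the transaction is already completed here. */
    boolean shouldExecutePrepare(String txId) {
        return !completed.containsKey(txId);
    }

    /** Called only once the transaction is known to be finished on all nodes. */
    void forget(String txId) {
        completed.remove(txId);
    }
}
```

If the entry were dropped as soon as the local node finished, a prepare arriving afterwards would look like a brand new transaction and be executed again.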

Note that the reconciliation may be explicitly triggered by originator after timeout for ack (negative or positive) for the prepare, but it does not have to keep the thread working on the transaction waiting. (In fact regular commit is a blocking reconciliation as well).

Are there any other shortcomings?

{quote}
There's also a problem with reporting success before the transaction committed on all the owners. A subsequent get(k) on the same thread may return the value from the node that didn't commit put(k, v) yet, so the user would see an inconsistency.
{quote}
Definitely, you can't report success unless you got positive acks for everything.

{quote}
But I can't agree with you on that rule, the constant XAException.XA_HEURMIX is exactly for this kind of situation.
{quote}
The guys who designed the transaction API are definitely smarter than me, but let's find out whether we could design it in a way that avoids it (maybe it's not possible).

> Transaction executed multiple times due to forwarded CommitCommand
> ------------------------------------------------------------------
>
>                 Key: ISPN-4137
>                 URL: https://issues.jboss.org/browse/ISPN-4137
>             Project: Infinispan
>          Issue Type: Bug
>          Components: State Transfer, Transactions
>            Reporter: Radim Vansa
>            Assignee: Dan Berindei
>            Priority: Critical
>
> When the {{StateTransferInterceptor}} forwards a CommitCommand for the new topology, multiple CommitCommands may be broadcast across the cluster. If the command (forwarded already from originator) times out, the transaction may be correctly finished by the first one and the application considers TX as succeeded (useSynchronizations=true), although one more Rollback is sent as well.
> Then, again in STI, when the CommitCommand arrives with higher topologyId than the one used for the first TX execution, another artificial Prepare (followed by the commit) is executed - see {{STI.visitCommitCommand}}.
> However, this execution may be delayed a lot and originator may have already executed another TX on the same entries. Then, this forwarded Commit will overwrite the already updated entries, causing inconsistency of data.
