[
https://issues.jboss.org/browse/ISPN-4137?page=com.atlassian.jira.plugin....
]
Radim Vansa commented on ISPN-4137:
-----------------------------------
{quote}
Without recovery, we will still run the rollback to make sure the locks are released. I
don't have any suggestions on how to improve that, except maybe retrying the commit
indefinitely.
{quote}
I don't see the point in retrying the commit indefinitely. There's no way the
CommitCommand can be lost - JGroups should deliver it reliably, and we are not dropping
delivered commands anywhere in Infinispan (or are we?). It can just take a while before it
is delivered and answered. The only case where resending needs to be considered is sending
to new nodes.
What's the contract, anyway? When commit() throws an exception, are there any
guarantees that none of the operations were written? *Is this described anywhere?*
If there are no such guarantees, trying to finish the TX with a commit even if an exception
was reported on the originator is IMO better than sending a rollback (and hoping things will
settle) or keeping the locks stale. If there are such guarantees, we can't do anything, and
we should rather keep the lock stale (blocking further txs) than break the contract.
Thinking about it again, there can't be any such guarantees, because the commit may
already have been executed - the contract would be broken.
{quote}
If the primary owner crashes, the txs on the backup owners still have "backup
locks". Each prepare on the new primary owner will check in the entire tx table for
backup locks, and will block until those locks are released. The real problem is when the
originator dies...
{quote}
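To make the quoted backup-lock check concrete, here is a minimal sketch of a tx table that a prepare could scan. It is a toy model, not Infinispan's implementation: the class {{BackupLockCheck}} and its methods are hypothetical names, and instead of blocking it only reports which transactions the prepare would have to wait for.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: a hypothetical tx table, not Infinispan's implementation. */
public class BackupLockCheck {
    // txId -> keys on which that transaction still holds a backup lock
    private final Map<String, Set<String>> backupLocks = new HashMap<>();

    /** Records that txId holds a backup lock on key. */
    public void addBackupLock(String txId, String key) {
        backupLocks.computeIfAbsent(txId, id -> new HashSet<>()).add(key);
    }

    /** Drops all backup locks of txId, e.g. after its commit or rollback. */
    public void release(String txId) {
        backupLocks.remove(txId);
    }

    /**
     * Scans the entire table and returns the ids of the other transactions
     * holding a backup lock on any of the given keys. A real prepare would
     * block until this set is empty; here we only report it.
     */
    public Set<String> blockers(String preparingTx, Set<String> keys) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : backupLocks.entrySet()) {
            if (e.getKey().equals(preparingTx)) {
                continue; // a transaction never waits for itself
            }
            for (String key : keys) {
                if (e.getValue().contains(key)) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

In this model, if the originator of a transaction dies and nobody ever calls {{release}} for it, every later prepare on the same keys keeps seeing it as a blocker.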
When the originator dies after prepare, the transaction keeps hanging anyway. Is it then
reported as in-doubt in recovery?
Transaction executed multiple times due to forwarded CommitCommand
------------------------------------------------------------------
Key: ISPN-4137
URL: https://issues.jboss.org/browse/ISPN-4137
Project: Infinispan
Issue Type: Bug
Components: State Transfer, Transactions
Reporter: Radim Vansa
Assignee: Dan Berindei
Priority: Critical
When the {{StateTransferInterceptor}} forwards a CommitCommand for the new topology,
multiple CommitCommands may be broadcast across the cluster. If the command (already
forwarded from the originator) times out, the transaction may still be finished correctly by
the first one and the application considers the TX successful (useSynchronizations=true),
although an additional Rollback is sent as well.
Then, again in STI, when the CommitCommand arrives with a higher topologyId than the one
used for the first TX execution, another artificial Prepare (followed by the commit) is
executed - see {{STI.visitCommitCommand}}.
However, this execution may be delayed considerably, and the originator may have already
executed another TX on the same entries. The forwarded Commit will then overwrite the
already updated entries, leaving the data inconsistent.
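The sequence described above can be replayed in a small, self-contained simulation. This is only a sketch of the reported symptom under simplifying assumptions: {{Commit}}, {{Node}} and {{onCommit}} are hypothetical stand-ins, not Infinispan classes, and the rule "re-apply when the topology id is higher than the one seen before" is a simplification of what the description attributes to {{STI.visitCommitCommand}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these types are Infinispan classes. */
public class StaleCommitDemo {

    /** Hypothetical stand-in for a commit carrying the writes of one transaction. */
    static final class Commit {
        final String txId;
        final int topologyId;
        final Map<String, String> writes;

        Commit(String txId, int topologyId, Map<String, String> writes) {
            this.txId = txId;
            this.topologyId = topologyId;
            this.writes = writes;
        }
    }

    /** Hypothetical node that re-applies a commit arriving with a newer topology id. */
    static final class Node {
        final Map<String, String> data = new HashMap<>();
        // txId -> topology id in which the transaction was last applied
        final Map<String, Integer> appliedInTopology = new HashMap<>();

        void onCommit(Commit c) {
            Integer seen = appliedInTopology.get(c.txId);
            // Simplified rule from the description: a commit with a higher
            // topology id than the first execution is prepared and committed again.
            if (seen == null || c.topologyId > seen) {
                data.putAll(c.writes);
                appliedInTopology.put(c.txId, c.topologyId);
            }
        }
    }

    /** Returns the final value of key "k" after the delayed, forwarded commit arrives. */
    public static String run() {
        Node node = new Node();
        node.onCommit(new Commit("tx1", 1, Map.of("k", "v1"))); // original commit of tx1
        node.onCommit(new Commit("tx2", 1, Map.of("k", "v2"))); // later tx on the same key
        node.onCommit(new Commit("tx1", 2, Map.of("k", "v1"))); // delayed forwarded commit of tx1
        return node.data.get("k");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "v1": the write of tx2 has been overwritten
    }
}
```

The application saw both tx1 and tx2 succeed, in that order, yet the node ends up holding the value written by tx1.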