[
https://issues.jboss.org/browse/ISPN-2081?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-2081:
------------------------------------
I'm not sure about the first step, "try to acquire a lock locally". Say the
local node (B) is a backup owner for key k1, the tx (tx1) acquires the lock locally, then
it tries to go remote to the primary owner (A). The lock is held by another tx (tx2,
originated from another node C) on the primary, but then A dies and B becomes the primary
owner.
The tx1 caller will get a SuspectException, and the lock on B will be released. But the
commit/tx completion command for tx2 might arrive on B just before that happens, and tx2
will try to unlock k1 while it is held by tx1 - resulting in an
IllegalMonitorStateException.
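To make the interleaving concrete, here is a minimal sketch of what happens on B. It assumes a per-key lock that is owned by a transaction rather than by a thread; OwnedKeyLock and UnlockRaceDemo are hypothetical names for illustration and are not Infinispan classes. The only behaviour borrowed from the JDK is that releasing a lock you do not hold fails with IllegalMonitorStateException, as ReentrantLock.unlock() does.

```java
import java.util.concurrent.atomic.AtomicReference;

final class UnlockRaceDemo {
    /** Replays the interleaving on node B; returns the failure message, or null if none. */
    static String replay() {
        OwnedKeyLock k1OnB = new OwnedKeyLock();
        k1OnB.tryLock("tx1");      // tx1 acquires k1 locally on backup owner B
        // A dies and B becomes primary owner; tx2's completion command now lands on B.
        try {
            k1OnB.unlock("tx2");   // tx2 never held k1 on B, tx1 still does
            return null;
        } catch (IllegalMonitorStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}

/** Hypothetical per-key lock owned by a transaction id instead of a thread. */
final class OwnedKeyLock {
    private final AtomicReference<String> owner = new AtomicReference<>();

    /** Non-blocking, non-reentrant acquire; true if txId now holds the lock. */
    boolean tryLock(String txId) {
        return owner.compareAndSet(null, txId);
    }

    /** Releases the lock; a non-owner gets an IllegalMonitorStateException. */
    void unlock(String txId) {
        if (!owner.compareAndSet(txId, null)) {
            throw new IllegalMonitorStateException(
                    txId + " does not hold the lock, current owner is " + owner.get());
        }
    }

    String owner() {
        return owner.get();
    }
}
```

In this sketch the failed unlock leaves tx1 as the owner, so the lock state itself is not corrupted; the problem is the exception surfacing while tx2 completes.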
I've also been thinking for some time that users shouldn't see SuspectExceptions
and Infinispan should retry those commands instead. My idea was to retry automatically in
the DistributionInterceptor, but if tx1 already held the lock on B it would be able to
commit before tx2 - meaning we have to move the retry logic to a lower level.
Transaction leak caused by reordering between prepare and rollback
------------------------------------------------------------------
Key: ISPN-2081
URL: https://issues.jboss.org/browse/ISPN-2081
Project: Infinispan
Issue Type: Bug
Components: Transactions
Affects Versions: 5.1.5.FINAL
Reporter: Mircea Markus
Assignee: Mircea Markus
Priority: Blocker
Labels: jdg, jdg6
Fix For: 5.2.0.CR1, 5.2.0.Final
Attachments: DistL1WriteSkewTest.txt, RollbackNotSentBeforePrepareTest.java
There's no ordering between the prepare and commit/rollback messages, as the latter
are sent OOB.
With this in mind, the following transaction leak might happen:
Tx1 sends prepare to nodes {A,B}
1. The message reaches A and times out, but hasn't yet been processed on B
2. The transaction originator reacts immediately to the timeout received from A without
waiting for the response from B and sends a rollback request
3. The rollback request is processed on A and B
4. The initial prepare is then processed on B
At this point we have an orphaned prepared transaction on B.
Whilst this is not causing any inconsistencies, it keeps keys locked indefinitely and is
a memory leak.
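The four steps above can be replayed in a few lines. This is only a sketch under stated assumptions: Node and ReorderingDemo are hypothetical stand-ins, a participant is reduced to a set of prepared transaction ids, and a rollback for an unknown transaction is assumed to be a no-op.

```java
import java.util.HashSet;
import java.util.Set;

final class ReorderingDemo {
    /** Hypothetical participant that tracks prepared transactions by id. */
    static final class Node {
        final Set<String> prepared = new HashSet<>();

        void onPrepare(String tx) {
            prepared.add(tx);       // a real node would also lock the keys here
        }

        void onRollback(String tx) {
            prepared.remove(tx);    // no-op when the tx is not known yet
        }
    }

    /** Replays steps 1-4 and returns the transactions left behind on B. */
    static Set<String> replay() {
        Node a = new Node();
        Node b = new Node();
        a.onPrepare("tx1");    // step 1: prepare reaches A (and times out there)
        a.onRollback("tx1");   // step 3: rollback processed on A
        b.onRollback("tx1");   // step 3: rollback processed on B, nothing to undo
        b.onPrepare("tx1");    // step 4: the delayed prepare is processed on B
        return b.prepared;
    }

    public static void main(String[] args) {
        System.out.println("Left on B: " + replay());
    }
}
```

Nothing ever removes tx1 from B after step 4, which is the leak described above.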
The solution would be to wait at step 2 for the responses to all the prepare messages
*before* sending the rollback.
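A rough sketch of that ordering on the originator, assuming each prepare is represented by a CompletableFuture that completes normally on success and exceptionally on timeout or failure. PrepareThenRollback, completeTx and the two Runnable callbacks are hypothetical names, not the actual Infinispan code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class PrepareThenRollback {
    /**
     * Waits until every prepare has completed, successfully or not, and only
     * then sends commit or rollback. Returns true if commit was sent.
     */
    static boolean completeTx(List<CompletableFuture<Void>> prepares,
                              Runnable sendRollback,
                              Runnable sendCommit) {
        // Map each outcome to a boolean so one failure cannot cut the wait short.
        List<CompletableFuture<Boolean>> outcomes = new ArrayList<>();
        for (CompletableFuture<Void> prepare : prepares) {
            outcomes.add(prepare.handle((ignored, error) -> error == null));
        }
        // Blocks until all participants have answered (or failed).
        CompletableFuture.allOf(outcomes.toArray(new CompletableFuture[0])).join();

        boolean allPrepared = outcomes.stream().allMatch(CompletableFuture::join);
        if (allPrepared) {
            sendCommit.run();
        } else {
            sendRollback.run();   // sent only after every prepare has resolved
        }
        return allPrepared;
    }
}
```

The point of handle() here is that a timeout from A no longer triggers the rollback on its own; the decision is deferred until B has answered as well.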
Attached is a unit test to reproduce the issue.
Related mailing list thread:
http://infinispan.markmail.org/search/#query:%20list%3Aorg.jboss.lists.in...