[infinispan-issues] [JBoss JIRA] (ISPN-2081) Transaction leak caused by reordering between prepare and rollback

Dan Berindei (JIRA) jira-events at lists.jboss.org
Sun Oct 14 09:24:01 EDT 2012


    [ https://issues.jboss.org/browse/ISPN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726256#comment-12726256 ] 

Dan Berindei commented on ISPN-2081:
------------------------------------

I'm not sure about the first step, "try to acquire a lock locally". Say the local node (B) is a backup owner for key k1, the tx (tx1) acquires the lock locally, then it tries to go remote to the primary owner (A). The lock is held by another tx (tx2, originated from another node C) on the primary, but then A dies and B becomes the primary owner.

The tx1 caller will get a SuspectException, and the lock on B will be released. But the commit/tx completion command for tx2 might arrive on B just before that happens, and tx2 will try to unlock k1 while it is held by tx1 - resulting in an IllegalMonitorStateException.
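
To make the failure mode concrete, here is a minimal sketch of that interleaving. It assumes an owner-keyed lock that rejects an unlock from anyone but the current owner; `OwnerLockSketch` and its methods are hypothetical names for illustration, not Infinispan's lock implementation.

```java
// Hypothetical owner-keyed lock, for illustration only.
final class OwnerLockSketch {
    private String owner; // null while the lock is free

    // Acquires the lock for tx if it is free or already held by tx.
    synchronized boolean tryLock(String tx) {
        if (owner == null) {
            owner = tx;
            return true;
        }
        return owner.equals(tx);
    }

    // Releases the lock; fails if tx is not the current owner.
    synchronized void unlock(String tx) {
        if (!tx.equals(owner)) {
            throw new IllegalMonitorStateException(
                    tx + " tried to unlock a key held by " + owner);
        }
        owner = null;
    }

    // Replays the interleaving described above on node B.
    // Returns true if tx2's unlock is rejected.
    static boolean replayScenario() {
        OwnerLockSketch k1OnB = new OwnerLockSketch();
        k1OnB.tryLock("tx1");        // tx1 locks k1 locally on backup owner B
        try {
            k1OnB.unlock("tx2");     // tx2's completion arrives on B after A dies
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;             // tx2 never held the lock on B
        }
    }
}
```

In this sketch tx2 only ever held the lock on A, so its completion command has nothing to release on B.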

I've also been thinking for some time that users shouldn't see SuspectExceptions and that Infinispan should retry those commands instead. My idea was to retry automatically in the DistributionInterceptor, but if tx1 already held the lock on B it would be able to commit before tx2, meaning we have to move the retry logic to a lower level.
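
A rough sketch of the retry idea, independent of where in the stack it ends up living. Everything here is an assumption for illustration: `NodeSuspectedException` is a stand-in for SuspectException, and `retryOnSuspect` is a hypothetical helper, not an existing Infinispan API. It also does not address the lock-ordering problem between tx1 and tx2.

```java
import java.util.function.Supplier;

final class SuspectRetrySketch {
    // Hypothetical stand-in for the exception raised when a target node leaves.
    static final class NodeSuspectedException extends RuntimeException {
        NodeSuspectedException(String message) {
            super(message);
        }
    }

    // Runs the command, retrying up to maxAttempts times when a node is suspected.
    static <T> T retryOnSuspect(Supplier<T> command, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        NodeSuspectedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.get();
            } catch (NodeSuspectedException e) {
                // A real implementation would presumably wait for the new
                // topology before retrying; this sketch retries immediately.
                last = e;
            }
        }
        throw last; // every attempt hit a suspected node
    }
}
```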
                
> Transaction leak caused by reordering between prepare and rollback
> ------------------------------------------------------------------
>
>                 Key: ISPN-2081
>                 URL: https://issues.jboss.org/browse/ISPN-2081
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 5.1.5.FINAL
>            Reporter: Mircea Markus
>            Assignee: Mircea Markus
>            Priority: Blocker
>              Labels: jdg, jdg6
>             Fix For: 5.2.0.CR1, 5.2.0.Final
>
>         Attachments: DistL1WriteSkewTest.txt, RollbackNotSentBeforePrepareTest.java
>
>
> There's no ordering between the prepare and commit/rollback messages, as the latter are sent OOB.
> With this in mind, the following transaction leak might happen:
> Tx1 sends prepare to nodes {A,B}
> 1. The prepare reaches A and times out there, but hasn't yet been processed on B
> 2. The transaction originator reacts immediately to the timeout received from A, without waiting for the response from B, and sends a rollback request
> 3. The rollback request is processed on A and B
> 4. The initial prepare is then processed on B
> At this point we have an orphaned prepared transaction on B.
> Whilst this does not cause any inconsistencies, it keeps keys locked indefinitely and is a memory leak.
> The solution would be to wait at step 2 for all the prepare responses *before* sending the rollback.
> Attached is a unit test to reproduce the issue.
> Related mailing list thread: http://infinispan.markmail.org/search/#query:%20list%3Aorg.jboss.lists.infinispan-dev+page:1+mid:xgnmtee56jpqifs6+state:results  
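
The proposed fix (wait for every prepare response before sending the rollback) can be sketched with plain `CompletableFuture`s. This is only an illustration under stated assumptions: each prepare is modelled as a future that completes or fails per node, and `PrepareThenRollbackSketch` and `awaitAllPrepares` are hypothetical names, not Infinispan code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class PrepareThenRollbackSketch {
    // Completes only after every prepare has responded or failed.
    // The result is true if all prepares succeeded, false if any failed.
    static CompletableFuture<Boolean> awaitAllPrepares(
            List<CompletableFuture<Void>> prepares) {
        return CompletableFuture
                .allOf(prepares.toArray(new CompletableFuture[0]))
                .handle((ignored, failure) -> failure == null);
    }
}
```

An originator built on this would decide between commit and rollback only once the returned future completes, so a timeout from A alone cannot trigger a rollback while B's prepare is still outstanding.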

