[infinispan-issues] [JBoss JIRA] (ISPN-1343) Stale locks occurring on rehashing into a running cluster.

Thu Oct 20 02:26:45 EDT 2011

    [ https://issues.jboss.org/browse/ISPN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635961#comment-12635961 ] 

Dan Berindei commented on ISPN-1343:
------------------------------------

Ok, I think I found the problem.

When a PrepareCommand fails (in our case because it didn't get a reply from the remote node in 15 seconds), LockingInterceptor catches the exception and releases all the locks acquired by the transaction.

After that, TransactionCoordinator starts a proper rollback, but the entries in the invocation context are no longer locked (and in 5.0 that means they only have the VALID flag set, not CHANGED or CREATED). That means local locks are not released twice, but it also means the RollbackCommand is not executed remotely (BaseRpcInterceptor.shouldInvokeRemoteTxCommand() returns false), so the locks held on the remote nodes are never released.
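
To make the sequence above concrete, here is a toy model of it in Java. It is only a sketch: StaleLockModel, its fields and its methods are hypothetical stand-ins, not Infinispan's real classes, and shouldInvokeRemotely() merely assumes that the real check looks for CHANGED or CREATED entries, as described above.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Infinispan's real classes or methods.
class StaleLockModel {
    enum Flag { VALID, CHANGED, CREATED }

    // Entries touched by the transaction (stand-in for the invocation context).
    final Map<String, EnumSet<Flag>> contextEntries = new HashMap<>();
    // Locks held on the local node and on the remote node.
    final Set<String> localLocks = new HashSet<>();
    final Set<String> remoteLocks = new HashSet<>();

    // A write locks the key on both nodes and marks the entry as CHANGED.
    void write(String key) {
        localLocks.add(key);
        remoteLocks.add(key);
        contextEntries.put(key, EnumSet.of(Flag.VALID, Flag.CHANGED));
    }

    // Models the cleanup after a failed prepare: local locks are released
    // and every entry is rolled back, leaving only the VALID flag.
    void cleanupAfterFailedPrepare() {
        localLocks.clear();
        for (EnumSet<Flag> flags : contextEntries.values()) {
            flags.remove(Flag.CHANGED);
            flags.remove(Flag.CREATED);
        }
    }

    // Assumed shape of the remote-invocation check: is any entry modified?
    boolean shouldInvokeRemotely() {
        for (EnumSet<Flag> flags : contextEntries.values()) {
            if (flags.contains(Flag.CHANGED) || flags.contains(Flag.CREATED)) {
                return true;
            }
        }
        return false;
    }

    // Models the coordinator's rollback: remote locks are released only
    // if the command is actually sent to the remote node.
    void rollback() {
        localLocks.clear();
        if (shouldInvokeRemotely()) {
            remoteLocks.clear();
        }
    }
}
```

Calling write("k1"), then cleanupAfterFailedPrepare(), then rollback() on this model leaves "k1" in remoteLocks, which is the stale lock.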

I see 3 possible solutions:
1. Don't clean up the locks at all in LockingInterceptor (if the invocation context is transactional) and rely on TransactionCoordinator to do it.
2. Clean up the locks but don't call rollback() on the entry.
3. Rely on LocalTransaction's getRemoteLocksAcquired() instead of the locked entries to determine if the rollback command should be executed on the remote nodes.
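
A minimal sketch of what option 3 could look like, in the same toy style. RemoteLockTrackingTx and all of its members are hypothetical; only the idea of consulting the set behind getRemoteLocksAcquired() comes from the list above, and the real method's signature may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of option 3; not Infinispan's actual classes or signatures.
class RemoteLockTrackingTx {
    // Keys locked on remote nodes on behalf of this transaction
    // (modelled on what getRemoteLocksAcquired() would return).
    private final Set<String> remoteLocksAcquired = new HashSet<>();
    private final Set<String> localLocks = new HashSet<>();
    // Locks currently held on the remote node, kept here only to observe the outcome.
    final Set<String> remoteNodeLocks = new HashSet<>();

    // Acquire the lock locally and remotely, and record the remote acquisition.
    void lock(String key) {
        localLocks.add(key);
        remoteNodeLocks.add(key);
        remoteLocksAcquired.add(key);
    }

    // Cleanup after a failed prepare releases local locks only;
    // the record of remote locks is deliberately left untouched.
    void cleanupAfterFailedPrepare() {
        localLocks.clear();
    }

    // The decision no longer depends on entry flags in the invocation context.
    boolean shouldSendRollback() {
        return !remoteLocksAcquired.isEmpty();
    }

    // Rollback releases the remote locks whenever any were acquired.
    void rollback() {
        localLocks.clear();
        if (shouldSendRollback()) {
            remoteNodeLocks.removeAll(remoteLocksAcquired);
            remoteLocksAcquired.clear();
        }
    }
}
```

Because the decision is based on the transaction's own record of remote locks, an earlier local cleanup cannot suppress the remote rollback in this model.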

I've only looked at this on 5.0; I'll get Mircea to take a look at the issue for 5.1.

> Stale locks occurring on rehashing into a running cluster.
> ----------------------------------------------------------
>
>                 Key: ISPN-1343
>                 URL: https://issues.jboss.org/browse/ISPN-1343
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Distributed Cache
>    Affects Versions: 5.0.0.FINAL
>            Reporter: Erik Salter
>            Assignee: Dan Berindei
>             Fix For: 5.0.1.FINAL, 5.1.0.ALPHA1, 5.1.0.FINAL
>
>         Attachments: ISPN-5-stale-lock.zip, ISPN-5-test-rehash.zip, RehashTest.java, RehashTest.java
>
>
> There is a stale lock issue that occurs when a client thread requests the same set of eagerly-locked keys across a rehash.  Attached is a unit test that can reproduce the issue.
