[infinispan-issues] [JBoss JIRA] (ISPN-1581) Improve resiliency of retrying commits on state transfer

Wed Nov 30 18:11:40 EST 2011

    [ https://issues.jboss.org/browse/ISPN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646945#comment-12646945 ] 

Erik Salter commented on ISPN-1581:
-----------------------------------

Here's a test we ran where we killed a node with kill -9 and then immediately restarted it. This produced many stale locks.

The node in question is dht10.  Here are the full trace logs.

http://dl.dropbox.com/u/50401510/5.1.0.CR1/StaleLock/dht10/dht10.rar
http://dl.dropbox.com/u/50401510/5.1.0.CR1/StaleLock/dht11/dht11.rar
http://dl.dropbox.com/u/50401510/5.1.0.CR1/StaleLock/dht12/dht12.rar
http://dl.dropbox.com/u/50401510/5.1.0.CR1/StaleLock/dht13/dht13.rar
http://dl.dropbox.com/u/50401510/5.1.0.CR1/StaleLock/dht14/dht14.rar
http://dl.dropbox.com/u/50401510/5.1.0.CR1/StaleLock/dht15/dht15.rar

> Improve resiliency of retrying commits on state transfer 
> ---------------------------------------------------------
>
>                 Key: ISPN-1581
>                 URL: https://issues.jboss.org/browse/ISPN-1581
>             Project: Infinispan
>          Issue Type: Enhancement
>    Affects Versions: 5.1.0.CR1
>            Reporter: Erik Salter
>            Assignee: Manik Surtani
>
> The current implementation of ISPN-1484 will retry a commit on a remote node up to 3 times.  This is resilient to 3 state transfers happening in rapid succession.  However, if the cluster loses more than 3 nodes, stale locks can still be left behind.
> This is evident when testing with the TopologyAwareConsistentHash.  I lost a "site" consisting of 4 nodes and was able to get stale locks.
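
To illustrate the failure mode described in the issue, here is a minimal sketch of a fixed retry budget running out. It is not Infinispan code: CommitRetrySketch, FlakyCommit and commitWithRetry are hypothetical names. The sketch assumes that each pending state transfer makes exactly one commit attempt fail, and that "retry up to 3 times" means one initial attempt plus 3 retries.

```java
import java.util.function.BooleanSupplier;

public class CommitRetrySketch {

    /**
     * Hypothetical stand-in for a remote commit. Assumption: each pending
     * state transfer causes exactly one attempt to fail.
     */
    static final class FlakyCommit implements BooleanSupplier {
        private int pendingStateTransfers;

        FlakyCommit(int pendingStateTransfers) {
            this.pendingStateTransfers = pendingStateTransfers;
        }

        @Override
        public boolean getAsBoolean() {
            if (pendingStateTransfers > 0) {
                pendingStateTransfers--;
                return false; // attempt interrupted by a state transfer
            }
            return true; // cluster is stable, commit goes through
        }
    }

    /**
     * Makes one initial attempt plus at most maxRetries retries.
     * Returns true if any attempt succeeded.
     */
    static boolean commitWithRetry(BooleanSupplier commit, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (commit.getAsBoolean()) {
                return true;
            }
        }
        // Retry budget exhausted: in the scenario from the issue, the
        // remote lock would never be released (a stale lock).
        return false;
    }

    public static void main(String[] args) {
        // Three state transfers in a row: the last allowed attempt succeeds.
        System.out.println(commitWithRetry(new FlakyCommit(3), 3)); // true
        // Four state transfers (e.g. losing a 4-node site): budget runs out.
        System.out.println(commitWithRetry(new FlakyCommit(4), 3)); // false
    }
}
```

Under these assumptions, any fixed budget can be exceeded by a large enough burst of topology changes, which is why losing a 4-node site is enough to defeat a limit of 3.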
