[infinispan-issues] [JBoss JIRA] Issue Comment Edited: (ISPN-1106) Rehashing into a running cluster causes shared processing lock contention

Sun May 15 22:57:00 EDT 2011

    [ https://issues.jboss.org/browse/ISPN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602179#comment-12602179 ] 

Erik Salter edited comment on ISPN-1106 at 5/15/11 10:55 PM:
-------------------------------------------------------------

As an update, I tested this behavior with the latest rehash changes from master.  It's much better, but there are still issues with a DIST-mode cluster configured with eager locking on a single node (eagerLockSingleNode) and 2 owners.

For instance:

- Start nodes 1 and 2.
- Run a series of transactions against the cluster.
- Start node 3.  Repeat tests.  There are some errors, but they are well within expected behavior (i.e. the application code does not handle rollback exceptions).
- Stop node 2.  Tests are still running.
- Restart node 2.  Here, I see a bunch of failures caused by locks still held on node 1.  Node 2 is now the primary data owner, but node 1 never releases these locks, even after the rehash.

I'll see if I can't build a unit test to reproduce this.
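
As a rough illustration of what such a test might assert, here is a toy model of the scenario above.  It is not Infinispan code: ToyCluster, its methods, the sum-of-bytes "hash" and the key "d" are all hypothetical stand-ins.  The model assumes that with eager locking on a single node only the primary owner records the lock, and it deliberately never moves or releases locks on a membership change, which mirrors the symptom described in the last step.

```python
# Toy model only: not Infinispan code. All names here are hypothetical.

class ToyCluster:
    """Tracks cluster members and a per-node lock table."""

    def __init__(self, members):
        self.members = sorted(members)
        self.locks = {m: {} for m in self.members}  # node -> {key: tx id}

    def primary_owner(self, key):
        # Stand-in for a consistent hash: the result depends on the member list.
        return self.members[sum(key.encode()) % len(self.members)]

    def lock(self, key, tx):
        # Assumption: only the primary owner records the lock.
        owner = self.primary_owner(key)
        self.locks[owner][key] = tx
        return owner

    def join(self, node):
        self.members = sorted(self.members + [node])
        self.locks.setdefault(node, {})

    def leave(self, node):
        self.members.remove(node)
        self.locks.pop(node, None)

    def misplaced_locks(self):
        # Locks recorded on a node that is no longer the key's primary owner.
        return sorted(
            (node, key)
            for node, table in self.locks.items()
            for key in table
            if self.primary_owner(key) != node
        )


def run_scenario():
    cluster = ToyCluster(["node1", "node2"])  # start nodes 1 and 2
    cluster.join("node3")                     # start node 3
    cluster.leave("node2")                    # stop node 2
    # "d" is chosen so that, in this toy hash, ownership moves
    # from node1 to node2 once node2 rejoins.
    cluster.lock("d", "tx1")
    before = cluster.misplaced_locks()
    cluster.join("node2")                     # restart node 2
    after = cluster.misplaced_locks()
    return before, after


if __name__ == "__main__":
    print(run_scenario())
```

A real reproducer would replace the toy classes with actual cache instances and assert that, once the rehash completes and all transactions have finished, the equivalent of misplaced_locks() comes back empty on every node.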


> Rehashing into a running cluster causes shared processing lock contention
> -------------------------------------------------------------------------
>
>                 Key: ISPN-1106
>                 URL: https://issues.jboss.org/browse/ISPN-1106
>             Project: Infinispan
>          Issue Type: Bug
>    Affects Versions: 5.0.0.CR2
>            Reporter: Erik Salter
>            Assignee: Manik Surtani
>
> On our initial test of 5.0.0.CR2, we wanted to test the cluster's rehashing behavior/performance and whether all locks were cleaned up.
> The test was to start two nodes, then add a third node, all the while issuing commands to the cluster.
> Upon adding a third node, the cluster becomes inoperable.  The stack traces are at the following locations:
> http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node1.log, 
> http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node2.log, 
> http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node3.log
