[infinispan-issues] [JBoss JIRA] Issue Comment Edited: (ISPN-1106) Rehashing into a running cluster causes shared processing lock contention
Erik Salter (JIRA)
jira-events at lists.jboss.org
Sun May 22 09:34:01 EDT 2011
[ https://issues.jboss.org/browse/ISPN-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603551#comment-12603551 ]
Erik Salter edited comment on ISPN-1106 at 5/22/11 9:33 AM:
------------------------------------------------------------
Per IRC chat, I used the new Distributed Executor to "sticky" a task to the data owner of a set of keys (with the same hash code). I then try to rehash.
I'm seeing a tremendous number of timeout errors, including lock cleanup issues. In my production environment (with a JTA transaction manager), I get an inordinate number of the following errors after the new node joins:
- org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
The rehashing takes less than 20 seconds, but the cluster doesn't recover until 3-4 minutes later.
Attached is a unit test that simulates my production environment. It can easily reproduce the lock timeout errors.
NOTE: The test was built from master, which was 5.0.0.CR3. Just replace that version in the pom.xml.
was (Author: an1310):
Per IRC chat, I used the new Distributed Executor to "sticky" a task to the data owner of a set of keys (with the same hash code). I then try to rehash.
I'm seeing a tremendous number of timeout errors, including lock cleanup issues. In my production environment (with a JTA transaction manager), I get an inordinate number of the following errors after the new node joins:
- org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!
The rehashing takes less than 20 seconds, but the cluster doesn't recover until 3-4 minutes later.
Attached is a unit test that simulates my production environment. It can easily reproduce the lock timeout errors.
> Rehashing into a running cluster causes shared processing lock contention
> -------------------------------------------------------------------------
>
> Key: ISPN-1106
> URL: https://issues.jboss.org/browse/ISPN-1106
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.0.0.CR2
> Reporter: Erik Salter
> Assignee: Manik Surtani
> Attachments: cacheTest.zip
>
>
> On our initial test of 5.0.0.CR2, we wanted to test the cluster's rehashing behavior/performance and whether all locks were cleaned up.
> The test was to start two nodes, then add a third node, all the while issuing commands to it.
> Upon adding the third node, the cluster becomes inoperable. The stack traces are at the following locations:
> http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node1.log
> http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node2.log
> http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node3.log
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira