[infinispan-issues] [JBoss JIRA] Issue Comment Edited: (ISPN-1106) Rehashing into a running cluster causes shared processing lock contention

Sunday, 22 May 2011

    [ https://issues.jboss.org/browse/ISPN-1106?page=com.atlassian.jira.plugin.... ]

Erik Salter edited comment on ISPN-1106 at 5/22/11 9:33 AM:
------------------------------------------------------------

Per IRC chat, I used the new Distributed Executor to "sticky" a task to the data
owner of a set of keys (with the same hash code).  I then try to rehash.
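Pinning a task to "the data owner of a set of keys (with the same hash code)" works because ownership is a function of the key's hash, and a node join reassigns part of that hash space. The Java sketch below is a hypothetical, simplified single-owner consistent-hash ring; the class `OwnershipSketch` and its methods are invented for illustration and are not Infinispan's `ConsistentHash`, which also keeps `numOwners` copies of each key. It shows that keys with equal `hashCode()` (for example `"Aa"` and `"BB"` in Java) always share an owner, and that when node3 joins, every key whose owner changes moves to node3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, simplified model for illustration only: a single-owner
// consistent-hash ring. This is NOT Infinispan's ConsistentHash, which also
// keeps numOwners copies of each key.
public class OwnershipSketch {
    // Ring position -> node name.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public OwnershipSketch(List<String> nodes) {
        for (String node : nodes) {
            addNode(node);
        }
    }

    // Scramble a hash code into a ring position (a bijection on int).
    private static int position(int hash) {
        int h = hash * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    public void addNode(String node) {
        ring.put(position(node.hashCode()), node);
    }

    // The owner is the first node at or after the key's position, wrapping
    // around. It depends only on key.hashCode(), so keys that share a hash
    // code always share an owner.
    public String ownerOf(Object key) {
        SortedMap<Integer, String> tail = ring.tailMap(position(key.hashCode()));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("key-" + i);
        }
        OwnershipSketch twoNodes = new OwnershipSketch(Arrays.asList("node1", "node2"));
        OwnershipSketch threeNodes = new OwnershipSketch(Arrays.asList("node1", "node2", "node3"));

        // "Aa" and "BB" have the same String.hashCode(), so they are co-located.
        System.out.println("Aa -> " + twoNodes.ownerOf("Aa") + ", BB -> " + twoNodes.ownerOf("BB"));

        // Count keys whose owner changes when node3 joins; all of them move to node3.
        int moved = 0;
        for (String k : keys) {
            if (!twoNodes.ownerOf(k).equals(threeNodes.ownerOf(k))) {
                moved++;
            }
        }
        System.out.println(moved + " of " + keys.size() + " keys change owner when node3 joins");
    }
}
```

In a real cluster, the state for the moved keys has to be transferred to the joiner while tasks keep targeting those keys, which is the window this report exercises.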

I'm seeing a tremendous number of timeout errors, including lock cleanup issues.  In
my production environment (with a JTA transaction manager), I get an inordinate number of
the following errors after the new node joins:

- org.infinispan.util.concurrent.TimeoutException: Timed out waiting for valid responses!

The rehashing takes less than 20 seconds, but the cluster doesn't recover until 3-4
minutes afterward.

Attached is a unit test that simulates my production environment.  It can easily reproduce
the lock timeout errors.  

NOTE:  It was built from master, which was 5.0.0.CR3.  Just replace that version in the pom.xml.

      was (Author: an1310):

...
 Rehashing into a running cluster causes shared processing lock contention
 -------------------------------------------------------------------------

                 Key: ISPN-1106
                 URL: https://issues.jboss.org/browse/ISPN-1106
             Project: Infinispan
          Issue Type: Bug
    Affects Versions: 5.0.0.CR2
            Reporter: Erik Salter
            Assignee: Manik Surtani
         Attachments: cacheTest.zip

 In our initial test of 5.0.0.CR2, we wanted to test the cluster's rehashing
behavior and performance, and whether all locks were cleaned up.
 The test was to start two nodes, then add a third node, all the while issuing commands to
the cluster.
 Upon adding the third node, the cluster becomes inoperable.  The stack traces are in the
following locations:
 http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node1.log, 
 http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node2.log, 
 http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node3.log 

