[
https://issues.jboss.org/browse/ISPN-1106?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-1106:
------------------------------------
Ok, I think I finally understand what is going on.
Say there is a cluster {A, B}, numOwners = 2, and C joins, but there is a put operation
going on during the join.
The sequence of events could be something like this:
||A||B||
|put obtains the tx lock (read)| |
|view change event|view change event|
|rehash tries to acquire tx lock (write)|rehash acquires tx lock (write)|
| |put tries to obtain the tx lock (read)|
The put operation and the rehash process deadlock, because the sender is holding the tx
lock for the entire duration of the remote call (tx lock is acquired in DistTxInterceptor,
distribution is done in DistributionInterceptor).
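To make the cycle in the table concrete, here is a minimal sketch in plain Java. It is not Infinispan code: the two ReentrantReadWriteLock instances stand in for the tx locks on A and B, the class and method names are hypothetical, and a single thread plays the rehash on both nodes on the assumption that B's rehash cannot finish until A's takes part. Timed lock attempts are used instead of blocking ones so the sketch terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the scenario above; not Infinispan code.
class TxLockDeadlockSketch {

    // Returns true when both timed attempts fail, i.e. the wait cycle is reproduced.
    static boolean reproducesCycle() {
        ReentrantReadWriteLock txLockA = new ReentrantReadWriteLock();
        ReentrantReadWriteLock txLockB = new ReentrantReadWriteLock();
        CountDownLatch rehashHoldsB = new CountDownLatch(1);
        CountDownLatch putGaveUp = new CountDownLatch(1);
        boolean[] rehashGotA = new boolean[1];

        // A: the put obtains the tx lock (read) and keeps it during the remote call.
        txLockA.readLock().lock();
        try {
            Thread rehash = new Thread(() -> {
                // B: the rehash acquires the tx lock (write).
                txLockB.writeLock().lock();
                rehashHoldsB.countDown();
                try {
                    // A: the rehash tries the tx lock (write) while the put holds the read lock.
                    rehashGotA[0] = txLockA.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (rehashGotA[0]) {
                        txLockA.writeLock().unlock();
                    }
                    // Assumption: B's rehash keeps its lock until the put has given up.
                    putGaveUp.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    txLockB.writeLock().unlock();
                }
            });
            rehash.start();
            rehashHoldsB.await();

            // B: the remote half of the put tries the tx lock (read).
            boolean putGotB = txLockB.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (putGotB) {
                txLockB.readLock().unlock();
            }
            putGaveUp.countDown();
            rehash.join();
            return !putGotB && !rehashGotA[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        } finally {
            txLockA.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("cycle reproduced: " + reproducesCycle());
    }
}
```

With untimed lock() calls in place of the tryLock calls, neither side would ever make progress, which is the hang described above.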
My first thought to fix this was to acquire the lock on non-origin nodes with 0 timeout,
but this will cause a failure on some nodes after the operation already succeeded on the
local node, and I'm not sure how this will work out. E.g. in the case of commit, we should
succeed regardless of any rehash activity, so we would have to retry the operation
(releasing the local tx lock first) in order to succeed, and I don't think we do that.
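A rough sketch of what that idea could look like, again in plain Java rather than Infinispan code. Everything here is an assumption for illustration: the class, the method names, the retry limit and the backoff are hypothetical, and the remote call is modelled as a BooleanSupplier that returns false when the non-origin node could not get its tx lock immediately.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BooleanSupplier;

// Hypothetical illustration of "0 timeout on non-origin nodes, retry on the origin".
class ZeroTimeoutRetrySketch {

    // Non-origin side: take the tx lock (read) only if it is free right now.
    static boolean applyOnRemote(ReentrantReadWriteLock remoteTxLock, Runnable operation) {
        if (!remoteTxLock.readLock().tryLock()) {
            return false; // a rehash holds the write lock; report failure instead of waiting
        }
        try {
            operation.run();
            return true;
        } finally {
            remoteTxLock.readLock().unlock();
        }
    }

    // Origin side: hold the local tx lock (read) only for one attempt at a time.
    static boolean invokeWithRetry(ReentrantReadWriteLock localTxLock,
                                   BooleanSupplier remoteCall,
                                   int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            localTxLock.readLock().lock();
            try {
                if (remoteCall.getAsBoolean()) {
                    return true;
                }
            } finally {
                // Released before retrying so that a local rehash can take the write lock.
                localTxLock.readLock().unlock();
            }
            try {
                Thread.sleep(10L * attempt); // arbitrary backoff, chosen for the sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The sketch leaves open the hard part raised above: what to do about nodes where the operation already succeeded when another node rejects it, and how a commit could be guaranteed to succeed eventually.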
Rehashing into a running cluster causes shared processing lock
contention
-------------------------------------------------------------------------
Key: ISPN-1106
URL:
https://issues.jboss.org/browse/ISPN-1106
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.0.0.CR2
Reporter: Erik Salter
Assignee: Dan Berindei
Fix For: 5.0.0.CR4
Attachments: cacheTest.zip, ISPN-1106.log
In our initial test of 5.0.0.CR2, we wanted to test the cluster's rehashing
behavior/performance and whether all locks were cleaned up.
The test was to start two nodes, then add a third node, all the while issuing commands to
the cluster.
Upon adding the third node, the cluster becomes inoperable. The stack traces are at the
following locations:
http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node1.log,
http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node2.log,
http://dl.dropbox.com/u/10929737/5.0.0.CR2/server_node3.log