[https://issues.jboss.org/browse/ISPN-2966?page=com.atlassian.jira.plugin....]
Mircea Markus updated ISPN-2966:
--------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
NBST: Concurrent leavers can lead to deadlock
---------------------------------------------
Key: ISPN-2966
URL: https://issues.jboss.org/browse/ISPN-2966
Project: Infinispan
Issue Type: Bug
Reporter: Pedro Ruivo
Assignee: Dan Berindei
Labels: state_transfer
Fix For: 5.3.0.Final
Attachments: thread-dump.txt, trace.log
The following sequence of events leads to a thread deadlock in the coordinator:
{code}
1) NodeF sends LEAVE message. new topologyId=8
2) NodeE delivers REBALANCE_START(8)
3) NodeF and NodeG deliver REBALANCE_START(8)
4) NodeH delivers GET_TRANSACTION(8) from NodeE ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeE-28744 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
5) NodeH sends LEAVE message. new topologyId=9
6) NodeH delivers REBALANCE_START(8) ==> Ignoring rebalance 8 for cache dist that doesn't exist locally
7) NodeH delivers GET_TRANSACTION(8) from NodeG ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeG-31669 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
{code}
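Steps 4 to 7 boil down to a handler thread waiting for a topology that the node will never install. The Java sketch below models only that wait; it is a simplified illustration under stated assumptions, not Infinispan's code, and the class `TopologyWaiter` and its methods are hypothetical names. A GET_TRANSACTION handler blocks until the local topology id reaches the requested one, so on NodeH, which ignored REBALANCE_START(8) in step 6 and remains at topology 7, the wait can never succeed. The timeout exists only so the demo terminates; an unbounded wait would park the handler thread indefinitely.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model of a node's topology state (not Infinispan's real classes).
 */
class TopologyWaiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition topologyChanged = lock.newCondition();
    private int currentTopologyId;

    TopologyWaiter(int initialTopologyId) {
        this.currentTopologyId = initialTopologyId;
    }

    /** Called when a newer topology (e.g. from REBALANCE_START) is applied locally. */
    void installTopology(int topologyId) {
        lock.lock();
        try {
            if (topologyId > currentTopologyId) {
                currentTopologyId = topologyId;
                topologyChanged.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Models the GET_TRANSACTION handler: block until the local topology is at
     * least the requested one. Returns false if the timeout expires first, or if
     * the thread is interrupted (the interrupt flag is restored in that case).
     */
    boolean awaitTopology(int requestedTopologyId, long timeout, TimeUnit unit) {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (currentTopologyId < requestedTopologyId) {
                if (remainingNanos <= 0L) {
                    return false; // topology never arrived: this is where NodeH hangs
                }
                remainingNanos = topologyChanged.awaitNanos(remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

For example, `new TopologyWaiter(7).awaitTopology(8, 100, TimeUnit.MILLISECONDS)` returns `false` because nothing ever calls `installTopology(8)`, while the same call made after `installTopology(8)` returns `true` immediately. The first proposed solution below suggests the REBALANCE_START/CH_UPDATE messages are currently sent synchronously, which is how a handler stuck like this on one node can end up blocking other threads as well.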
Possible solutions are:
- send REBALANCE_START/CH_UPDATE asynchronously
- throw an exception when a GET_TRANSACTION request is received while the node is shutting down.
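The second option can be sketched by adding a shutdown flag to the topology wait. This is a hypothetical model, not the actual Infinispan fix: `StoppableTopologyWaiter`, `markStopping()` and the choice of `IllegalStateException` are illustrative assumptions. When the node starts leaving, every blocked GET_TRANSACTION handler is woken up and fails fast instead of waiting for a topology that will not arrive, so, assuming the exception is propagated back as the reply, the requesting node is no longer left waiting.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical model (not Infinispan's API) of a node's topology state whose
 * GET_TRANSACTION handlers fail fast once the node is leaving the cluster.
 */
class StoppableTopologyWaiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private int currentTopologyId;
    private boolean stopping;

    StoppableTopologyWaiter(int initialTopologyId) {
        this.currentTopologyId = initialTopologyId;
    }

    /** Applies a newer topology (e.g. from REBALANCE_START) and wakes up waiters. */
    void installTopology(int topologyId) {
        lock.lock();
        try {
            if (topologyId > currentTopologyId) {
                currentTopologyId = topologyId;
                stateChanged.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Marks the node as leaving and wakes up every blocked handler so it can fail. */
    void markStopping() {
        lock.lock();
        try {
            stopping = true;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Models the GET_TRANSACTION handler: wait until the requested topology is
     * installed, but throw instead of waiting once the node is shutting down.
     */
    void awaitTopology(int requestedTopologyId) throws InterruptedException {
        lock.lock();
        try {
            while (currentTopologyId < requestedTopologyId) {
                if (stopping) {
                    throw new IllegalStateException(
                            "Node is leaving; topology " + requestedTopologyId + " will not be installed");
                }
                stateChanged.await();
            }
        } finally {
            lock.unlock();
        }
    }
}
```

Calling `markStopping()` either before or while a handler is blocked in `awaitTopology(8)` makes that call throw `IllegalStateException`; how the requesting node should react to the failure is outside the scope of this sketch.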