[
https://issues.jboss.org/browse/ISPN-2713?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-2713:
------------------------------------
With the fix for ISPN-2825, the thread sending the REBALANCE_START command won't hold
the lock on the ClusterCacheStatus any more, and the REBALANCE_CONFIRM command will be
able to proceed.
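As a rough illustration of why releasing the lock helps, here is a minimal, self-contained model of the idea. It is not the ISPN-2825 patch: the class `RebalanceFixSketch`, the method `rebalanceWithoutHoldingLock`, and the use of a plain `ReentrantLock` and a thread pool in place of the ClusterCacheStatus lock and the JGroups transport are all assumptions made for the sketch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model only; none of these names come from Infinispan.
final class RebalanceFixSketch {

    /** Returns true if the simulated REBALANCE_START response arrives in time. */
    static boolean rebalanceWithoutHoldingLock(long timeoutMillis) {
        ReentrantLock statusLock = new ReentrantLock();       // stands in for the ClusterCacheStatus lock
        AtomicBoolean rebalanceInProgress = new AtomicBoolean();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Update the shared state under the lock, then release it
            // BEFORE sending anything to the other members.
            statusLock.lock();
            try {
                rebalanceInProgress.set(true);
            } finally {
                statusLock.unlock();
            }

            // Simulated member: handles REBALANCE_START and sends
            // REBALANCE_CONFIRM from the same thread, waiting for it to be
            // processed (the ack_on_delivery=true behaviour).
            CompletableFuture<Void> startResponse = CompletableFuture.runAsync(() -> {
                CompletableFuture.runAsync(() -> {
                    // Coordinator-side handling of REBALANCE_CONFIRM.
                    statusLock.lock();
                    try {
                        rebalanceInProgress.set(false);
                    } finally {
                        statusLock.unlock();
                    }
                }, pool).join();
            }, pool);

            try {
                startResponse.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + rebalanceWithoutHoldingLock(2000));
    }
}
```

Because the sending thread no longer holds the lock while it waits, the confirmation handler can acquire it even when the confirmation is delivered synchronously.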
REBALANCE_START and REBALANCE_CONFIRM commands deadlock when RSVP.ack_on_delivery=true
--------------------------------------------------------------------------------------
Key: ISPN-2713
URL: https://issues.jboss.org/browse/ISPN-2713
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.0.CR1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 5.3.0.Final
When the coordinator sends a REBALANCE_START command, it holds a lock on the
ClusterCacheStatus until it receives the responses from all the other members.
If a node doesn't need to request any new state, it sends the rebalance confirmation
to the coordinator on the same thread that received the REBALANCE_START command. The
REBALANCE_CONFIRM command also wants to acquire a lock on the ClusterCacheStatus on the
coordinator, but because the REBALANCE_CONFIRM command is sent asynchronously, it
doesn't deadlock with the thread waiting for REBALANCE_START responses on the
coordinator.
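At least, that's what happens when {{RSVP.ack_on_delivery=false}} (the Infinispan
default). When {{RSVP.ack_on_delivery=true}} (the JGroups default), the
"asynchronous" REBALANCE_CONFIRM command becomes synchronous: the sending thread
blocks until the coordinator has processed the command, and the coordinator cannot
process it while the thread waiting for REBALANCE_START responses holds the lock.
The result is a deadlock, and the rebalance fails after the RSVP timeout expires
(10 seconds by default).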
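The interaction described above can be reproduced in miniature. The sketch below is a simplified model, not Infinispan or JGroups code: the class `RebalanceDeadlockSketch`, the method `rebalance`, and the `ackOnDelivery` flag are hypothetical names, a `ReentrantLock` stands in for the ClusterCacheStatus lock, and a short timeout stands in for the RSVP timeout so the example terminates quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model only; none of these names come from Infinispan or JGroups.
final class RebalanceDeadlockSketch {

    /** Returns true if the simulated REBALANCE_START response arrives in time. */
    static boolean rebalance(boolean ackOnDelivery, long timeoutMillis) {
        ReentrantLock statusLock = new ReentrantLock();   // stands in for the ClusterCacheStatus lock
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Coordinator: hold the lock for the whole send-and-wait.
            statusLock.lock();
            try {
                // Simulated member: handles REBALANCE_START and sends
                // REBALANCE_CONFIRM from the same thread.
                CompletableFuture<Void> startResponse = CompletableFuture.runAsync(() -> {
                    CompletableFuture<Void> confirmProcessed = CompletableFuture.runAsync(() -> {
                        // Coordinator-side handling of REBALANCE_CONFIRM needs the lock.
                        statusLock.lock();
                        statusLock.unlock();
                    }, pool);
                    if (ackOnDelivery) {
                        // Models ack_on_delivery=true: the sender blocks until
                        // the confirmation has been processed.
                        confirmProcessed.join();
                    }
                    // Returning here models sending the REBALANCE_START response.
                }, pool);

                try {
                    startResponse.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;   // the wait for responses timed out
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            } finally {
                statusLock.unlock();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("ackOnDelivery=false completed: " + rebalance(false, 2000));
        System.out.println("ackOnDelivery=true  completed: " + rebalance(true, 300));
    }
}
```

In this model the call with `ackOnDelivery=false` completes, while the call with `ackOnDelivery=true` only returns once the timeout expires, because the confirmation handler and the waiting thread each need the other to finish first.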