Mircea Markus updated ISPN-2825:
--------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
ClusterTopologyManagerImpl should not hold a lock while invoking an RPC
-----------------------------------------------------------------------
Key: ISPN-2825
URL: https://issues.jboss.org/browse/ISPN-2825
Project: Infinispan
Issue Type: Bug
Components: State transfer
Affects Versions: 5.2.1.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Priority: Critical
Fix For: 5.3.0.Alpha1, 5.3.0.Final
On the coordinator, ClusterTopologyManagerImpl holds a lock on a cache's
ClusterCacheStatus while it is invoking a synchronous REBALANCE_START or CH_UPDATE
command. This helps ensure the ordering of the commands is the same on all the members.
However, this has some downsides. A joining node can take quite some time to reply to
the coordinator, because it first needs to request transactions from the other nodes.
The nodes that don't need to request any data will send a REBALANCE_CONFIRM command to
the coordinator right away, but that command will block on the ClusterCacheStatus lock. If
the number of OOB threads is limited, this can even lead to a deadlock.
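
A minimal sketch of the alternative, not the actual Infinispan code: the coordinator
updates its per-cache state inside a short critical section and invokes the synchronous
RPC only after the lock has been released. CoordinatorSketch, CacheStatus and the rpc
callback are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

final class CoordinatorSketch {
    // Hypothetical stand-in for the per-cache ClusterCacheStatus.
    static final class CacheStatus {
        private int topologyId = 0;
        private final List<String> confirmed = new ArrayList<>();

        // Short critical section: only mutates local state, never does I/O.
        synchronized int startRebalance() {
            confirmed.clear();
            return ++topologyId;
        }

        // Called when a REBALANCE_CONFIRM arrives; needs the same lock.
        synchronized void confirm(String node, int id) {
            if (id == topologyId) {
                confirmed.add(node);
            }
        }

        synchronized List<String> confirmedNodes() {
            return new ArrayList<>(confirmed);
        }
    }

    // rpc is a placeholder for the synchronous REBALANCE_START broadcast.
    static int triggerRebalance(CacheStatus status, IntConsumer rpc) {
        int id = status.startRebalance(); // lock taken and released here
        rpc.accept(id);                   // lock is NOT held during the RPC
        return id;
    }
}
```

Because the lock is free while the RPC is in flight, a confirmation that arrives early
can be recorded immediately instead of tying up an OOB thread. The cost is that the
lock no longer orders the outgoing commands, so receivers have to tolerate reordering.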
Now that CH_UPDATE commands also increment the topology id, we don't really need to
enforce the same ordering. If a CH_UPDATE command is sent after a REBALANCE_START command
but arrives before it, LocalTopologyManagerImpl just needs to act as if the CH_UPDATE
command was actually a REBALANCE_START. (It knows there should be a rebalance when a
CH_UPDATE command has pendingCH != null.)
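
The receiving side could look roughly like the sketch below. It is an illustration
under stated assumptions, not the real LocalTopologyManagerImpl: the Topology class,
the handler names and the event log are hypothetical, and discarding commands whose
topology id is not newer than the installed one is an assumption about how a late
REBALANCE_START would be handled.

```java
import java.util.ArrayList;
import java.util.List;

final class LocalTopologySketch {
    // Hypothetical, simplified topology; consistent hashes are plain strings here.
    static final class Topology {
        final int topologyId;
        final String currentCH;
        final String pendingCH; // null when no rebalance is in progress

        Topology(int topologyId, String currentCH, String pendingCH) {
            this.topologyId = topologyId;
            this.currentCH = currentCH;
            this.pendingCH = pendingCH;
        }
    }

    private int installedTopologyId = -1;
    private boolean rebalanceInProgress = false;
    private final List<String> events = new ArrayList<>();

    synchronized void handleRebalanceStart(Topology t) {
        if (t.topologyId <= installedTopologyId) {
            // Assumption: a newer command already arrived, so this one is stale.
            events.add("ignore stale REBALANCE_START " + t.topologyId);
            return;
        }
        startRebalance(t);
    }

    synchronized void handleConsistentHashUpdate(Topology t) {
        if (t.topologyId <= installedTopologyId) {
            events.add("ignore stale CH_UPDATE " + t.topologyId);
            return;
        }
        if (t.pendingCH != null && !rebalanceInProgress) {
            // CH_UPDATE overtook REBALANCE_START: treat it as the rebalance start.
            startRebalance(t);
            return;
        }
        installedTopologyId = t.topologyId;
        rebalanceInProgress = t.pendingCH != null;
        events.add("update " + t.topologyId);
    }

    private void startRebalance(Topology t) {
        installedTopologyId = t.topologyId;
        rebalanceInProgress = true;
        events.add("start rebalance " + t.topologyId);
    }

    synchronized List<String> events() {
        return new ArrayList<>(events);
    }
}
```

With this shape, a CH_UPDATE with topology id 2 and a non-null pendingCH that arrives
before the REBALANCE_START with id 1 starts the rebalance itself, and the late
REBALANCE_START is then dropped.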