[infinispan-issues] [JBoss JIRA] (ISPN-2825) ClusterTopologyManagerImpl should not hold a lock while invoking an RPC

Thu Feb 14 14:26:57 EST 2013

Dan Berindei created ISPN-2825:
----------------------------------

             Summary: ClusterTopologyManagerImpl should not hold a lock while invoking an RPC
                 Key: ISPN-2825
                 URL: https://issues.jboss.org/browse/ISPN-2825
             Project: Infinispan
          Issue Type: Bug
          Components: State transfer
    Affects Versions: 5.2.1.Final
            Reporter: Dan Berindei
            Assignee: Dan Berindei
             Fix For: 5.3.0.Alpha1

On the coordinator, ClusterTopologyManagerImpl holds the lock on a cache's ClusterCacheStatus while it invokes a synchronous REBALANCE_START or CH_UPDATE command. Holding the lock helps ensure that the commands are ordered the same way on all the members.

However, this has some downsides. A joining node can take quite some time to reply to the coordinator, because it first needs to request transactions from the other nodes. The nodes that don't need to request any data send a REBALANCE_CONFIRM command to the coordinator right away, but that command blocks on the ClusterCacheStatus lock until the coordinator's RPC completes. If the number of OOB threads is limited, this can even lead to a deadlock.
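
To illustrate the alternative, here is a minimal sketch (not the actual Infinispan code) in which the coordinator updates its state under the lock and invokes the RPC only after releasing it. The class CoordinatorSketch, its methods, and the IntConsumer standing in for the synchronous RPC are all hypothetical names chosen for this example.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntConsumer;

/** Hypothetical stand-in for the coordinator-side status of one cache. */
final class CoordinatorSketch {
    private final ReentrantLock statusLock = new ReentrantLock();
    private int topologyId = 0;
    private int confirmations = 0;

    /** Updates the state under the lock, then invokes the RPC without holding it. */
    void startRebalance(IntConsumer rpc) {
        int id;
        statusLock.lock();
        try {
            id = ++topologyId; // only the state change is protected by the lock
        } finally {
            statusLock.unlock();
        }
        rpc.accept(id); // potentially slow synchronous call, made outside the lock
    }

    /** Handler for a confirmation; it needs the lock only briefly. */
    void confirmRebalance(int id) {
        statusLock.lock();
        try {
            if (id == topologyId) {
                confirmations++;
            }
        } finally {
            statusLock.unlock();
        }
    }

    int confirmations() {
        statusLock.lock();
        try {
            return confirmations;
        } finally {
            statusLock.unlock();
        }
    }

    public static void main(String[] args) {
        CoordinatorSketch coordinator = new CoordinatorSketch();
        coordinator.startRebalance(id -> {
            // Simulate a fast member confirming while the RPC is still in flight.
            Thread fastMember = new Thread(() -> coordinator.confirmRebalance(id));
            fastMember.start();
            try {
                fastMember.join(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("confirmations = " + coordinator.confirmations());
    }
}
```

In this sketch the confirmation is processed while the RPC callback is still running, because the lock is no longer held at that point. The trade-off is that the lock no longer orders the outgoing commands, so receivers have to cope with commands arriving out of order.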

Now that CH_UPDATE commands also increment the topology id, we don't really need to enforce the same ordering. If a CH_UPDATE command is sent after a REBALANCE_START command but arrives before it, LocalTopologyManagerImpl just needs to act as if the CH_UPDATE command were actually a REBALANCE_START. (It knows there should be a rebalance when a CH_UPDATE command has pendingCH != null.)
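
The receiving side could be sketched roughly as follows. This is a simplified illustration, not the real LocalTopologyManagerImpl: the TopologySketch class, its Command and LocalHandler types, and the string placeholders for consistent hashes are hypothetical. It also assumes that a command carrying a topology id no newer than the installed one can simply be ignored, which the description above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of topology commands and their local handling. */
final class TopologySketch {
    enum Type { REBALANCE_START, CH_UPDATE }

    static final class Command {
        final Type type;
        final int topologyId;
        final String currentCH;
        final String pendingCH; // null when no rebalance is in progress

        Command(Type type, int topologyId, String currentCH, String pendingCH) {
            this.type = type;
            this.topologyId = topologyId;
            this.currentCH = currentCH;
            this.pendingCH = pendingCH;
        }
    }

    static final class LocalHandler {
        private int installedTopologyId = -1;
        private final List<String> actions = new ArrayList<>();

        synchronized void handle(Command cmd) {
            // Assumption: commands with an old topology id are stale and dropped.
            if (cmd.topologyId <= installedTopologyId) {
                actions.add("ignore " + cmd.type + " " + cmd.topologyId);
                return;
            }
            installedTopologyId = cmd.topologyId;
            // A CH_UPDATE with a pending CH implies a rebalance is under way,
            // so it is handled as if it were a REBALANCE_START.
            if (cmd.type == Type.REBALANCE_START || cmd.pendingCH != null) {
                actions.add("rebalance " + cmd.topologyId);
            } else {
                actions.add("update " + cmd.topologyId);
            }
        }

        synchronized List<String> actions() {
            return new ArrayList<>(actions);
        }
    }

    public static void main(String[] args) {
        LocalHandler handler = new LocalHandler();
        // The CH_UPDATE (id 6) was sent second but arrives first.
        handler.handle(new Command(Type.CH_UPDATE, 6, "chA", "chB"));
        handler.handle(new Command(Type.REBALANCE_START, 5, "chA", "chB"));
        System.out.println(handler.actions());
    }
}
```

Here the early CH_UPDATE starts the rebalance locally, and the late REBALANCE_START with the lower topology id is dropped as stale.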
