[infinispan-issues] [JBoss JIRA] (ISPN-5420) Thread pools are depleted by ClusterTopologyManagerImpl.waitForView() and causing deadlock

Dan Berindei (JIRA) issues at jboss.org
Wed Apr 29 10:58:46 EDT 2015


Dan Berindei created ISPN-5420:
----------------------------------

             Summary: Thread pools are depleted by ClusterTopologyManagerImpl.waitForView() and causing deadlock
                 Key: ISPN-5420
                 URL: https://issues.jboss.org/browse/ISPN-5420
             Project: Infinispan
          Issue Type: Bug
          Components: Core
    Affects Versions: 7.1.1.Final, 6.0.2.Final
            Reporter: Dan Berindei
            Assignee: Dan Berindei
             Fix For: 7.2.0.Final


The join process was designed on the assumption that a node would start its caches sequentially, so {{ClusterTopologyManager.waitForView()}} would block at most once for each joining node. However, WildFly actually starts {{2 * Runtime.availableProcessors()}} caches in parallel, which becomes a problem when the cluster has multiple nodes and the machines have a lot of cores.

{{ClusterTopologyManager.handleClusterView()}} only updates the {{viewId}} after it has updated the cache topologies of each cache AND confirmed the availability of all the nodes with a {{POLICY_GET_STATUS}} RPC. This RPC can block, and it's very easy for the remote-executor thread pool on the coordinator to become overloaded with threads like this:

{noformat}
"remote-thread-172" daemon prio=10 tid=0x00007f0cc48c0000 nid=0x28ca4 in Object.wait() [0x00007f0c5f25b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.infinispan.topology.ClusterTopologyManagerImpl.waitForView(ClusterTopologyManagerImpl.java:357)
        - locked <0x00000000ff3bd900> (a java.lang.Object)
        at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
        at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:162)
        at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:144)
        at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:276)
{noformat}
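
The starvation pattern can be reproduced outside Infinispan with a plain fixed-size executor. The sketch below is only an illustration under stated assumptions, not the actual {{ClusterTopologyManagerImpl}} code: the class {{ViewWaitStarvation}}, its methods, and the pool sizes are all hypothetical. It models join requests that block in a {{waitForView()}}-style timed wait, while the task that would advance the {{viewId}} is queued on the same pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the problem; none of these names come from Infinispan.
class ViewWaitStarvation {
    private final Object viewLock = new Object();
    private int viewId = 0;

    // Blocks until viewId >= expected or the timeout expires.
    // Returns true if the view arrived in time, false on timeout.
    boolean waitForView(int expected, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (viewLock) {
            while (viewId < expected) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false;
                }
                viewLock.wait(remainingMs);
            }
            return true;
        }
    }

    // Installs the new view id and wakes up all waiters.
    void updateView(int newViewId) {
        synchronized (viewLock) {
            viewId = newViewId;
            viewLock.notifyAll();
        }
    }

    // Submits `joins` blocking join handlers, then the task that would
    // install the view, all on one pool of `poolSize` threads.
    // Returns how many joins timed out waiting for the view.
    static int run(int poolSize, int joins, long timeoutMs) throws Exception {
        ViewWaitStarvation manager = new ViewWaitStarvation();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            Callable<Boolean> join = () -> manager.waitForView(1, timeoutMs);
            for (int i = 0; i < joins; i++) {
                results.add(pool.submit(join));
            }
            // If every pool thread is already blocked in waitForView(),
            // this task sits in the queue until the waiters time out.
            Future<?> update = pool.submit(() -> manager.updateView(1));

            int timedOut = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    timedOut++;
                }
            }
            update.get();
            return timedOut;
        } finally {
            pool.shutdown();
        }
    }
}
```

With as many concurrent joins as pool threads, for example {{ViewWaitStarvation.run(2, 2, 300)}}, every join should time out, because the view update cannot run until a waiter gives up its thread. With one spare thread, {{ViewWaitStarvation.run(3, 2, 300)}}, the update should run immediately and the joins should complete.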



