[infinispan-issues] [JBoss JIRA] (ISPN-1872) Coordinator hangs when cache is loaded to it and l1cache enabled in cluster

Matt Davis (JIRA) jira-events at lists.jboss.org
Mon Feb 20 12:15:36 EST 2012


Matt Davis created ISPN-1872:
--------------------------------

             Summary: Coordinator hangs when cache is loaded to it and l1cache enabled in cluster
                 Key: ISPN-1872
                 URL: https://issues.jboss.org/browse/ISPN-1872
             Project: Infinispan
          Issue Type: Bug
          Components: Distributed Cache
    Affects Versions: 5.1.1.FINAL
            Reporter: Matt Davis
            Assignee: Manik Surtani
            Priority: Blocker


I scaled from 3 nodes to 4 nodes and ran into this issue with both 5.1.1 and trunk (a 5.2.0 snapshot from 2.18.12).

I altered the slider in the GUI demo to allow for 1,000,000 cache entries. If I generate the cache on the coordinator node, the following exception occurs:

2012-02-15 12:40:49,633 ERROR [InvocationContextInterceptor] (pool-1-thread-1) ISPN000136: Execution error
org.infinispan.util.concurrent.TimeoutException: Replication timeout for muskrat-626
    at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:99)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:461)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:148)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:169)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:219)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:206)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:201)
    at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
    at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:494)
    at org.infinispan.interceptors.DistributionInterceptor.visitPutMapCommand(DistributionInterceptor.java:285)
    at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
    at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:199)
    at org.infinispan.interceptors.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:160)
    at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
    at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitPutMapCommand(NonTransactionalLockingInterceptor.java:84)
    at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
    at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)
    at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:77)
    at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
    at org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:207)
    at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:180)
    at org.infinispan.interceptors.StateTransferLockInterceptor.visitPutMapCommand(StateTransferLockInterceptor.java:171)
    at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
    at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:110)
    at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
    at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
    at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:130)
    at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:89)
    at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:77)
    at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
    at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)
    at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:941)
    at org.infinispan.CacheImpl.putAll(CacheImpl.java:678)
    at org.infinispan.CacheImpl.putAll(CacheImpl.java:671)
    at org.infinispan.CacheSupport.putAll(CacheSupport.java:66)
    at org.infinispan.demo.InfinispanDemo$7$1.run(InfinispanDemo.java:251)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)

At that point the GUI on the coordinator becomes unresponsive. I attached jconsole to the 4 nodes and forced a System.gc(). After GC the coordinator node sits at about 62 MB of heap, while the other 3 nodes sit around 280 MB, so cache distribution has not succeeded on the coordinator. If I kill one of the other nodes, the coordinator instantly becomes responsive. In the final state the coordinator ends up holding about 1/5 of the data, while the other 2 nodes each hold about 2/5.
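
For anyone reproducing this, the heap check above can also be scripted instead of done through jconsole. A small sketch follows, assuming remote JMX is enabled on the node; the host and port 9999 are assumptions, not part of the demo setup.

import java.lang.management.MemoryUsage;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the node was started with -Dcom.sun.management.jmxremote.port=9999.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName memory = new ObjectName("java.lang:type=Memory");

            // Same effect as forcing a GC from jconsole.
            conn.invoke(memory, "gc", null, null);

            MemoryUsage heap = MemoryUsage.from((CompositeData) conn.getAttribute(memory, "HeapMemoryUsage"));
            System.out.println("Heap used after GC: " + (heap.getUsed() / (1024 * 1024)) + " MB");
        } finally {
            connector.close();
        }
    }
}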

The problem only occurs when l1cache is enabled, or when I generate the data on the coordinator node. It also only becomes a problem when I scale from 3 to 4 nodes.

Here is the original cache configuration for all 3 nodes:

<infinispan
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
      xmlns="urn:infinispan:config:5.2">
   
   <global>
      <transport clusterName="demoCluster"/>
      <globalJmxStatistics enabled="true"/>
   </global>

   <default>
      <jmxStatistics enabled="true"/>
      <clustering mode="distribution">
         <l1 enabled="true" lifespan="60000"/>
         <hash numOwners="2" rehashRpcTimeout="120000"/>
         <sync/>
      </clustering>
   </default>
</infinispan>
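
In case it helps when writing a unit test, here is a programmatic sketch of that XML using the fluent ConfigurationBuilder API; the builder calls below are my reading of the XML, not copied from the running setup, and rehashRpcTimeout is left out because its builder equivalent moved between versions.

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class DemoClusterConfig {
    public static DefaultCacheManager build() {
        GlobalConfiguration global = new GlobalConfigurationBuilder()
            .transport().defaultTransport().clusterName("demoCluster")
            .globalJmxStatistics().enable()
            .build();

        Configuration defaults = new ConfigurationBuilder()
            .jmxStatistics().enable()
            .clustering().cacheMode(CacheMode.DIST_SYNC)  // <clustering mode="distribution"> with <sync/>
                .l1().enable().lifespan(60000)            // <l1 enabled="true" lifespan="60000"/>
                .hash().numOwners(2)                      // <hash numOwners="2" .../> (rehashRpcTimeout stays in the XML)
            .build();

        return new DefaultCacheManager(global, defaults);
    }
}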

