[infinispan-issues] [JBoss JIRA] (ISPN-2419) Infinispan node unable to join 4-node cluster under load - java.lang.IllegalStateException: channel is not connected

Martin Gencur (JIRA) jira-events at lists.jboss.org
Thu Oct 18 10:19:01 EDT 2012


Martin Gencur created ISPN-2419:
-----------------------------------

             Summary: Infinispan node unable to join 4-node cluster under load - java.lang.IllegalStateException: channel is not connected 
                 Key: ISPN-2419
                 URL: https://issues.jboss.org/browse/ISPN-2419
             Project: Infinispan
          Issue Type: Bug
    Affects Versions: 5.2.0.Beta2
            Reporter: Martin Gencur
            Assignee: Mircea Markus
            Priority: Critical


When a 5th node joins an existing cluster of 4 nodes (under load), the following exception is thrown and Infinispan gets stuck:

{code}
09:08:47,959 DEBUG [org.jgroups.protocols.pbcast.GMS] (pool-1-thread-1) exception=java.lang.IllegalStateException: channel is not connected, retrying
java.lang.IllegalStateException: channel is not connected
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:621)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:535)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:604)
	at org.jgroups.JChannel.up(JChannel.java:715)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
	at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:754)
	at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:607)
	at org.jgroups.protocols.pbcast.NAKACK.flushBecomeServerQueue(NAKACK.java:898)
	at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:527)
	at org.jgroups.protocols.UNICAST2.down(UNICAST2.java:523)
	at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:307)
	at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:637)
	at org.jgroups.protocols.pbcast.ClientGmsImpl.installView(ClientGmsImpl.java:248)
	at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:182)
	at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:37)
	at org.jgroups.protocols.pbcast.GMS.down(GMS.java:938)
	at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
	at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
	at org.jgroups.protocols.FRAG2.down(FRAG2.java:147)
	at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
	at org.jgroups.JChannel.down(JChannel.java:729)
	at org.jgroups.JChannel.connect(JChannel.java:291)
	at org.jgroups.JChannel.connect(JChannel.java:262)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:206)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:197)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
	at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
	at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
	at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:218)
	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:680)
	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:548)
	at org.radargun.cachewrappers.InfinispanWrapper.setUpCache(InfinispanWrapper.java:125)
	at org.radargun.cachewrappers.InfinispanWrapper.setUp(InfinispanWrapper.java:74)
	at org.radargun.stages.helpers.StartHelper.start(StartHelper.java:63)
	at org.radargun.stages.StartClusterStage.executeOnSlave(StartClusterStage.java:47)
	at org.radargun.Slave$2.run(Slave.java:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
09:08:48,116 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) ISPN000079: Cache local address is edg-perf05-52618, physical addresses are [172.18.1.9:52000]
09:08:48,117 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) Waiting on view being accepted
09:08:59,427 WARN  [org.jgroups.protocols.pbcast.GMS] (Incoming-4,edg-perf05-52618) edg-perf05-52618: not member of view [edg-perf01-59608|5]; discarding it
09:10:47,645 DEBUG [org.jgroups.protocols.UNICAST2] (Timer-2,edg-perf05-52618) edg-perf05-52618: removed expired connection for edg-perf01-59608 (119835 ms old) from recv_table
{code}
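The timestamps in the log above can be lined up with a little arithmetic. The following is a minimal sketch, not part of the test itself; the class and method names are hypothetical, and it assumes that the "ms old" figure in the UNICAST2 message is measured from the last activity on that connection, which the log does not confirm.

```java
import java.time.Duration;
import java.time.LocalTime;

public class JoinTimeline {
    // Parses a log timestamp such as "09:08:47,959" (comma before the millis).
    public static LocalTime parse(String logTime) {
        return LocalTime.parse(logTime.replace(',', '.'));
    }

    // Assumption: a connection reported as ageMillis old at reportedAt
    // was last active at reportedAt - ageMillis.
    public static LocalTime lastActivity(String reportedAt, long ageMillis) {
        return parse(reportedAt).minus(Duration.ofMillis(ageMillis));
    }

    // Milliseconds elapsed between two log timestamps on the same day.
    public static long millisBetween(String from, String to) {
        return Duration.between(parse(from), parse(to)).toMillis();
    }

    public static void main(String[] args) {
        // Values copied from the log lines above.
        LocalTime last = lastActivity("09:10:47,645", 119835);
        System.out.println("last activity with edg-perf01: " + last);
        System.out.println("ms from last activity to exception: "
                + Duration.between(last, parse("09:08:47,959")).toMillis());
        System.out.println("ms from exception to discarded view: "
                + millisBetween("09:08:47,959", "09:08:59,427"));
    }
}
```

Under that assumption, the last activity on the connection to edg-perf01-59608 falls at 09:08:47.810, about 149 ms before the exception is logged, and the view from edg-perf01 is discarded 11468 ms after the exception.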

I'm not sure whether this is a JGroups problem or an Infinispan one.

The test scenario looks like this:
* start 4 nodes in a cluster and put them under load
* wait for some time, then try to join a 5th node
* wait and join a 6th node, wait and join a 7th node, and then an 8th one

The test gets stuck when joining the 5th node to the cluster.

The corresponding Jenkins job, with all logs and information: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/JDG-RADARGUN/job/ispn-52-radargun-elasticity-dist-04-08/12/

The error above can be seen on the edg-perf05 node.

Infinispan's configuration:
{code:xml}
<infinispan
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
      xmlns="urn:infinispan:config:5.2">

   <global>
      <globalJmxStatistics
            enabled="true"
            jmxDomain="jboss.infinispan" 
            cacheManagerName="default"/>
      <transport clusterName="default" distributedSyncTimeout="600000">
         <properties>
            <property name="configurationFile" value="jgroups-udp-custom.xml" />
         </properties>
      </transport>
   </global>

   <default>
      <transaction transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup" transactionMode="TRANSACTIONAL"/>
      <jmxStatistics enabled="true"/>

      <clustering mode="distribution">
         <l1 enabled="false" />
         <hash numOwners="2" numSegments="512" />
         <stateTransfer timeout="180000" />
         <sync replTimeout="60000"/>
      </clustering>
      <locking lockAcquisitionTimeout="3000" concurrencyLevel="1000" />
   </default>
   
   <namedCache name="testCache" />
   <namedCache name="memcachedCache" />

</infinispan>
{code}

JGroups config:
{code:xml}
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
   <UDP
         bind_addr="${jgroups.udp.bind_addr:localhost}"
         bind_port="52000"
         port_range="200"
         mcast_addr="234.99.54.14"
         mcast_port="45688"
         tos="8"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="25000000"
         mcast_send_buf_size="640000"
         loopback="true"
         discard_incompatible_packets="true"
         max_bundle_size="64000"
         max_bundle_timeout="30"
         ip_ttl="${jgroups.udp.ip_ttl:2}"
         enable_bundling="true"
         enable_diagnostics="false"

         thread_naming_pattern="pl"

         thread_pool.enabled="true"
         thread_pool.min_threads="100"
         thread_pool.max_threads="200"
         thread_pool.keep_alive_time="60000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="Discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="100"
         oob_thread_pool.max_threads="200"
         oob_thread_pool.keep_alive_time="60000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="Discard"
         />

   <PING timeout="3000" num_initial_members="3"/>
   <MERGE2 max_interval="30000" min_interval="10000"/>
   <FD_SOCK/>
   <FD_ALL/>
   <BARRIER />
   <pbcast.NAKACK  exponential_backoff="0"
                   use_mcast_xmit="true"
                   retransmit_timeout="300,600,1200"
                   discard_delivered_msgs="true"/>
   <UNICAST2 timeout="300,600,1200"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="1000000"/>
   <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
   <UFC max_credits="500000" min_threshold="0.20"/>
   <MFC max_credits="500000" min_threshold="0.20"/>
   <FRAG2 frag_size="60000"  />
</config>
{code}

If any additional information is needed, I'll provide it. I can also run the Jenkins job with a different configuration on request.

