[infinispan-issues] [JBoss JIRA] (ISPN-2419) Infinispan node unable to join 4-node cluster under load - java.lang.IllegalStateException: channel is not connected
Martin Gencur (JIRA)
jira-events at lists.jboss.org
Thu Oct 18 10:45:01 EDT 2012
[ https://issues.jboss.org/browse/ISPN-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727736#comment-12727736 ]
Martin Gencur commented on ISPN-2419:
-------------------------------------
I'll recheck with JGroups 3.2.0.CR2, which should fix the "channel is not connected" exception. I originally tried with JGroups 3.2.0.CR1, and according to Bela the problem should be fixed in CR2.
> Infinispan node unable to join 4-node cluster under load - java.lang.IllegalStateException: channel is not connected
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-2419
> URL: https://issues.jboss.org/browse/ISPN-2419
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 5.2.0.Beta2
> Reporter: Martin Gencur
> Assignee: Mircea Markus
> Priority: Critical
>
> When a 5th node joins an existing cluster of 4 nodes (under load), the following exception is thrown and Infinispan gets stuck:
> {code}
> 09:08:47,959 DEBUG [org.jgroups.protocols.pbcast.GMS] (pool-1-thread-1) exception=java.lang.IllegalStateException: channel is not connected, retrying
> java.lang.IllegalStateException: channel is not connected
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:621)
> at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:535)
> at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
> at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:604)
> at org.jgroups.JChannel.up(JChannel.java:715)
> at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
> at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
> at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
> at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
> at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
> at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
> at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:754)
> at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:607)
> at org.jgroups.protocols.pbcast.NAKACK.flushBecomeServerQueue(NAKACK.java:898)
> at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:527)
> at org.jgroups.protocols.UNICAST2.down(UNICAST2.java:523)
> at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:307)
> at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:637)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.installView(ClientGmsImpl.java:248)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:182)
> at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:37)
> at org.jgroups.protocols.pbcast.GMS.down(GMS.java:938)
> at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
> at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
> at org.jgroups.protocols.FRAG2.down(FRAG2.java:147)
> at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
> at org.jgroups.JChannel.down(JChannel.java:729)
> at org.jgroups.JChannel.connect(JChannel.java:291)
> at org.jgroups.JChannel.connect(JChannel.java:262)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:206)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:197)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
> at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
> at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
> at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
> at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
> at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:218)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:680)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
> at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:548)
> at org.radargun.cachewrappers.InfinispanWrapper.setUpCache(InfinispanWrapper.java:125)
> at org.radargun.cachewrappers.InfinispanWrapper.setUp(InfinispanWrapper.java:74)
> at org.radargun.stages.helpers.StartHelper.start(StartHelper.java:63)
> at org.radargun.stages.StartClusterStage.executeOnSlave(StartClusterStage.java:47)
> at org.radargun.Slave$2.run(Slave.java:103)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 09:08:48,116 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) ISPN000079: Cache local address is edg-perf05-52618, physical addresses are [172.18.1.9:52000]
> 09:08:48,117 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) Waiting on view being accepted
> 09:08:59,427 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-4,edg-perf05-52618) edg-perf05-52618: not member of view [edg-perf01-59608|5]; discarding it
> 09:10:47,645 DEBUG [org.jgroups.protocols.UNICAST2] (Timer-2,edg-perf05-52618) edg-perf05-52618: removed expired connection for edg-perf01-59608 (119835 ms old) from recv_table
> {code}
> I'm not sure whether this is a JGroups problem or an Infinispan problem.
> The test scenario looks like this (a simplified sketch of what each node does follows the list):
> * start 4 nodes in a cluster and put them under load
> * wait for some time, then join a 5th node
> * wait and join a 6th node, then a 7th, and finally an 8th
> The test gets stuck when joining the 5th node to the cluster.
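> For illustration, each node does roughly the equivalent of the sketch below. This is a simplified stand-in for org.radargun.cachewrappers.InfinispanWrapper.setUpCache (see the stack trace above); the configuration file name and the load loop are placeholders, not the actual RadarGun code:
> {code:java}
> import org.infinispan.Cache;
> import org.infinispan.manager.DefaultCacheManager;
>
> public class JoinUnderLoadSketch {
>     public static void main(String[] args) throws Exception {
>         // Starts the transport and joins the cluster; the file name is a placeholder
>         // for the Infinispan XML configuration shown below.
>         DefaultCacheManager manager = new DefaultCacheManager("infinispan-dist.xml");
>
>         // On the first 4 nodes this returns and the load loop runs; on the 5th joiner
>         // the underlying JChannel.connect() hits "channel is not connected" and the
>         // node then hangs "Waiting on view being accepted".
>         Cache<String, String> cache = manager.getCache("testCache");
>
>         // Illustrative load only: keep the existing members busy with writes.
>         for (int i = 0; ; i++) {
>             cache.put("key-" + (i % 1000), "value-" + i);
>         }
>     }
> }
> {code}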
> The respective Jenkins job with all logs/information: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/JDG-RADARGUN/job/ispn-52-radargun-elasticity-dist-04-08/12/
> The error above can be seen on the edg-perf05 node.
> Infinispan's configuration:
> {code:xml}
> <infinispan
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
>         xmlns="urn:infinispan:config:5.2">
>     <global>
>         <globalJmxStatistics
>                 enabled="true"
>                 jmxDomain="jboss.infinispan"
>                 cacheManagerName="default"/>
>         <transport clusterName="default" distributedSyncTimeout="600000">
>             <properties>
>                 <property name="configurationFile" value="jgroups-udp-custom.xml" />
>             </properties>
>         </transport>
>     </global>
>     <default>
>         <transaction transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup" transactionMode="TRANSACTIONAL"/>
>         <jmxStatistics enabled="true"/>
>         <clustering mode="distribution">
>             <l1 enabled="false" />
>             <hash numOwners="2" numSegments="512" />
>             <stateTransfer timeout="180000" />
>             <sync replTimeout="60000"/>
>         </clustering>
>         <locking lockAcquisitionTimeout="3000" concurrencyLevel="1000" />
>     </default>
>
>     <namedCache name="testCache" />
>     <namedCache name="memcachedCache" />
> </infinispan>
> {code}
> JGroups config:
> {code:xml}
> <config xmlns="urn:org:jgroups"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
>     <UDP
>             bind_addr="${jgroups.udp.bind_addr:localhost}"
>             bind_port="52000"
>             port_range="200"
>             mcast_addr="234.99.54.14"
>             mcast_port="45688"
>             tos="8"
>             ucast_recv_buf_size="20000000"
>             ucast_send_buf_size="640000"
>             mcast_recv_buf_size="25000000"
>             mcast_send_buf_size="640000"
>             loopback="true"
>             discard_incompatible_packets="true"
>             max_bundle_size="64000"
>             max_bundle_timeout="30"
>             ip_ttl="${jgroups.udp.ip_ttl:2}"
>             enable_bundling="true"
>             enable_diagnostics="false"
>             thread_naming_pattern="pl"
>             thread_pool.enabled="true"
>             thread_pool.min_threads="100"
>             thread_pool.max_threads="200"
>             thread_pool.keep_alive_time="60000"
>             thread_pool.queue_enabled="false"
>             thread_pool.queue_max_size="100"
>             thread_pool.rejection_policy="Discard"
>             oob_thread_pool.enabled="true"
>             oob_thread_pool.min_threads="100"
>             oob_thread_pool.max_threads="200"
>             oob_thread_pool.keep_alive_time="60000"
>             oob_thread_pool.queue_enabled="false"
>             oob_thread_pool.queue_max_size="100"
>             oob_thread_pool.rejection_policy="Discard"
>             />
>     <PING timeout="3000" num_initial_members="3"/>
>     <MERGE2 max_interval="30000" min_interval="10000"/>
>     <FD_SOCK/>
>     <FD_ALL/>
>     <BARRIER />
>     <pbcast.NAKACK exponential_backoff="0"
>                    use_mcast_xmit="true"
>                    retransmit_timeout="300,600,1200"
>                    discard_delivered_msgs="true"/>
>     <UNICAST2 timeout="300,600,1200"/>
>     <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="1000000"/>
>     <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
>     <UFC max_credits="500000" min_threshold="0.20"/>
>     <MFC max_credits="500000" min_threshold="0.20"/>
>     <FRAG2 frag_size="60000" />
> </config>
> {code}
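> For completeness, the join that fails is the plain JGroups channel connect that JGroupsTransport performs on startup (see the JChannel.connect frames in the stack trace). Stripped of Infinispan, the failing path is roughly the sketch below, assuming the cluster name "default" from the transport element above; it may help narrow down whether this is a JGroups or an Infinispan problem:
> {code:java}
> import org.jgroups.JChannel;
>
> public class ChannelJoinSketch {
>     public static void main(String[] args) throws Exception {
>         // Same path as in the trace: JChannel.connect() -> GMS -> ClientGmsImpl.join()
>         // -> installView(), where GMS logs "channel is not connected, retrying".
>         JChannel channel = new JChannel("jgroups-udp-custom.xml");
>         channel.connect("default");
>     }
> }
> {code}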
> If any additional information is needed, I'll provide it. I can also re-run the Jenkins job with a different configuration on request.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira