[infinispan-issues] [JBoss JIRA] (ISPN-2419) Infinispan node unable to join 4-node cluster under load - java.lang.IllegalStateException: channel is not connected
Martin Gencur (JIRA)
jira-events at lists.jboss.org
Thu Oct 18 10:19:01 EDT 2012
Martin Gencur created ISPN-2419:
-----------------------------------
Summary: Infinispan node unable to join 4-node cluster under load - java.lang.IllegalStateException: channel is not connected
Key: ISPN-2419
URL: https://issues.jboss.org/browse/ISPN-2419
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.2.0.Beta2
Reporter: Martin Gencur
Assignee: Mircea Markus
Priority: Critical
When a 5th node joins an existing cluster of 4 nodes (under load), the following exception is thrown and Infinispan gets stuck:
{code}
09:08:47,959 DEBUG [org.jgroups.protocols.pbcast.GMS] (pool-1-thread-1) exception=java.lang.IllegalStateException: channel is not connected, retrying
java.lang.IllegalStateException: channel is not connected
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:621)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:535)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:604)
at org.jgroups.JChannel.up(JChannel.java:715)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:754)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:607)
at org.jgroups.protocols.pbcast.NAKACK.flushBecomeServerQueue(NAKACK.java:898)
at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:527)
at org.jgroups.protocols.UNICAST2.down(UNICAST2.java:523)
at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:307)
at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:637)
at org.jgroups.protocols.pbcast.ClientGmsImpl.installView(ClientGmsImpl.java:248)
at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:182)
at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:37)
at org.jgroups.protocols.pbcast.GMS.down(GMS.java:938)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:147)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
at org.jgroups.JChannel.down(JChannel.java:729)
at org.jgroups.JChannel.connect(JChannel.java:291)
at org.jgroups.JChannel.connect(JChannel.java:262)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:206)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:218)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:680)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:548)
at org.radargun.cachewrappers.InfinispanWrapper.setUpCache(InfinispanWrapper.java:125)
at org.radargun.cachewrappers.InfinispanWrapper.setUp(InfinispanWrapper.java:74)
at org.radargun.stages.helpers.StartHelper.start(StartHelper.java:63)
at org.radargun.stages.StartClusterStage.executeOnSlave(StartClusterStage.java:47)
at org.radargun.Slave$2.run(Slave.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
09:08:48,116 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) ISPN000079: Cache local address is edg-perf05-52618, physical addresses are [172.18.1.9:52000]
09:08:48,117 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) Waiting on view being accepted
09:08:59,427 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-4,edg-perf05-52618) edg-perf05-52618: not member of view [edg-perf01-59608|5]; discarding it
09:10:47,645 DEBUG [org.jgroups.protocols.UNICAST2] (Timer-2,edg-perf05-52618) edg-perf05-52618: removed expired connection for edg-perf01-59608 (119835 ms old) from recv_table
{code}
I'm not sure whether this is a JGroups problem or an Infinispan problem.
The test scenario looks like this:
* start 4 nodes in a cluster and put them under load
* wait for some time, then try to join a 5th node
* wait and join a 6th node, then wait and join a 7th node, and then an 8th one
The test gets stuck while joining the 5th node to the cluster.
Respective Jenkins job with all logs/information: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/JDG-RADARGUN/job/ispn-52-radargun-elasticity-dist-04-08/12/
The error above can be seen on the edg-perf05 node.
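To check whether other nodes in the job show the same symptom, the logs can be scanned for the GMS "not member of view" warning. The following is only a minimal sketch: the class name `DiscardedViewScanner` and its method are hypothetical helpers, and the pattern is derived solely from the single warning line quoted above, so it may need adjusting for other log layouts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Infinispan, JGroups or RadarGun.
public class DiscardedViewScanner {

    // Matches e.g. "edg-perf05-52618: not member of view [edg-perf01-59608|5]; discarding it"
    // group 1 = local node, group 2 = view creator, group 3 = view id
    private static final Pattern DISCARDED_VIEW = Pattern.compile(
            "(\\S+): not member of view \\[([^|\\]]+)\\|(\\d+)\\]; discarding it");

    /** Returns one summary string per log line that reports a discarded view. */
    public static List<String> findDiscardedViews(List<String> logLines) {
        List<String> hits = new ArrayList<String>();
        for (String line : logLines) {
            Matcher m = DISCARDED_VIEW.matcher(line);
            if (m.find()) {
                hits.add(m.group(1) + " discarded view " + m.group(3)
                        + " created by " + m.group(2));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "09:08:48,117 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] "
                        + "(pool-1-thread-1) Waiting on view being accepted",
                "09:08:59,427 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-4,edg-perf05-52618) "
                        + "edg-perf05-52618: not member of view [edg-perf01-59608|5]; discarding it");
        for (String hit : findDiscardedViews(lines)) {
            System.out.println(hit);
        }
    }
}
```

A node that logs this warning received a view it is not listed in, which matches the joiner never completing its join in the log above.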
Infinispan's configuration:
{code:xml}
<infinispan
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
      xmlns="urn:infinispan:config:5.2">
   <global>
      <globalJmxStatistics
            enabled="true"
            jmxDomain="jboss.infinispan"
            cacheManagerName="default"/>
      <transport clusterName="default" distributedSyncTimeout="600000">
         <properties>
            <property name="configurationFile" value="jgroups-udp-custom.xml" />
         </properties>
      </transport>
   </global>
   <default>
      <transaction transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup" transactionMode="TRANSACTIONAL"/>
      <jmxStatistics enabled="true"/>
      <clustering mode="distribution">
         <l1 enabled="false" />
         <hash numOwners="2" numSegments="512" />
         <stateTransfer timeout="180000" />
         <sync replTimeout="60000"/>
      </clustering>
      <locking lockAcquisitionTimeout="3000" concurrencyLevel="1000" />
   </default>
   <namedCache name="testCache" />
   <namedCache name="memcachedCache" />
</infinispan>
{code}
JGroups config:
{code:xml}
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
   <UDP
         bind_addr="${jgroups.udp.bind_addr:localhost}"
         bind_port="52000"
         port_range="200"
         mcast_addr="234.99.54.14"
         mcast_port="45688"
         tos="8"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="25000000"
         mcast_send_buf_size="640000"
         loopback="true"
         discard_incompatible_packets="true"
         max_bundle_size="64000"
         max_bundle_timeout="30"
         ip_ttl="${jgroups.udp.ip_ttl:2}"
         enable_bundling="true"
         enable_diagnostics="false"
         thread_naming_pattern="pl"
         thread_pool.enabled="true"
         thread_pool.min_threads="100"
         thread_pool.max_threads="200"
         thread_pool.keep_alive_time="60000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="Discard"
         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="100"
         oob_thread_pool.max_threads="200"
         oob_thread_pool.keep_alive_time="60000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="Discard"
         />
   <PING timeout="3000" num_initial_members="3"/>
   <MERGE2 max_interval="30000" min_interval="10000"/>
   <FD_SOCK/>
   <FD_ALL/>
   <BARRIER />
   <pbcast.NAKACK exponential_backoff="0"
         use_mcast_xmit="true"
         retransmit_timeout="300,600,1200"
         discard_delivered_msgs="true"/>
   <UNICAST2 timeout="300,600,1200"/>
   <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="1000000"/>
   <pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
   <UFC max_credits="500000" min_threshold="0.20"/>
   <MFC max_credits="500000" min_threshold="0.20"/>
   <FRAG2 frag_size="60000" />
</config>
{code}
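For readers less familiar with the thread pool attributes above, here is a rough, scaled-down illustration of what `queue_enabled="false"` combined with `rejection_policy="Discard"` means under a burst of work. It uses plain `java.util.concurrent` and assumes that JGroups maps these settings to a `SynchronousQueue` and `ThreadPoolExecutor.DiscardPolicy`; that mapping is my assumption, the class `DiscardPoolDemo` is hypothetical, and this is not meant as a diagnosis of the failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; pool sizes are scaled down from the 100/200 used in the config.
public class DiscardPoolDemo {

    /** Submits `tasks` blocking tasks and returns how many of them actually ran. */
    public static int runBurst(int minThreads, int maxThreads, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                minThreads, maxThreads, 60000, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),         // assumed analogue of queue_enabled="false"
                new ThreadPoolExecutor.DiscardPolicy());  // assumed analogue of rejection_policy="Discard"

        final CountDownLatch release = new CountDownLatch(1);
        final AtomicInteger executed = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    executed.incrementAndGet();
                    try {
                        release.await(); // keep the thread busy until the whole burst is submitted
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executed.get();
    }

    public static void main(String[] args) {
        int tasks = 10;
        int ran = runBurst(2, 4, tasks);
        System.out.println(ran + " tasks ran, " + (tasks - ran) + " were silently discarded");
    }
}
```

With no queue, every task beyond `maxThreads` concurrent ones is dropped without an exception or log entry. In the real stack, dropped messages would normally be recovered by retransmission (NAKACK/UNICAST2), so this only shows the pool semantics, not a proven cause.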
If any additional information is needed, I'll provide it. I can also run the Jenkins job with a different configuration on request.