Martin Gencur commented on ISPN-2419:
-------------------------------------
I tried with JGroups 3.2.0.CR1; according to Bela, the "channel is not connected" exception
should be fixed in CR2, so I'll recheck with JGroups 3.2.0.CR2.
Infinispan node unable to join 4-node cluster under load -
java.lang.IllegalStateException: channel is not connected
---------------------------------------------------------------------------------------------------------------------
Key: ISPN-2419
URL: https://issues.jboss.org/browse/ISPN-2419
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.2.0.Beta2
Reporter: Martin Gencur
Assignee: Mircea Markus
Priority: Critical
When a 5th node joins an existing cluster of 4 nodes (under load), the following exception
is thrown and Infinispan gets stuck:
{code}
09:08:47,959 DEBUG [org.jgroups.protocols.pbcast.GMS] (pool-1-thread-1) exception=java.lang.IllegalStateException: channel is not connected, retrying
java.lang.IllegalStateException: channel is not connected
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:621)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:535)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:604)
at org.jgroups.JChannel.up(JChannel.java:715)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:754)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:607)
at org.jgroups.protocols.pbcast.NAKACK.flushBecomeServerQueue(NAKACK.java:898)
at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:527)
at org.jgroups.protocols.UNICAST2.down(UNICAST2.java:523)
at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:307)
at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:637)
at org.jgroups.protocols.pbcast.ClientGmsImpl.installView(ClientGmsImpl.java:248)
at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:182)
at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:37)
at org.jgroups.protocols.pbcast.GMS.down(GMS.java:938)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
at org.jgroups.protocols.FlowControl.down(FlowControl.java:351)
at org.jgroups.protocols.FRAG2.down(FRAG2.java:147)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
at org.jgroups.JChannel.down(JChannel.java:729)
at org.jgroups.JChannel.connect(JChannel.java:291)
at org.jgroups.JChannel.connect(JChannel.java:262)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:206)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:879)
at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
at org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:218)
at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:680)
at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:548)
at org.radargun.cachewrappers.InfinispanWrapper.setUpCache(InfinispanWrapper.java:125)
at org.radargun.cachewrappers.InfinispanWrapper.setUp(InfinispanWrapper.java:74)
at org.radargun.stages.helpers.StartHelper.start(StartHelper.java:63)
at org.radargun.stages.StartClusterStage.executeOnSlave(StartClusterStage.java:47)
at org.radargun.Slave$2.run(Slave.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
09:08:48,116 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) ISPN000079: Cache local address is edg-perf05-52618, physical addresses are [172.18.1.9:52000]
09:08:48,117 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) Waiting on view being accepted
09:08:59,427 WARN [org.jgroups.protocols.pbcast.GMS] (Incoming-4,edg-perf05-52618) edg-perf05-52618: not member of view [edg-perf01-59608|5]; discarding it
09:10:47,645 DEBUG [org.jgroups.protocols.UNICAST2] (Timer-2,edg-perf05-52618) edg-perf05-52618: removed expired connection for edg-perf01-59608 (119835 ms old) from recv_table
{code}
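Reading the trace bottom-up, the exception appears to be raised on the joining thread itself: JChannel.connect is still on the stack when GMS.installView leads to NAKACK.flushBecomeServerQueue, which passes a queued message up to RequestCorrelator.handleRequest, and the reply is then sent down through MessageDispatcher$ProtocolAdapter.down before the channel counts as connected. The sketch below is only a toy model of that ordering, not JGroups code; ToyChannel and all of its methods are hypothetical names introduced for illustration, and the real cause may differ.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model only: these classes are hypothetical and are not JGroups code. */
public class ConnectReentrancySketch {

    /** Stand-in for a channel whose "connected" flag flips only after the join completes. */
    static class ToyChannel {
        private boolean connected = false;
        private final Queue<String> queuedBeforeView = new ArrayDeque<>();
        String lastError = null;

        /** A request that arrived while the join was still in progress. */
        void queueIncoming(String request) {
            queuedBeforeView.add(request);
        }

        /** Sending is only allowed once the channel is connected. */
        void send(String msg) {
            if (!connected) {
                throw new IllegalStateException("channel is not connected");
            }
        }

        /** Delivering a request triggers a reply on the same thread. */
        private void deliver(String request) {
            send("reply to " + request);
        }

        void connect() {
            // Installing the first view flushes the queued messages up the stack
            // before connect() has marked the channel as connected.
            while (!queuedBeforeView.isEmpty()) {
                String request = queuedBeforeView.poll();
                try {
                    deliver(request);
                } catch (IllegalStateException e) {
                    lastError = e.getMessage();
                }
            }
            connected = true;
        }

        boolean isConnected() {
            return connected;
        }
    }

    public static void main(String[] args) {
        ToyChannel ch = new ToyChannel();
        ch.queueIncoming("rpc-from-coordinator");
        ch.connect();
        System.out.println("error during connect: " + ch.lastError);
        System.out.println("connected afterwards: " + ch.isConnected());
    }
}
```

The toy channel still ends up connected, so it does not model why the real node gets stuck afterwards; it only shows how a reply sent from inside connect() can hit the "not connected" check.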
I'm not sure whether this is a JGroups problem or an Infinispan one.
The test scenario looks like this:
* start 4 nodes in a cluster and put them under load
* wait for some time, then try to join a 5th node
* wait and join a 6th node, wait and join a 7th node, and then an 8th one
The test gets stuck when the 5th node tries to join the cluster.
Respective Jenkins job with all logs/information:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/JDG-RADAR...
The error above can be seen on the edg-perf05 node.
Infinispan's configuration:
{code:xml}
<infinispan
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:5.2
http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
xmlns="urn:infinispan:config:5.2">
<global>
<globalJmxStatistics
enabled="true"
jmxDomain="jboss.infinispan"
cacheManagerName="default"/>
<transport clusterName="default"
distributedSyncTimeout="600000">
<properties>
<property name="configurationFile"
value="jgroups-udp-custom.xml" />
</properties>
</transport>
</global>
<default>
<transaction
transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
transactionMode="TRANSACTIONAL"/>
<jmxStatistics enabled="true"/>
<clustering mode="distribution">
<l1 enabled="false" />
<hash numOwners="2" numSegments="512" />
<stateTransfer timeout="180000" />
<sync replTimeout="60000"/>
</clustering>
<locking lockAcquisitionTimeout="3000"
concurrencyLevel="1000" />
</default>
<namedCache name="testCache" />
<namedCache name="memcachedCache" />
</infinispan>
{code}
JGroups config:
{code:xml}
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups file:schema/JGroups-2.8.xsd">
<UDP
bind_addr="${jgroups.udp.bind_addr:localhost}"
bind_port="52000"
port_range="200"
mcast_addr="234.99.54.14"
mcast_port="45688"
tos="8"
ucast_recv_buf_size="20000000"
ucast_send_buf_size="640000"
mcast_recv_buf_size="25000000"
mcast_send_buf_size="640000"
loopback="true"
discard_incompatible_packets="true"
max_bundle_size="64000"
max_bundle_timeout="30"
ip_ttl="${jgroups.udp.ip_ttl:2}"
enable_bundling="true"
enable_diagnostics="false"
thread_naming_pattern="pl"
thread_pool.enabled="true"
thread_pool.min_threads="100"
thread_pool.max_threads="200"
thread_pool.keep_alive_time="60000"
thread_pool.queue_enabled="false"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="Discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="100"
oob_thread_pool.max_threads="200"
oob_thread_pool.keep_alive_time="60000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="Discard"
/>
<PING timeout="3000" num_initial_members="3"/>
<MERGE2 max_interval="30000" min_interval="10000"/>
<FD_SOCK/>
<FD_ALL/>
<BARRIER />
<pbcast.NAKACK exponential_backoff="0"
use_mcast_xmit="true"
retransmit_timeout="300,600,1200"
discard_delivered_msgs="true"/>
<UNICAST2 timeout="300,600,1200"/>
<pbcast.STABLE stability_delay="1000"
desired_avg_gossip="50000" max_bytes="1000000"/>
<pbcast.GMS print_local_addr="false" join_timeout="3000"
view_bundling="true"/>
<UFC max_credits="500000" min_threshold="0.20"/>
<MFC max_credits="500000" min_threshold="0.20"/>
<FRAG2 frag_size="60000" />
</config>
{code}
If any additional information is needed, I'll provide it. I can also run the
Jenkins job with a different configuration on request.