[
https://issues.jboss.org/browse/ISPN-2612?page=com.atlassian.jira.plugin....
]
Dan Berindei resolved ISPN-2612.
--------------------------------
Resolution: Won't Fix
[~mlinhard], it looks like the problem is that you don't have enough threads in your
OOB thread pool, and the RSVP ACK message from the coordinator back to the self is
dropped. It is resent after UNICAST.stable_interval expires (5 seconds with your
configuration), but it is dropped again because 1 OOB thread is waiting for CH_UPDATE
responses and 1 thread is busy sending the RSVP ACK message (sending doesn't take
long, but JGroups doesn't wait at all before dropping the message).
I must have gotten lucky when I ran the test last night, because I tried it again after
looking at the logs and I got the same TimeoutException you did. It went away when I
increased oob_thread_pool.max_threads to 4.
I'm not sure what the absolute minimum number of OOB threads is, but I recommend
running with the defaults in jgroups-udp.xml. JGroups won't start more threads than it
needs, so I don't see a reason to move from the defaults.
Please reopen the JIRA if you can still reproduce it with 4 or more OOB threads.
Problem broadcasting CH_UPDATE command
--------------------------------------
Key: ISPN-2612
URL:
https://issues.jboss.org/browse/ISPN-2612
Project: Infinispan
Issue Type: Bug
Components: RPC
Affects Versions: 5.2.0.Beta5
Reporter: Michal Linhard
Assignee: Dan Berindei
Priority: Blocker
Fix For: 5.2.0.CR2
Attachments: session-cluster.xml, test.zip, test_TRACE.zip
Infinispan 5.2.0.Beta5
JGroups 3.2.4.Final
Steps to reproduce (I'm using two virtual interfaces test1, test2)
1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1
-Djava.net.preferIPv4Stack=true
2. wait 10 sec
3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2
-Djava.net.preferIPv4Stack=true
After 5 seconds there should be this timeout exception:
{code}
19:42:14,146 WARN [org.infinispan.topology.CacheTopologyControlCommand]
(OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command
CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM,
sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null,
throwable=null, viewId=1}
java.util.concurrent.ExecutionException: org.infinispan.CacheException:
org.jgroups.TimeoutException: TimeoutException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at
org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
at
org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349)
at
org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213)
at
org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
at
org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
at org.jgroups.JChannel.up(JChannel.java:703)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
at org.jgroups.protocols.RSVP.up(RSVP.java:172)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
at org.jgroups.protocols.Discovery.up(Discovery.java:359)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1287)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518)
at
org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
at
org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
... 3 more
Caused by: org.jgroups.TimeoutException: TimeoutException
at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
at org.jgroups.protocols.RSVP.down(RSVP.java:118)
at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
at org.jgroups.JChannel.down(JChannel.java:718)
at
org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
at org.jgroups.blocks.Request.execute(Request.java:83)
at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
... 8 more
{code}
Analysis:
These are the messages sent after view change:
{code}
test1 test2
<--- JOIN ----
---- REBALANCE_START --->
<--- StateRequestCommand ----
---- StateResponseCommand --->
<--- REBALANCE_CONFIRM ----
---- CH_UPDATE --->
{code}
The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays
in waiting state, because it for some reason awaits response also from itself - local
variable entry in the method RSVP.down
(
https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...)
contained local address.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
http://www.atlassian.com/software/jira