[JBoss JIRA] (ISPN-2612) Problem broadcasting CH_UPDATE command

Tuesday, 15 January 2013

     [
https://issues.jboss.org/browse/ISPN-2612?page=com.atlassian.jira.plugin....
]

Dan Berindei resolved ISPN-2612.
--------------------------------

    Resolution: Won't Fix

[~mlinhard], it looks like the problem is that you don't have enough threads in your
OOB thread pool, and the RSVP ACK message from the coordinator back to the self is
dropped. It is resent after UNICAST.stable_interval expires (5 seconds with your
configuration), but it is dropped again because 1 OOB thread is waiting for CH_UPDATE
responses and 1 thread is busy sending the RSVP ACK message (sending doesn't take
long, but JGroups doesn't wait at all before dropping the message).

I must have gotten lucky when I ran the test last night, because I tried it again after
looking at the logs and I got the same TimeoutException you did. It went away when I
increased oob_thread_pool.max_threads to 4.

I'm not sure what the absolute minimum number of OOB threads is, but I recommend
running with the defaults in jgroups-udp.xml. JGroups won't start more threads than it
needs, so I don't see a reason to move from the defaults.

Please reopen the JIRA if you can still reproduce it with 4 or more OOB threads.

...
 Problem broadcasting CH_UPDATE command
 --------------------------------------

                 Key: ISPN-2612
                 URL: https://issues.jboss.org/browse/ISPN-2612
             Project: Infinispan
          Issue Type: Bug
          Components: RPC
    Affects Versions: 5.2.0.Beta5
            Reporter: Michal Linhard
            Assignee: Dan Berindei
            Priority: Blocker
             Fix For: 5.2.0.CR2

         Attachments: session-cluster.xml, test.zip, test_TRACE.zip

 Infinispan 5.2.0.Beta5
 JGroups 3.2.4.Final
 Steps to reproduce (I'm using two virtual interfaces test1, test2)
 1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1
-Djava.net.preferIPv4Stack=true
 2. wait 10 sec
 3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2
-Djava.net.preferIPv4Stack=true
 After 5 seconds there should be this timeout exception:
 {code}
 19:42:14,146 WARN  [org.infinispan.topology.CacheTopologyControlCommand]
(OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command
CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM,
sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null,
throwable=null, viewId=1}
 java.util.concurrent.ExecutionException: org.infinispan.CacheException:
org.jgroups.TimeoutException: TimeoutException
 	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
 	at java.util.concurrent.FutureTask.get(FutureTask.java:91)
 	at
org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563)
 	at
org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349)
 	at
org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213)
 	at
org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
 	at
org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
 	at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252)
 	at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219)
 	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483)
 	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
 	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
 	at org.jgroups.JChannel.up(JChannel.java:703)
 	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
 	at org.jgroups.protocols.RSVP.up(RSVP.java:172)
 	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
 	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
 	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
 	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736)
 	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414)
 	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
 	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
 	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187)
 	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
 	at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
 	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
 	at org.jgroups.protocols.TP.passMessageUp(TP.java:1287)
 	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850)
 	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823)
 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 	at java.lang.Thread.run(Thread.java:662)
 Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
 	at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532)
 	at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152)
 	at
org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518)
 	at
org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545)
 	at
org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542)
 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 	... 3 more
 Caused by: org.jgroups.TimeoutException: TimeoutException
 	at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
 	at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
 	at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
 	at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
 	at org.jgroups.protocols.RSVP.down(RSVP.java:118)
 	at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
 	at org.jgroups.JChannel.down(JChannel.java:718)
 	at
org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
 	at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173)
 	at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360)
 	at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103)
 	at org.jgroups.blocks.Request.execute(Request.java:83)
 	at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335)
 	at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249)
 	at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330)
 	at
org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145)
 	... 8 more
 {code}
 Analysis:
 These are the messages sent after view change:
 {code}
 test1                     test2
  <--- JOIN ---- 
  ---- REBALANCE_START ---> 
  <--- StateRequestCommand ---- 
  ---- StateResponseCommand ---> 
  <--- REBALANCE_CONFIRM ---- 
  ---- CH_UPDATE --->
 {code}
 The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays
in waiting state, because it for some reason awaits response also from itself - local
variable entry in the method RSVP.down

(https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/...)
 contained local address. 
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009