[jboss-jira] [JBoss JIRA] (JGRP-1823) MessageDispatcher expects a response from the local node with DONT_LOOPBACK
Bela Ban (JIRA)
issues at jboss.org
Tue Apr 8 11:00:13 EDT 2014
[ https://issues.jboss.org/browse/JGRP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12960203#comment-12960203 ]
Bela Ban edited comment on JGRP-1823 at 4/8/14 10:58 AM:
---------------------------------------------------------
Email by Dan:
I think I ran into a different problem with Beta4. After happily delivering DONT_LOOPBACK messages only to the rest of the cluster, NodeA suddenly decides to deliver all those DONT_LOOPBACK messages it "missed".
This is a request that is broadcasted to the entire cluster, but is (correctly) received only by NodeB and NodeC:
{noformat}
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to null, src=DataRehashedEventTest-NodeA-1707, headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7902 (5069 bytes)
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7901 (5069 bytes)
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [TCP] DataRehashedEventTest-NodeB-6988: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
14:49:10,360 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [TCP] DataRehashedEventTest-NodeC-9852: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,362 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
{noformat}
Now comes the view change on NodeA:
{noformat}
14:49:10,390 TRACE (ViewHandler,DataRehashedEventTest-NodeA-1707:) [GMS] DataRehashedEventTest-NodeA-1707: joiners=[], suspected=[], leaving=[DataRehashedEventTest-NodeC-9852], new view: [DataRehashedEventTest-NodeA-1707|3] (2) [DataRehashedEventTest-NodeA-1707, DataRehashedEventTest-NodeB-6988]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to DataRehashedEventTest-NodeA-1707, src=DataRehashedEventTest-NodeA-1707, headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=3, TCP: [cluster_name=ISPN]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: looping back message [dst: DataRehashedEventTest-NodeA-1707, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=0 bytes, flags=OOB|INTERNAL]
{noformat}
After NodeA sends the VIEW_ACK to itself, the same Incoming thread starts replaying previous messages:
{noformat}
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 10
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 18
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 20
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
{noformat}
I've attached the entire log here, in case I've missed something.
was (Author: belaban):
Email by Dan:
I think I ran into a different problem with Beta4. After happily delivering DONT_LOOPBACK messages only to the rest of the cluster, NodeA suddenly decides to deliver all those DONT_LOOPBACK messages it "missed".
This is a request that is broadcasted to the entire cluster, but is (correctly) received only by NodeB and NodeC:
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to null, src=DataRehashedEventTest-NodeA-1707, headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7902 (5069 bytes)
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7901 (5069 bytes)
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [TCP] DataRehashedEventTest-NodeB-6988: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
14:49:10,360 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [TCP] DataRehashedEventTest-NodeC-9852: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,362 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
Now comes the view change on NodeA:
14:49:10,390 TRACE (ViewHandler,DataRehashedEventTest-NodeA-1707:) [GMS] DataRehashedEventTest-NodeA-1707: joiners=[], suspected=[], leaving=[DataRehashedEventTest-NodeC-9852], new view: [DataRehashedEventTest-NodeA-1707|3] (2) [DataRehashedEventTest-NodeA-1707, DataRehashedEventTest-NodeB-6988]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to DataRehashedEventTest-NodeA-1707, src=DataRehashedEventTest-NodeA-1707, headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=3, TCP: [cluster_name=ISPN]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: looping back message [dst: DataRehashedEventTest-NodeA-1707, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=0 bytes, flags=OOB|INTERNAL]
After NodeA sends the VIEW_ACK to itself, the same Incoming thread starts replaying previous messages:
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 10
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 18
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 20
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
I've attached the entire log here, in case I've missed something.
> MessageDispatcher expects a response from the local node with DONT_LOOPBACK
> ---------------------------------------------------------------------------
>
> Key: JGRP-1823
> URL: https://issues.jboss.org/browse/JGRP-1823
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.5
> Reporter: Dan Berindei
> Assignee: Bela Ban
> Fix For: 3.5
>
>
> MessageDispatcher doesn't remove the local node from the `real_dests` collection when using DONT_LOOPBACK, and that causes RPCs broadcasted to the entire cluster to time out.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the jboss-jira
mailing list