[jboss-jira] [JBoss JIRA] (JGRP-1823) MessageDispatcher expects a response from the local node with DONT_LOOPBACK

Bela Ban (JIRA) issues at jboss.org
Tue Apr 8 11:00:13 EDT 2014


    [ https://issues.jboss.org/browse/JGRP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12960203#comment-12960203 ] 

Bela Ban edited comment on JGRP-1823 at 4/8/14 10:58 AM:
---------------------------------------------------------

Email by Dan:

I think I ran into a different problem with Beta4. After happily delivering DONT_LOOPBACK messages only to the rest of the cluster, NodeA suddenly decides to deliver all those DONT_LOOPBACK messages it "missed".

This is a request that is broadcasted to the entire cluster, but is (correctly) received only by NodeB and NodeC:

{noformat}
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to null, src=DataRehashedEventTest-NodeA-1707, headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7902 (5069 bytes)
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7901 (5069 bytes)
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [TCP] DataRehashedEventTest-NodeB-6988: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
14:49:10,360 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [TCP] DataRehashedEventTest-NodeC-9852: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,362 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
{noformat}

Now comes the view change on NodeA:

{noformat}
14:49:10,390 TRACE (ViewHandler,DataRehashedEventTest-NodeA-1707:) [GMS] DataRehashedEventTest-NodeA-1707: joiners=[], suspected=[], leaving=[DataRehashedEventTest-NodeC-9852], new view: [DataRehashedEventTest-NodeA-1707|3] (2) [DataRehashedEventTest-NodeA-1707, DataRehashedEventTest-NodeB-6988]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to DataRehashedEventTest-NodeA-1707, src=DataRehashedEventTest-NodeA-1707, headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=3, TCP: [cluster_name=ISPN]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: looping back message [dst: DataRehashedEventTest-NodeA-1707, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=0 bytes, flags=OOB|INTERNAL]
{noformat}

After NodeA sends the VIEW_ACK to itself, the same Incoming thread starts replaying previous messages:

{noformat}
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 10
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 18
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 20
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
{noformat}

I've attached the entire log here, in case I've missed something.

                
      was (Author: belaban):
    Email by Dan:

I think I ran into a different problem with Beta4. After happily delivering DONT_LOOPBACK messages only to the rest of the cluster, NodeA suddenly decides to deliver all those DONT_LOOPBACK messages it "missed".

This is a request that is broadcasted to the entire cluster, but is (correctly) received only by NodeB and NodeC:

14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to null, src=DataRehashedEventTest-NodeA-1707, headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7902 (5069 bytes)
14:49:10,323 TRACE (transport-thread-DataRehashedEventTest-NodeA-p2-t3:) [TCP] dest=127.0.0.1:7901 (5069 bytes)
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [TCP] DataRehashedEventTest-NodeB-6988: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,347 TRACE (OOB-2,DataRehashedEventTest-NodeB-6988:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21
14:49:10,360 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [TCP] DataRehashedEventTest-NodeC-9852: received [dst: <null>, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=5013 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|DONT_LOOPBACK], headers are RequestCorrelator: id=200, type=REQ, id=21, rsp_expected=false, NAKACK2: [MSG, seqno=6], TCP: [cluster_name=ISPN]
14:49:10,362 TRACE (OOB-2,DataRehashedEventTest-NodeC-9852:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21

Now comes the view change on NodeA:

14:49:10,390 TRACE (ViewHandler,DataRehashedEventTest-NodeA-1707:) [GMS] DataRehashedEventTest-NodeA-1707: joiners=[], suspected=[], leaving=[DataRehashedEventTest-NodeC-9852], new view: [DataRehashedEventTest-NodeA-1707|3] (2) [DataRehashedEventTest-NodeA-1707, DataRehashedEventTest-NodeB-6988]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: sending msg to DataRehashedEventTest-NodeA-1707, src=DataRehashedEventTest-NodeA-1707, headers are GMS: GmsHeader[VIEW_ACK], UNICAST3: DATA, seqno=3, TCP: [cluster_name=ISPN]
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [TCP] DataRehashedEventTest-NodeA-1707: looping back message [dst: DataRehashedEventTest-NodeA-1707, src: DataRehashedEventTest-NodeA-1707 (3 headers), size=0 bytes, flags=OOB|INTERNAL]

After NodeA sends the VIEW_ACK to itself, the same Incoming thread starts replaying previous messages:

14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 10
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 18
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 20
14:49:10,392 TRACE (Incoming-1,DataRehashedEventTest-NodeA-1707:) [RequestCorrelator] calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 21

I've attached the entire log here, in case I've missed something.

                  
> MessageDispatcher expects a response from the local node with DONT_LOOPBACK
> ---------------------------------------------------------------------------
>
>                 Key: JGRP-1823
>                 URL: https://issues.jboss.org/browse/JGRP-1823
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.5
>            Reporter: Dan Berindei
>            Assignee: Bela Ban
>             Fix For: 3.5
>
>
> MessageDispatcher doesn't remove the local node from the `real_dests` collection when using DONT_LOOPBACK, and that causes RPCs broadcasted to the entire cluster to time out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


More information about the jboss-jira mailing list