[jboss-jira] [JBoss JIRA] (JGRP-1807) UNICAST: skipping of seqnos
Bela Ban (JIRA)
issues at jboss.org
Mon Aug 25 10:34:59 EDT 2014
[ https://issues.jboss.org/browse/JGRP-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995475#comment-12995475 ]
Bela Ban commented on JGRP-1807:
--------------------------------
JGRP-1873 could also have been a possible cause.
> UNICAST: skipping of seqnos
> ---------------------------
>
> Key: JGRP-1807
> URL: https://issues.jboss.org/browse/JGRP-1807
> Project: JGroups
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 3.2.13, 3.5
>
>
> {noformat}
> The log starts with:
> 10-Mar-2014 13:21:47 WARN [org.jgroups.protocols.UNICAST2] (OOB-105,shared=tcp) node1/web: (requester=node2/web) message node2/web::1511786 not found in retransmission table of node2/web:
> [1511785 | 1511785 | 1511857] (53 elements, 19 missing)
> The numbers are 1511786-1511804 for "not found in retransmission...."
> And end:
> 10-Mar-2014 14:48:26 WARN [org.jgroups.protocols.UNICAST2] (OOB-118,shared=tcp) node1/web: (requester=node2/web) message node2/web::1511804 not found in retransmission table of node2/web:
> [1511785 | 1511785 | 1514802] (2998 elements, 19 missing)
> {noformat}
> It seems that node1 is missing messages 1511786-1511804, which it sent to node2. Since a null message cannot be added to the sender table (the {{msg.isFlagSet()}} call would throw an NPE), I assume we're skipping a seqno:
> In {{UNICAST}}, {{UNICAST2}} and {{UNICAST3}} {{down()}}, if a seqno is skipped, we get endless retransmissions. Example:
> * We get the next seqno 1, add the message to the table and send it
> * We get the next seqno 2. However, if {{running}} is false, we don't add the message
> * We get the next seqno 3. Now {{running}} is true, and we add 3 to the table
> --> Now we have a missing message 2 which will always be null as it hasn't been added to the table
> This is highly unlikely, as I haven't been able to find a scenario where {{running}} flips from true to false and back to true quickly. If it flips from true to false, this is because {{stop()}} has been called. Also, in {{down()}}, we actually check {{running}} and return if it is false.
> In this scenario, the connections are all removed, so seqno is reset to 1.
> Anyway, I'm going to replace the {{while(running)}} loop with a {{do ... while(running)}} loop, so we always add the message to the table, even if {{running}} is false.
> [1] https://github.com/belaban/JGroups/blob/Branch_JGroups_3_2/src/org/jgroups/protocols/UNICAST2.java#L490
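
For illustration, the skipped-seqno scenario from the description above can be reproduced with a small standalone sketch. This is not the JGroups code: the class SeqnoSkipSketch, the Sender type, its TreeMap-based table and the simulate() helper are all hypothetical stand-ins, and the loops only approximate the shape of the real down() method. It assumes the seqno is consumed before the loop is entered, which is what makes a gap possible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SeqnoSkipSketch {

    /** Hypothetical sender state; a stand-in, not the JGroups sender entry. */
    static final class Sender {
        final TreeMap<Long, String> table = new TreeMap<>(); // seqno -> message
        long nextSeqno = 1;
        boolean running = true;

        /** Pre-checked loop: if running is false, the seqno is consumed but nothing is added. */
        void sendWithWhile(String msg) {
            long seqno = nextSeqno++;
            boolean added = false;
            while (running && !added) {
                added = table.putIfAbsent(seqno, msg) == null;
            }
        }

        /** Post-checked loop: the message is added at least once, whatever running is. */
        void sendWithDoWhile(String msg) {
            long seqno = nextSeqno++;
            boolean added = false;
            do {
                added = table.putIfAbsent(seqno, msg) == null;
            } while (running && !added);
        }

        /** Seqnos between the lowest and highest entry that were never added. */
        List<Long> missing() {
            List<Long> gaps = new ArrayList<>();
            if (table.isEmpty()) return gaps;
            for (long s = table.firstKey(); s <= table.lastKey(); s++) {
                if (!table.containsKey(s)) gaps.add(s);
            }
            return gaps;
        }
    }

    /** Sends seqnos 1..3 while running flips true -> false -> true, then returns the gaps. */
    static List<Long> simulate(boolean useDoWhile) {
        Sender sender = new Sender();
        boolean[] runningAtSend = {true, false, true};
        for (int i = 0; i < runningAtSend.length; i++) {
            sender.running = runningAtSend[i];
            String msg = "m" + (i + 1);
            if (useDoWhile) sender.sendWithDoWhile(msg);
            else sender.sendWithWhile(msg);
        }
        return sender.missing();
    }

    public static void main(String[] args) {
        System.out.println("while(running):        missing " + simulate(false)); // missing [2]
        System.out.println("do ... while(running): missing " + simulate(true));  // missing []
    }
}
```

With the pre-checked loop, seqno 2 is never added, so a receiver asking for it would keep getting "not found in retransmission table". With the post-checked loop the table has no gap, which is the effect the proposed change is aiming for.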
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)