[jboss-jira] [JBoss JIRA] (JGRP-1807) UNICAST: skipping of seqnos

Bela Ban (JIRA) issues at jboss.org
Wed Mar 12 01:55:10 EDT 2014


Bela Ban created JGRP-1807:
------------------------------

             Summary: UNICAST: skipping of seqnos
                 Key: JGRP-1807
                 URL: https://issues.jboss.org/browse/JGRP-1807
             Project: JGroups
          Issue Type: Bug
            Reporter: Bela Ban
            Assignee: Bela Ban
             Fix For: 3.2.13, 3.5


{noformat}
The log starts with:
10-Mar-2014 13:21:47 WARN  [org.jgroups.protocols.UNICAST2] (OOB-105,shared=tcp) AS_DR_IBE03/web: (requester=AS_DR_IBE06/web) message AS_DR_IBE06/web::1511786 not found in retransmission table of AS_DR_IBE06/web:
[1511785 | 1511785 | 1511857] (53 elements, 19 missing)

The numbers are 1511786-1511804  for "not found in retransmission...."

And end:
10-Mar-2014 14:48:26 WARN  [org.jgroups.protocols.UNICAST2] (OOB-118,shared=tcp) AS_DR_IBE03/web: (requester=AS_DR_IBE06/web) message AS_DR_IBE06/web::1511804 not found in retransmission table of AS_DR_IBE06/web:
[1511785 | 1511785 | 1514802] (2998 elements, 19 missing) 
{noformat}

It seems that 03 is missing messages 1511786-1511804, which it sent to 06. A null message cannot have been added to the sender table (the {{msg.isFlagSet()}} call would throw an NPE), so I assume we're skipping a seqno:

In {{down()}} of {{UNICAST}}, {{UNICAST2}} and {{UNICAST3}}, a skipped seqno leads to endless retransmission requests. Example:
* We get the next seqno 1, add the message to the table and send it
* We get the next seqno 2. However, if {{running}} is false, we don't add the message
* We get the next seqno 3. Now {{running}} is true, and we add 3 to the table
--> Message 2 is now missing and will always be null, as it was never added to the table
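
The steps above can be reproduced with a small stand-alone sketch. This is not the JGroups code: {{SenderTableSketch}}, {{send()}} and {{missing()}} are hypothetical names, and a {{TreeMap}} stands in for the real retransmission table. It only illustrates how consuming a seqno without storing the message leaves a permanent gap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a sender-side retransmission table (not the JGroups Table class).
class SenderTableSketch {
    private final TreeMap<Long, String> sent = new TreeMap<>();
    private long nextSeqno = 1;
    volatile boolean running = true;

    // Mimics the suspected pattern: the seqno is always consumed,
    // but the message is only stored while running is true.
    long send(String payload) {
        long seqno = nextSeqno++;
        if (running) {
            sent.put(seqno, payload);
        }
        return seqno;
    }

    // Seqnos between the lowest and highest stored entries that have no message.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (sent.isEmpty()) {
            return gaps;
        }
        for (long s = sent.firstKey(); s <= sent.lastKey(); s++) {
            if (!sent.containsKey(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        SenderTableSketch table = new SenderTableSketch();
        table.send("m1");        // seqno 1: stored
        table.running = false;
        table.send("m2");        // seqno 2: consumed but never stored
        table.running = true;
        table.send("m3");        // seqno 3: stored, leaving a gap at 2
        System.out.println("missing seqnos: " + table.missing());
    }
}
```

Running {{main}} prints {{missing seqnos: [2]}}. A receiver asking for seqno 2 can never be served from this table, which matches the repeated "not found in retransmission table" warnings.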

This is highly unlikely, as I haven't been able to find a scenario where {{running}} flips from true to false and back to true quickly. If it flips from true to false, this is because {{stop()}} has been called. Also, in {{down()}}, we actually check {{running}} and return if it is false.
In this scenario, the connections are all removed, so the seqno is reset to 1.
Anyway, I'm going to replace the {{while(running)}} loop with a {{do while(running)}} loop, so we always add the message to the table, even if {{running}} is false.
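
The difference between the two loop forms can be sketched as follows. This is an illustration under assumptions, not the actual {{UNICAST2}} code: {{LoopSketch}}, {{addWithWhile()}} and {{addWithDoWhile()}} are hypothetical names, and the retry-on-exception shape of the loop is assumed for the example.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical comparison of the two loop forms; names are illustrative only.
class LoopSketch {

    // while(running): if running is already false, the body never executes
    // and the message is not added, although its seqno was already assigned.
    static boolean addWithWhile(TreeMap<Long, String> table, long seqno,
                                String msg, AtomicBoolean running) {
        while (running.get()) {
            try {
                table.put(seqno, msg);
                return true;
            } catch (RuntimeException e) {
                // assumed retry behavior: try again while still running
            }
        }
        return false;
    }

    // do-while(running): the body executes at least once, so the message
    // is added even if running is false on entry.
    static boolean addWithDoWhile(TreeMap<Long, String> table, long seqno,
                                  String msg, AtomicBoolean running) {
        do {
            try {
                table.put(seqno, msg);
                return true;
            } catch (RuntimeException e) {
                // assumed retry behavior: try again while still running
            }
        } while (running.get());
        return false;
    }

    public static void main(String[] args) {
        AtomicBoolean running = new AtomicBoolean(false);
        TreeMap<Long, String> a = new TreeMap<>();
        TreeMap<Long, String> b = new TreeMap<>();
        System.out.println("while added:    " + addWithWhile(a, 2L, "m2", running));
        System.out.println("do-while added: " + addWithDoWhile(b, 2L, "m2", running));
    }
}
```

With {{running}} set to false, the {{while}} variant leaves the table empty, whereas the {{do while}} variant still stores the message, so no seqno is left without an entry.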

[1] https://github.com/belaban/JGroups/blob/Branch_JGroups_3_2/src/org/jgroups/protocols/UNICAST2.java#L490
