Bela Ban created JGRP-1807:
------------------------------
Summary: UNICAST: skipping of seqnos
Key: JGRP-1807
URL:
https://issues.jboss.org/browse/JGRP-1807
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.2.13, 3.5
{noformat}
The log starts with:
10-Mar-2014 13:21:47 WARN [org.jgroups.protocols.UNICAST2] (OOB-105,shared=tcp)
AS_DR_IBE03/web: (requester=AS_DR_IBE06/web) message AS_DR_IBE06/web::1511786 not found in
retransmission table of AS_DR_IBE06/web:
[1511785 | 1511785 | 1511857] (53 elements, 19 missing)
The "not found in retransmission table" warnings cover seqnos 1511786-1511804.
And it ends with:
10-Mar-2014 14:48:26 WARN [org.jgroups.protocols.UNICAST2] (OOB-118,shared=tcp)
AS_DR_IBE03/web: (requester=AS_DR_IBE06/web) message AS_DR_IBE06/web::1511804 not found in
retransmission table of AS_DR_IBE06/web:
[1511785 | 1511785 | 1514802] (2998 elements, 19 missing)
{noformat}
It seems that 03 is missing messages 1511786-1511804, which it sent to 06. Since a null
message cannot be added to the sender table (the {{msg.isFlagSet()}} call would
throw an NPE), I assume we're skipping a seqno:
In the {{down()}} method of {{UNICAST}}, {{UNICAST2}} and {{UNICAST3}}, a skipped seqno
leads to endless retransmissions. Example:
* We get the next seqno 1, add the message to the table and send it
* We get the next seqno 2. However, if {{running}} is false, we don't add the message
* We get the next seqno 3. Now {{running}} is true, and we add 3 to the table
--> Now message 2 is missing and will always be null, as it was never
added to the table
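
The sequence above can be reproduced with a small stand-alone model. This is only a sketch under assumptions: {{SeqnoGapDemo}}, {{sentTable}}, {{send()}} and {{missing()}} are hypothetical names for illustration, not the JGroups classes. It assumes the seqno is consumed before the table insert, and that the insert is guarded by {{running}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a unicast sender; NOT the JGroups code.
public class SeqnoGapDemo {
    private final AtomicLong nextSeqno = new AtomicLong(1);
    private final TreeMap<Long, String> sentTable = new TreeMap<>();
    volatile boolean running = true;

    // Assumed pattern: the seqno is taken first, the add is guarded by 'running'.
    long send(String payload) {
        long seqno = nextSeqno.getAndIncrement(); // seqno is consumed unconditionally
        while (running) {                         // skipped entirely if running == false
            sentTable.put(seqno, payload);
            break;
        }
        return seqno;
    }

    // Seqnos between the lowest and highest table entry that have no message.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (sentTable.isEmpty())
            return gaps;
        for (long s = sentTable.firstKey(); s <= sentTable.lastKey(); s++) {
            if (!sentTable.containsKey(s))
                gaps.add(s);
        }
        return gaps;
    }

    public static void main(String[] args) {
        SeqnoGapDemo sender = new SeqnoGapDemo();
        sender.send("m1");          // seqno 1 is added
        sender.running = false;
        sender.send("m2");          // seqno 2 is consumed but never added
        sender.running = true;
        sender.send("m3");          // seqno 3 is added, leaving a hole at 2
        System.out.println("missing seqnos: " + sender.missing());
    }
}
```

In this model a receiver asking for seqno 2 can never be served, because the sender has nothing stored under that seqno.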
This is highly unlikely, as I haven't been able to find a scenario where {{running}} flips
from true to false and back to true quickly. If it flips from true to false, this is because
{{stop()}} has been called. Also, {{down()}} itself checks {{running}} and returns
if it is false.
In that case, the connections are all removed, so the seqno is reset to 1.
Anyway, I'm going to replace the {{while(running)}} loop with a {{do while(running)}}
loop, so we always add the message to the table, even if {{running}} is false.
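
A minimal sketch of what the {{do while(running)}} variant could look like, again using a hypothetical stand-alone model ({{DoWhileSendDemo}}, {{sentTable}} and {{isContiguous()}} are illustrative names, and the retry-on-exception shape is an assumption, not the actual patch):

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the proposed change; NOT the JGroups code.
public class DoWhileSendDemo {
    private final AtomicLong nextSeqno = new AtomicLong(1);
    final TreeMap<Long, String> sentTable = new TreeMap<>();
    volatile boolean running = true;

    long send(String payload) {
        long seqno = nextSeqno.getAndIncrement();
        do {
            try {
                // The body runs at least once, so a consumed seqno is always stored.
                sentTable.put(seqno, payload);
                break;
            } catch (RuntimeException e) {
                // Assumption: retry only while the protocol is still running.
            }
        } while (running);
        return seqno;
    }

    // True if the table has no holes between its lowest and highest seqno.
    boolean isContiguous() {
        return sentTable.isEmpty()
            || sentTable.lastKey() - sentTable.firstKey() + 1 == sentTable.size();
    }

    public static void main(String[] args) {
        DoWhileSendDemo sender = new DoWhileSendDemo();
        sender.send("m1");
        sender.running = false;
        sender.send("m2");          // still added, although running == false
        sender.running = true;
        sender.send("m3");
        System.out.println("seqnos in table: " + sender.sentTable.keySet());
        System.out.println("contiguous: " + sender.isContiguous());
    }
}
```

Because the loop body executes before {{running}} is evaluated, every seqno that was handed out ends up in the table, so no permanent hole can form even if {{running}} changes between calls.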
[1]
https://github.com/belaban/JGroups/blob/Branch_JGroups_3_2/src/org/jgroup...