[jboss-jira] [JBoss JIRA] (JGRP-1539) NAKACK2: xmit_interval sometimes triggers unneeded retransmissions

Bela Ban (JIRA) jira-events at lists.jboss.org
Fri Nov 16 05:15:23 EST 2012


    [ https://issues.jboss.org/browse/JGRP-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734818#comment-12734818 ] 

Bela Ban commented on JGRP-1539:
--------------------------------

Possible solution: request retransmission not for the messages found missing in the current xmit_interval, but for those found missing in the *previous* one.

- The retransmit task maintains a hashmap of addresses and highest sequence numbers (seqnos)
- Whenever it scans the table for member A and sees missing messages, it
  - looks up the previous entry for A in the hashmap
  - if there was a previous entry, requests retransmission of all missing messages up to that seqno
  - if there wasn't a previous entry, doesn't request any retransmission
  - stores the highest currently missing seqno in the hashmap, replacing the previous entry

Example (members are A and B):
- xmit_interval is 1s
- At T1 (first invocation of the retransmit task), we have the following missing messages:
  - A: 1,2, 5-10, 15
  - B: 70,71
- The retransmit task now adds A and B to its table:
  - A: 15
  - B: 71
- Because both entries are new, no retransmission requests are sent
- At T2, we have the following missing messages:
  - A: 7-10, 15, 21, 25-30 (messages 1, 2, 5 and 6 were received between T1 and T2)
  - B: none (messages 70 and 71 were received, no missing messages)
- The retransmit task now asks A for retransmission of 7-10, 15
- It sets the retransmit table to:
  - A: 30
  - B: null

The effect of this is that messages 21 and 25-30 from A are not requested at T2, as they only went missing in time frame [T1 .. T2]. They will be requested at T3 if they are still missing then.

The retransmit task at TN always asks for retransmission of messages that went missing in time frame [TN-2 .. TN-1]. This avoids unnecessary retransmission requests for messages that were marked as missing immediately before the retransmit task kicked in and that will most likely be received a few milliseconds later. The cost is that it can take up to 2 * xmit_interval ms to request retransmission of a missing message.
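
The bookkeeping described above can be sketched in Java roughly as follows. This is only an illustration, not the NAKACK2 code: the class PreviousIntervalRetransmitter, its methods scan(), recorded() and seqnos(), and the use of plain strings for member addresses are all assumptions made for this sketch. The main() method replays the A/B example from this comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not JGroups code: every name here is made up for illustration.
class PreviousIntervalRetransmitter {
    // sender -> highest seqno that was missing at the previous run of the task
    private final Map<String, Long> highestMissing = new HashMap<>();

    /**
     * One run of the retransmit task for a single sender.
     * @param sender  the member whose table is scanned
     * @param missing the seqnos currently missing from that sender (may be empty)
     * @return the seqnos to request from the sender in this run, in ascending order
     */
    List<Long> scan(String sender, SortedSet<Long> missing) {
        Long previous = highestMissing.get(sender);
        List<Long> toRequest = new ArrayList<>();
        if (previous != null) {
            // Only gaps that already existed at the previous run are requested;
            // headSet() is exclusive, hence previous + 1.
            toRequest.addAll(missing.headSet(previous + 1));
        }
        // Remember the highest missing seqno for the next run ("null" if nothing is missing).
        if (missing.isEmpty())
            highestMissing.remove(sender);
        else
            highestMissing.put(sender, missing.last());
        return toRequest;
    }

    /** The entry currently stored for a sender, or null if there is none. */
    Long recorded(String sender) {
        return highestMissing.get(sender);
    }

    /** Convenience helper to build a sorted set of seqnos. */
    static SortedSet<Long> seqnos(long... values) {
        SortedSet<Long> set = new TreeSet<>();
        for (long v : values)
            set.add(v);
        return set;
    }

    public static void main(String[] args) {
        PreviousIntervalRetransmitter task = new PreviousIntervalRetransmitter();

        // T1: both entries are new, so nothing is requested
        System.out.println("T1 A: " + task.scan("A", seqnos(1, 2, 5, 6, 7, 8, 9, 10, 15))); // []
        System.out.println("T1 B: " + task.scan("B", seqnos(70, 71)));                      // []

        // T2: only what was already missing at T1 (seqnos <= 15 for A) is requested
        System.out.println("T2 A: " + task.scan("A",
                seqnos(7, 8, 9, 10, 15, 21, 25, 26, 27, 28, 29, 30))); // [7, 8, 9, 10, 15]
        System.out.println("T2 B: " + task.scan("B", seqnos()));       // []

        System.out.println("A -> " + task.recorded("A")); // 30
        System.out.println("B -> " + task.recorded("B")); // null
    }
}
```

A sorted set is used here so that "all missing messages up to the previous seqno" is a single headSet() call; the real retransmit table may well represent missing seqnos differently.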



                
> NAKACK2: xmit_interval sometimes triggers unneeded retransmissions
> ------------------------------------------------------------------
>
>                 Key: JGRP-1539
>                 URL: https://issues.jboss.org/browse/JGRP-1539
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.3
>
>
> In NAKACK2 (and UNICAST2 as well), a timer task kicks in every xmit_interval milliseconds. If we just added a message out of order, then the retransmit task might send a spurious retransmission message. 
> Example:
> - Messages 1,2,3,4,5 are received at time T0 (in ms)
> - At time T1000, message 20 is received
> - At time T1001, the retransmit task kicks in and asks the sender for retransmission of messages [6-19]
> - At time T1009, messages 6-19 are received
> Problem: the out-of-order message 20 was received just before the retransmit task kicked in. Had the task waited for another 8 ms (until time T1009), the retransmission would not have been necessary.
> The underlying cause is that the retransmit task handles recently added gaps the same way as gaps added more than xmit_interval ms ago, as we don't maintain a timestamp for added messages.
> Another reason is that the message bundler on the sender side might send messages in random order (see link below).
> SOLUTION:
> Investigate how to change the retransmission task such that it excludes gaps created within the last xmit_interval ms.
