[jboss-jira] [JBoss JIRA] (JGRP-1539) NAKACK2: xmit_interval sometimes triggers unneeded retransmissions
Bela Ban (JIRA)
jira-events at lists.jboss.org
Fri Nov 16 05:19:21 EST 2012
[ https://issues.jboss.org/browse/JGRP-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734818#comment-12734818 ]
Bela Ban edited comment on JGRP-1539 at 11/16/12 5:18 AM:
----------------------------------------------------------
Possible solution: retransmit messages not for the current xmit_interval, but for the *previous* one!
* The retransmit task maintains a hashmap of addresses and highest sequence numbers (seqnos)
* Whenever it scans over a table for member A and sees missing messages, it
** takes the highest missing seqno and adds it into the hashmap
** if there was a previous entry, it requests retransmission of all missing messages up to the previous seqno (the entry is then replaced by the new seqno)
** if there wasn't a previous entry, it doesn't retransmit at all
Example (members are A and B):
* xmit_interval is 1s
* At T1 (first invocation of the retransmit task), we have the following missing messages:
** A: 1,2, 5-10, 15
** B: 70,71
* The retransmit task now adds A and B to its table:
** A: 15
** B: 71
* Because both entries are new, no retransmission requests are sent
* At T2, we have the following missing messages:
** A: 7-10,15, 21, 25-30 (messages 1,2,5,6 were received between T1 and T2)
** B: none (messages 70 and 71 were received, so nothing is missing)
* The retransmit task now asks A for retransmission of 7-10, 15
* It sets the retransmit table to:
** A: 30
** B: null
The effect is that messages 21 and 25-30 from A are not requested yet, as they went missing in time frame [T1 .. T2]; they will be requested at T3 if they are still missing then.
The retransmit task at TN always asks for retransmission of messages that were missing in time frame [TN-2 .. TN-1]. This avoids requesting messages that were marked as missing immediately before the retransmit task kicked in and that will most likely be received a few milliseconds later, i.e. it prevents unnecessary retransmissions. The cost is that it can take up to 2 * xmit_interval ms before retransmission of a missing message is requested.
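The bookkeeping described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the NAKACK2 code: the class name DelayedRetransmitter, the methods onTimer() and highest(), and the use of String for member addresses are all assumptions made for the example, and the caller is assumed to pass in the set of currently missing seqnos for one sender.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;

/** Hypothetical sketch of the "retransmit the previous interval" idea. */
class DelayedRetransmitter {
    // Highest missing seqno per sender, as recorded by the previous invocation
    private final Map<String, Long> highestSeen = new HashMap<>();

    /**
     * One invocation of the retransmit task for a single sender.
     *
     * @param sender  member whose table was scanned (a String here, by assumption)
     * @param missing seqnos currently missing from that sender
     * @return the seqnos to request from the sender in this round
     */
    List<Long> onTimer(String sender, SortedSet<Long> missing) {
        if (missing.isEmpty()) {
            // Nothing missing: drop the entry (the "B: null" case)
            highestSeen.remove(sender);
            return new ArrayList<>();
        }
        // Record the new highest missing seqno, fetching the old one
        Long previous = highestSeen.put(sender, missing.last());
        if (previous == null) {
            // New entry: wait one more interval before asking
            return new ArrayList<>();
        }
        // headSet() is exclusive, so use previous + 1 to include 'previous'
        return new ArrayList<>(missing.headSet(previous + 1));
    }

    /** Current entry for a sender, or null if there is none. */
    Long highest(String sender) {
        return highestSeen.get(sender);
    }
}
```

With the values from the example, the first call for A (missing 1,2, 5-10, 15) returns an empty list and records 15. The second call (missing 7-10, 15, 21, 25-30) returns 7-10 and 15 and replaces the entry with 30.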
> NAKACK2: xmit_interval sometimes triggers unneeded retransmissions
> ------------------------------------------------------------------
>
> Key: JGRP-1539
> URL: https://issues.jboss.org/browse/JGRP-1539
> Project: JGroups
> Issue Type: Enhancement
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 3.3
>
>
> In NAKACK2 (and UNICAST2 as well), a timer task kicks in every xmit_interval milliseconds. If we just added a message out of order, then the retransmit task might send a spurious retransmission message.
> Example:
> - Messages 1,2,3,4,5 are received at time T0 (in ms)
> - At time T1000, message 20 is received
> - At time T1001, the retransmit task kicks in and asks the sender for retransmission of messages [6-19]
> - At time T1009, messages 6-19 are received
> Problem: the out-of-order message 20 was received just before the retransmit task kicked in. Had the task waited for another 9 ms (until time T1009), the retransmission would not have been necessary.
> The underlying cause is that the retransmit task handles recently added gaps the same as gaps added more than xmit_interval ms ago, as we don't maintain a timestamp for added messages.
> Another reason is that the message bundler on the sender side might send messages in random order (see link below).
> SOLUTION:
> Investigate how to change the retransmission task such that it excludes gaps created within the last xmit_interval ms.