[
https://issues.jboss.org/browse/JGRP-1539?page=com.atlassian.jira.plugin....
]
Bela Ban edited comment on JGRP-1539 at 11/16/12 5:18 AM:
----------------------------------------------------------
Possible solution: request retransmission not for the gaps seen in the current
xmit_interval, but for those seen in the *previous* one.
* The retransmit task maintains a hashmap of member addresses and, for each address, the
highest missing sequence number (seqno) seen on the last run
* Whenever it scans over a table for member A and sees missing messages, it
** takes the highest missing seqno and puts it into the hashmap
** if there was a previous entry, it requests retransmission of all missing messages up to
and including the previous entry's seqno (the entry is then replaced by the new seqno)
** if there wasn't a previous entry, it doesn't request retransmission at all
Example (members are A and B):
* xmit_interval is 1s
* At T1 (first invocation of the retransmit task), we have the following missing
messages:
** A: 1,2, 5-10, 15
** B: 70,71
* The retransmit task now adds A and B to its table:
** A: 15
** B: 71
* Because both entries are new, no retransmission requests are sent
* At T2, we have the following missing messages:
** A: 7-10,15, 21, 25-30 (messages 1,2,5,6 were received between T1 and T2)
** B: none (messages 70 and 71 were received, no missing messages)
* The retransmit task now asks A for retransmission of 7-10, 15
* It sets the retransmit table to:
** A: 30
** B: null
The effect of this is that messages 21 and 25-30 are not requested from A at T2, as they
only went missing in time frame [T1 .. T2].
The retransmit task at TN always asks for retransmission of messages that went missing in
time frame [TN-2 .. TN-1]. This avoids unnecessary retransmission requests for messages
that were marked as missing immediately before the retransmit task kicked in, and that
will most likely be received a few milliseconds later. The cost is that it can take up to
2 * xmit_interval ms to request retransmission of a missing message.
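The scheme described in this comment can be sketched in Java as follows. This is only an illustration, not the NAKACK2 code: the class name DelayedRetransmitTask, the method run() and the use of plain strings as member addresses are assumptions made for the example. The main() method replays the T1/T2 example for members A and B.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of the "previous interval" idea; not the JGroups implementation. */
class DelayedRetransmitTask {
    // member address -> highest missing seqno seen on the previous run
    private final Map<String, Long> highestMissing = new HashMap<>();

    /**
     * One run of the task for a single member.
     * @param missing the seqnos currently missing from that member
     * @return the seqnos to ask for; empty if the member has no previous entry
     */
    List<Long> run(String member, SortedSet<Long> missing) {
        if (missing.isEmpty()) {
            highestMissing.remove(member); // "B: null" in the example
            return new ArrayList<>();
        }
        // put() returns the previous entry (or null) and stores the new one
        Long previous = highestMissing.put(member, missing.last());
        if (previous == null)
            return new ArrayList<>(); // new entry: no request on this run
        // headSet() is exclusive, so use previous + 1 to include 'previous'
        return new ArrayList<>(missing.headSet(previous + 1));
    }

    public static void main(String[] args) {
        DelayedRetransmitTask task = new DelayedRetransmitTask();
        SortedSet<Long> a1 = new TreeSet<>(Arrays.asList(1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L, 15L));
        SortedSet<Long> a2 = new TreeSet<>(
            Arrays.asList(7L, 8L, 9L, 10L, 15L, 21L, 25L, 26L, 27L, 28L, 29L, 30L));
        System.out.println(task.run("A", a1)); // T1: prints []
        System.out.println(task.run("A", a2)); // T2: prints [7, 8, 9, 10, 15]
    }
}
```

Keeping a single seqno per member avoids a timestamp per message; the price is the delay of up to 2 * xmit_interval mentioned above.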
NAKACK2: xmit_interval sometimes triggers unneeded retransmissions
------------------------------------------------------------------
Key: JGRP-1539
URL: https://issues.jboss.org/browse/JGRP-1539
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.3
In NAKACK2 (and UNICAST2 as well), a timer task kicks in every xmit_interval
milliseconds. If a message was added out of order just before the task runs, the
retransmit task might send a spurious retransmission request.
Example:
- Messages 1,2,3,4,5 are received at time T0 (in ms)
- At time T1000, message 20 is received
- At time T1001, the retransmit task kicks in and asks the sender for retransmission of
messages [6-19]
- At time T1009, messages 6-19 are received
Problem: the out-of-order message 20 was received just before the retransmit task kicked
in. Had the task waited for another 8 ms (until time T1009), the retransmission would not
have been necessary.
The underlying cause is that the retransmit task handles recently added gaps the same as
gaps added more than xmit_interval ms ago, as we don't maintain a timestamp for added
messages. Another reason is that the message bundler on the sender side might send
messages in random order (see link below).
SOLUTION:
Investigate how to change the retransmission task such that it excludes gaps created
within the last xmit_interval ms.
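One way to exclude recent gaps is to record when each gap was first noticed and to skip gaps younger than xmit_interval. The Java sketch below is purely hypothetical: GapAgeFilter and its methods are names invented for this illustration, and it keeps one timestamp per missing seqno, which is exactly the bookkeeping the current code does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: only gaps older than xmit_interval are requested. */
class GapAgeFilter {
    private final long xmitIntervalMs;
    // missing seqno -> time (in ms) at which it was first noticed as missing
    private final Map<Long, Long> firstSeenMissing = new TreeMap<>();

    GapAgeFilter(long xmitIntervalMs) {
        this.xmitIntervalMs = xmitIntervalMs;
    }

    /** Called when an out-of-order message reveals that 'seqno' is missing. */
    void gapDetected(long seqno, long nowMs) {
        firstSeenMissing.putIfAbsent(seqno, nowMs);
    }

    /** Called when the message with 'seqno' finally arrives. */
    void received(long seqno) {
        firstSeenMissing.remove(Long.valueOf(seqno));
    }

    /** Returns the missing seqnos that have been missing for at least xmit_interval ms. */
    List<Long> dueForRetransmission(long nowMs) {
        List<Long> due = new ArrayList<>();
        for (Map.Entry<Long, Long> e : firstSeenMissing.entrySet()) {
            if (nowMs - e.getValue() >= xmitIntervalMs)
                due.add(e.getKey());
        }
        return due;
    }
}
```

With xmit_interval set to 1000 ms, the gaps 6-19 created at T1000 in the example above would not be due when the task runs at T1001, and they would already be gone by the next run if the messages arrive at T1009.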