NAKACK: use bigger timeouts for big retransmission tasks
--------------------------------------------------------
Key: JGRP-1361
URL: https://issues.jboss.org/browse/JGRP-1361
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.1
Oftentimes we receive messages out of order, e.g. because an OOB message follows a range
of regular messages, but since the regular messages are bundled, they may arrive later than
the OOB message. Say the regular messages are [10..30] and the OOB message is 31. Receiving
#31 before #10..30 triggers the addition of a retransmission task for [10..30] to the
retransmitter. The task usually goes off after some initial delay, say 500ms. If we
receive [10..30] before the task goes off, it will be cancelled. If we receive most of the
messages in [10..30], only the missing messages will get retransmitted when the task
fires.
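The bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the actual NAKACK or Retransmitter code: the class GapTracker and its methods are hypothetical names, and a real implementation would track ranges rather than individual seqnos.

```java
import java.util.TreeSet;

/** Hypothetical illustration of gap tracking; not the actual JGroups code. */
final class GapTracker {
    private long highestReceived;
    // Seqnos that were skipped and would be requested if the task fired now
    private final TreeSet<Long> missing = new TreeSet<>();

    GapTracker(long highestReceived) {
        this.highestReceived = highestReceived;
    }

    /** Records a received seqno; skipped seqnos become retransmission candidates. */
    void receive(long seqno) {
        if (seqno > highestReceived) {
            // Everything between the old highest seqno and the new one is missing
            for (long s = highestReceived + 1; s < seqno; s++)
                missing.add(s);
            highestReceived = seqno;
        } else {
            // A late arrival fills a gap, so it no longer needs retransmission
            missing.remove(seqno);
        }
    }

    /** Returns a copy of the seqnos a retransmission task would ask for right now. */
    TreeSet<Long> pendingRetransmission() {
        return new TreeSet<>(missing);
    }
}
```

With highest_received=9, receiving #31 leaves [10..30] pending. If #10 to #29 then arrive before the task fires, only #30 would be requested, and once #30 arrives as well nothing is left and the task can be cancelled.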
So in most cases, many retransmission tasks are cancelled before they ever fire.
However, sometimes we can receive a seqno which is much larger than the current
highest_received seqno, e.g. highest_received=15000, seqno=20000. This means we now add a
retransmission request for [15000..20000]. It is likely that the seqno was just received
out of order, but it may trigger the (unneeded) retransmission of 5000 messages!
The suggested solution is therefore to increase the initial delay for large retransmission
tasks, such that they execute a bit later. Of course, the underlying assumption is that
most of the missing messages will arrive before the timeout goes off. If the 5000 messages
are really lost, e.g. dropped by the IP stack or a switch, then they will need to get
retransmitted.
If we have an exponential_backoff of 500, the initial delay is 500ms. We could say that a
'large' retransmission task is any task which asks for retransmission of more than
10% of the current retransmission table's size. We could compute an offset to the
initial delay from the delta and the current delay.
This offset could be added only the first time a retransmission task is scheduled, or
on every rescheduling for as long as the task is large.
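One possible way to compute such a delay is sketched below. The 10% threshold comes from the description above; everything else is an assumption made for illustration: the class RetransmitDelay, its method names, the proportional formula for the offset, and the maxOffsetMs cap are all hypothetical, since the issue does not fix a formula.

```java
/** Hypothetical sketch of a size-dependent initial delay; not the actual JGroups code. */
final class RetransmitDelay {
    // Threshold from the description: more than 10% of the retransmission table's size
    static final double LARGE_FRACTION = 0.10;

    /** A task is 'large' if it asks for more than 10% of the table's size. */
    static boolean isLarge(long rangeSize, long tableSize) {
        return tableSize > 0 && rangeSize > tableSize * LARGE_FRACTION;
    }

    /**
     * Returns the delay for a task. Small tasks keep the base delay. Large tasks
     * get an offset proportional to rangeSize / tableSize (an assumed formula),
     * capped at maxOffsetMs (an assumed safety limit).
     */
    static long initialDelay(long baseDelayMs, long rangeSize, long tableSize, long maxOffsetMs) {
        if (!isLarge(rangeSize, tableSize))
            return baseDelayMs;
        long offset = baseDelayMs * rangeSize / tableSize;
        return baseDelayMs + Math.min(offset, maxOffsetMs);
    }
}
```

Under these assumptions, with a base delay of 500ms, a request for 21 messages against a table of 1000 keeps the 500ms delay, while a request for 5000 messages against a table of 10000 is delayed to 750ms. Whether the offset applies only to the first scheduling or to every rescheduling would be decided by the caller.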