[jboss-jira] [JBoss JIRA] (JGRP-1361) NAKACK: use bigger timeouts for big retransmission tasks

Mon Jan 23 08:40:18 EST 2012

     [ https://issues.jboss.org/browse/JGRP-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban resolved JGRP-1361.
----------------------------

    Resolution: Won't Fix

NAKACK2 is the new preferred retransmission protocol. It doesn't have individual retransmission timeouts per message (or per message range), so this issue is moot.
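
To illustrate the contrast the resolution draws, here is a minimal sketch that assumes a single periodic task per sender instead of one timer per missing range. Each time the task runs, it scans the receive window and requests whatever gaps remain. The class `Window` and its methods are hypothetical illustrations, not NAKACK2's actual code or API.

```python
class Window:
    """Hypothetical per-sender receive window with no per-range timers.

    A single periodic task (not shown) would call missing() once per
    interval and request retransmission of whatever it returns.
    """

    def __init__(self, highest_delivered: int = 0) -> None:
        self.highest_delivered = highest_delivered  # last seqno delivered in order
        self.highest_received = highest_delivered   # highest seqno seen so far
        self.received: set[int] = set()             # seqnos above highest_delivered

    def add(self, seqno: int) -> None:
        if seqno <= self.highest_delivered:
            return  # duplicate of an already delivered message
        self.received.add(seqno)
        self.highest_received = max(self.highest_received, seqno)

    def missing(self) -> list[int]:
        # Gaps are recomputed on every scan, so a late arrival simply drops
        # out of the next result; there is no task to cancel.
        return [s for s in range(self.highest_delivered + 1, self.highest_received + 1)
                if s not in self.received]
```

For example, after `w = Window(9)` and `w.add(31)`, `w.missing()` returns the seqnos 10 through 30. Once 10 through 28 arrive, the next scan returns only `[29, 30]`.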

> NAKACK: use bigger timeouts for big retransmission tasks
> --------------------------------------------------------
>
>                 Key: JGRP-1361
>                 URL: https://issues.jboss.org/browse/JGRP-1361
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.1
>
>
> Oftentimes we receive messages out of order, e.g. because an OOB message follows a range of regular messages, but since the regular messages are bundled, they may arrive later than the OOB message. Say the regular messages are [10..30] and the OOB message is 31. Receiving #31 before [10..30] triggers the addition of a retransmission task for [10..30] to the retransmitter. The task usually goes off after some initial delay, say 500ms. If we receive all of [10..30] before the task goes off, it is cancelled. If we receive only most of the messages in [10..30], only the missing ones are retransmitted when the task fires.
> So in most cases, many retransmission tasks never fire because they are cancelled first.
> However, sometimes we receive a seqno which is much larger than the current highest_received seqno, e.g. highest_received=15000, seqno=20000. This means we now add a retransmission request for [15000..20000]. It is likely that the seqno was just received out of order, but it may trigger the (unneeded) retransmission of 5000 messages!
> The suggested solution is therefore to increase the initial delay for large retransmission tasks, so that they execute a bit later. Of course, the underlying assumption is that most of the missing messages will arrive before the timeout goes off. If the 5000 messages are really lost, e.g. dropped by the IP stack or a switch, then they will need to be retransmitted.
> If we have an exponential_backoff of 500, the initial delay is 500ms. We could say that a 'large' retransmission task is any task which asks for retransmission of more than 10% of the current retransmission table's size. We could compute an offset to the initial delay, which is added to the delay (only the first time), by using the delta and the current delay.
> We could add this delta to the delay only the first time a retransmission task is scheduled, or add it every time the task is scheduled, as long as the task is large.
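
The task lifecycle described in the issue (schedule a task on a gap, cancel it if everything arrives, otherwise request only what is still missing) can be sketched as follows. This is a minimal model under stated assumptions: time is passed in explicitly instead of using a real timer, `highest_received` means that seqno itself has already arrived, and the names `gap_for` and `RetransmitTask` are hypothetical, not part of the JGroups API.

```python
from dataclasses import dataclass, field


def gap_for(highest_received: int, seqno: int) -> range:
    """Seqnos still missing when `seqno` arrives (assumes highest_received was received)."""
    return range(highest_received + 1, seqno)


@dataclass
class RetransmitTask:
    """Hypothetical model of one retransmission task covering [low..high]."""
    low: int
    high: int
    fire_at_ms: int  # absolute time at which the task goes off
    missing: set[int] = field(init=False)

    def __post_init__(self) -> None:
        self.missing = set(range(self.low, self.high + 1))

    def received(self, seqno: int) -> None:
        self.missing.discard(seqno)

    @property
    def cancelled(self) -> bool:
        # Every gap was filled before the timer fired: nothing to retransmit.
        return not self.missing

    def fire(self, now_ms: int) -> list[int]:
        """Seqnos to request at now_ms; empty if not yet due or already cancelled."""
        if now_ms < self.fire_at_ms or self.cancelled:
            return []
        return sorted(self.missing)
```

With the OOB example from the issue, `gap_for(9, 31)` covers 10 through 30, and a task due at 500ms that has since received 10 through 28 returns `[29, 30]` when it fires. For the large case, `gap_for(15000, 20000)` covers 15001 through 19999, i.e. 4999 seqnos, which is the roughly 5000 retransmissions the issue warns about.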
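
The proposed fix can be sketched as a delay function. The 10% threshold and the 500ms base delay come from the issue; the offset formula (base delay scaled by the ratio of gap size to table size), the doubling of the delay on each retry, and the name `retransmit_delays` are assumptions for illustration, since the issue leaves the exact formula open. The `boost_every_time` flag selects between the two variants proposed at the end of the issue: add the offset only to the first scheduling, or to every scheduling. The sketch assumes the task stays large across all attempts.

```python
def retransmit_delays(gap_size: int, table_size: int, attempts: int,
                      base_delay_ms: int = 500, large_percent: int = 10,
                      boost_every_time: bool = False) -> list[int]:
    """Hypothetical schedule of delays (ms) for successive retransmission attempts."""
    # A task is "large" if it covers more than large_percent of the table.
    is_large = table_size > 0 and gap_size * 100 > large_percent * table_size
    # Assumed offset: base delay scaled by gap_size / table_size (integer ms).
    offset = base_delay_ms * gap_size // table_size if is_large else 0
    delays = []
    delay = base_delay_ms
    for attempt in range(attempts):
        # Variant 1: offset on the first attempt only; variant 2: on every attempt.
        extra = offset if attempt == 0 or boost_every_time else 0
        delays.append(delay + extra)
        delay *= 2  # assumed exponential backoff: double after each attempt
    return delays
```

For example, a 4999-seqno gap against a 10000-entry table is large (about 50%), so the first delay becomes 749ms instead of 500ms, while a 21-seqno gap against 1000 entries stays at 500ms. Using integer milliseconds keeps the threshold check free of floating-point rounding.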
