[jboss-jira] [JBoss JIRA] (JGRP-1362) NAKACK: second line of defense for requested retransmissions that are not found

Bela Ban (JIRA) jira-events at lists.jboss.org
Mon Jan 23 08:38:18 EST 2012


     [ https://issues.jboss.org/browse/JGRP-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1362:
---------------------------

    Fix Version/s: 3.x
                       (was: 3.1)


With the changes made in JGRP-1396 (NAKACK2), the retransmission architecture was dramatically simplified, so we may not need this second line of defense at all. Let's wait and see how NAKACK2 behaves in the field before possibly re-opening this issue.
                
> NAKACK: second line of defense for requested retransmissions that are not found
> -------------------------------------------------------------------------------
>
>                 Key: JGRP-1362
>                 URL: https://issues.jboss.org/browse/JGRP-1362
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.x
>
>
> When the original sender B is asked by A to retransmit message M, but doesn't have M in its retransmission table anymore, it should tell A, or else A will keep sending retransmission requests to B until A or B leaves.
> This problem should have been fixed by JGRP-1251, but if it turns out it wasn't, then this JIRA (1) is a second line of defense to stop the endless retransmission requests and (2) will give us valuable diagnostic information to fix the underlying problem (should there still be one).
> Problem:
> - A has a NakReceiverWindow (NRW) of 50 (highest_delivered seqno) for B
> - B's NRW, however, is 200. B garbage collected messages up to 150.
> - When B sends message 201, A will ask B for retransmission of [51-200]
> - B will retransmit messages [150-200], but it cannot send messages [51-149], as it doesn't have them anymore!
> - A will add messages [150-200], but its NRW is still 50 (highest_delivered)
> - A will continue asking B for messages [51-149] (it does have [150-201])
> - This will go on forever, or until B or A leaves
> SOLUTION:
> - When the *original sender* B of message M receives a retransmission request for M (from A), and it doesn't have M in its retransmission table, it should send back a MSG_NOT_FOUND message to A including B's digest
> - When A receives the MSG_NOT_FOUND message, it does the following:
>   - It logs its own NRW for B
>   - It logs B's digest
>   - It logs its digest history
>   (This information is valuable for investigating the underlying issue)
>   - Then A's NRW for B is adjusted:
>     - The highest_delivered seqno is set to B.digest.highest_delivered
>     - All messages in xmit_table below B.digest.highest_delivered are removed
>     - All retransmission tasks in the retransmitter <= B.digest.highest_delivered are cancelled and removed
>       (This will stop the retransmission)
> Again, this is a second line of defense, which should never be used. If the underlying problem does occur, however, we'll have valuable information in the logs to diagnose what went wrong.
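
The scenario and the proposed MSG_NOT_FOUND handling above can be sketched roughly as follows. This is only a minimal model under stated assumptions, not JGroups code: the class MsgNotFoundSketch, its ReceiverWindow type and all method names are hypothetical, B's digest is reduced to a single highest_delivered seqno, the digest history is omitted, and retransmission tasks are modelled implicitly as the set of missing seqnos. The issue says entries "below" B.digest.highest_delivered are removed from xmit_table; the sketch also drops the entry at that seqno, on the assumption that it can no longer be delivered once highest_delivered has been moved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch; none of these names are JGroups APIs.
final class MsgNotFoundSketch {

    /** A's receive window for a single sender (B in the issue). */
    static final class ReceiverWindow {
        private long highestDelivered;
        // Messages received but not yet delivered, keyed by seqno.
        private final TreeMap<Long, String> xmitTable = new TreeMap<>();

        ReceiverWindow(long highestDelivered) {
            this.highestDelivered = highestDelivered;
        }

        long highestDelivered() {
            return highestDelivered;
        }

        void add(long seqno, String payload) {
            if (seqno > highestDelivered) {
                xmitTable.put(seqno, payload);
            }
        }

        /** Delivers messages only while the next seqno in order is present. */
        void deliverInOrder() {
            while (xmitTable.containsKey(highestDelivered + 1)) {
                xmitTable.remove(++highestDelivered);
            }
        }

        /** Seqnos A would keep requesting from the sender. */
        List<Long> missing() {
            List<Long> gaps = new ArrayList<>();
            if (xmitTable.isEmpty()) {
                return gaps;
            }
            for (long s = highestDelivered + 1; s < xmitTable.lastKey(); s++) {
                if (!xmitTable.containsKey(s)) {
                    gaps.add(s);
                }
            }
            return gaps;
        }

        /** Reaction to MSG_NOT_FOUND carrying the sender's highest_delivered. */
        void onMsgNotFound(long senderHighestDelivered) {
            // Diagnostics first: own window state and the sender's digest.
            System.out.printf("MSG_NOT_FOUND: own highest_delivered=%d, missing=%d seqnos, sender highest_delivered=%d%n",
                    highestDelivered, missing().size(), senderHighestDelivered);
            if (senderHighestDelivered <= highestDelivered) {
                return; // nothing to adjust
            }
            highestDelivered = senderHighestDelivered;
            // Assumption: drop entries at or below the new highest_delivered.
            xmitTable.headMap(senderHighestDelivered, true).clear();
        }
    }

    public static void main(String[] args) {
        ReceiverWindow aWindowForB = new ReceiverWindow(50);
        // B could only retransmit [150-200]; 201 is the new message.
        for (long s = 150; s <= 201; s++) {
            aWindowForB.add(s, "m" + s);
        }
        aWindowForB.deliverInOrder(); // stuck, because 51 is missing
        System.out.println("missing before: " + aWindowForB.missing().size());
        aWindowForB.onMsgNotFound(200);
        System.out.println("missing after: " + aWindowForB.missing().size());
    }
}
```

In this model the window stays stuck at 50 for as long as [51-149] are outstanding, and the adjustment empties the missing set so that no further retransmission requests would be issued. Messages A had buffered but not delivered are discarded by the adjustment, which is one reason the issue treats this as a last resort.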

