Bela Ban updated JGRP-1362:
---------------------------
Fix Version/s: 3.x
(was: 3.1)
With the changes done in JGRP-1396 (NAKACK2), the retransmission architecture was
dramatically simplified, so we may end up not even needing this second line of defense
anymore. Let's wait and see how NAKACK2 behaves in the field, before possibly
re-opening this issue.
NAKACK: second line of defense for requested retransmissions that are
not found
-------------------------------------------------------------------------------
Key: JGRP-1362
URL: https://issues.jboss.org/browse/JGRP-1362
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.x
When the original sender B is asked by A to retransmit message M, but doesn't have M
in its retransmission table anymore, it should tell A, or else A will send retransmission
requests to B until A or B leaves.
This problem should have been fixed by JGRP-1251, but if it turns out it wasn't, then
this JIRA (1) is a second line of defense to stop the endless retransmission requests and
(2) will give us valuable diagnostic information to fix the underlying problem (should
there still be one).
Problem:
- A has a NakReceiverWindow (NRW) of 50 (highest_delivered seqno) for B
- B's NRW, however, is 200. B has garbage collected all messages below 150.
- When B sends message 201, A will ask B for retransmission of [51-200]
- B will retransmit messages [150-200], but it cannot send messages [51-149], as it
doesn't have them anymore!
- A will add messages [150-200], but its NRW is still 50 (highest_delivered)
- A will continue asking B for messages [51-149] (it does have [150-201])
- This will go on forever, or until B or A leaves
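The loop described above can be reproduced with a small model. This is only a sketch under stated assumptions: the class RetransmitLoopDemo, the method simulate() and the use of plain sorted sets are hypothetical and are not the JGroups NakReceiverWindow API. It uses the numbers from the example (A's highest_delivered is 50, B still holds [150-200], message 201 has just arrived).

```java
import java.util.TreeSet;

// Hypothetical model of the scenario above; not the JGroups NakReceiverWindow API.
class RetransmitLoopDemo {

    // Runs the given number of request/response rounds between A and B.
    // Returns {A's highest_delivered, number of seqnos A is still missing}.
    static long[] simulate(int rounds) {
        long highestDelivered = 50;                 // A's NRW for B
        TreeSet<Long> received = new TreeSet<>();   // held by A, not yet deliverable
        received.add(201L);                         // arrival of 201 reveals the gap

        // B's retransmission table: only [150-200] survived garbage collection
        TreeSet<Long> senderTable = new TreeSet<>();
        for (long s = 150; s <= 200; s++) senderTable.add(s);

        long missing = 0;
        for (int r = 0; r < rounds; r++) {
            missing = 0;
            long highestReceived = received.last();
            // A asks B for every seqno in the gap that it does not hold yet
            for (long s = highestDelivered + 1; s < highestReceived; s++) {
                if (received.contains(s)) continue;
                if (senderTable.contains(s)) received.add(s); // B retransmits
                else missing++;                               // B stays silent
            }
            // A can only deliver in order, so 51 must arrive before anything else
            while (received.contains(highestDelivered + 1)) {
                highestDelivered++;
                received.remove(highestDelivered);
            }
        }
        return new long[] { highestDelivered, missing };
    }
}
```

In this model A's highest_delivered stays at 50 and the same 99 seqnos ([51-149]) are requested in every round, no matter how many rounds are run.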
Solution:
- When the *original sender* B of message M receives a retransmission request for M (from
A), and it doesn't have M in its retransmission table, it should send back a
MSG_NOT_FOUND message to A including B's digest
- When A receives the MSG_NOT_FOUND message, it does the following:
  - It logs its own NRW for B
  - It logs B's digest
  - It logs its digest history
  (This information is valuable for investigating the underlying issue)
  - Then A's NRW for B is adjusted:
    - The highest_delivered seqno is set to B.digest.highest_delivered
    - All messages in xmit_table below B.digest.highest_delivered are removed
    - All retransmission tasks in the retransmitter <= B.digest.highest_delivered are
      cancelled and removed
    (This will stop the retransmission)
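The receiver-side handling proposed above could look roughly like the following. This is a minimal sketch, not the NAKACK implementation: the class MsgNotFoundDemo, its fields and the method onMsgNotFound() are hypothetical names, B's digest is reduced to a single highest_delivered value, the digest history is left out, and the guard against moving highest_delivered backwards is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical receiver-side state that A keeps for sender B; not the JGroups API.
class MsgNotFoundDemo {
    long highestDelivered;
    final TreeMap<Long, String> xmitTable = new TreeMap<>();   // seqno -> payload
    final TreeSet<Long> pendingRetransmits = new TreeSet<>();  // seqnos still being requested
    final List<String> diagnostics = new ArrayList<>();        // stands in for the log

    MsgNotFoundDemo(long highestDelivered) {
        this.highestDelivered = highestDelivered;
    }

    // Called when B answers a retransmission request with MSG_NOT_FOUND.
    // senderHighestDelivered stands for B.digest.highest_delivered.
    void onMsgNotFound(long senderHighestDelivered) {
        // Record the diagnostic state before changing anything
        diagnostics.add("own NRW for sender: highest_delivered=" + highestDelivered
                + ", pending retransmits=" + pendingRetransmits.size());
        diagnostics.add("sender digest: highest_delivered=" + senderHighestDelivered);

        // Assumption: never move the window backwards
        if (senderHighestDelivered <= highestDelivered) return;

        highestDelivered = senderHighestDelivered;
        // Remove messages strictly below the sender's highest_delivered
        xmitTable.headMap(senderHighestDelivered, false).clear();
        // Cancel retransmission tasks <= the sender's highest_delivered
        pendingRetransmits.headSet(senderHighestDelivered, true).clear();
    }
}
```

With the numbers from the problem description (A at 50, holding [150-201], requesting [51-149]), a call to onMsgNotFound(200) moves A's highest_delivered to 200 and empties the set of pending retransmissions, so no further requests go to B.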
Again, this is a second line of defense, which should never be used. If the underlying
problem does occur, however, we'll have valuable information in the logs to diagnose
what went wrong.