[jboss-jira] [JBoss JIRA] Commented: (JGRP-952) MERGE: UNICAST can lose messages on merging
Bela Ban (JIRA)
jira-events at lists.jboss.org
Tue Apr 7 21:40:22 EDT 2009
[ https://jira.jboss.org/jira/browse/JGRP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12461139#action_12461139 ]
Bela Ban commented on JGRP-952:
-------------------------------
Yet another alternative:
- We stick with the current approach and ack messages as soon as they are received
- When we get a SEND_FIRST_SEQNO and we have gaps in the sender window, we trash the sender window and send seqno #1
- This solution is in line with the first comment on this case
Example:
- A sends #5 to B
- B expects #4 from A, but adds #5 and acks it
- A receives the ack(#5) and removes #5 from its sender window
- Now there is a partition such that B trashes its connection window from A, but A keeps its window for B (A: {A,B}, B: {B})
- The partition heals
- A (re-)sends #4 to B
- B sends a SEND_FIRST_SEQNO to A
- A checks its sender window: it has #4 for retransmission, but the next seqno to be sent is #6, so there is a gap
- A trashes its connection window, *but does not send anything*! The next seqno to be sent is #1 (with a conn_id, because it is the first seqno)
--> We could have copied the old #4 into the new sender window and sent it as #1. The problem is that we would then send the old #4 but not the old #5; I think it's better to trash them both!
--> This is in line with what we do in *non-overlapping* partitions: when we have #4 in the sender window and there is a partition A:{A} and B:{B}, then both A and B trash their connection windows and therefore lose both #4 and #5
--> As discussed in the above comments, this is also in line with NAKACK, which can also lose messages on a merge
--> After a merge, all sub-partitions are supposed to fetch state anyway, or at least merge their possibly divergent states
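
A minimal sketch of the gap check in the example above, in Java. This is not the JGroups UNICAST code: the class SenderWindowSketch and the methods send(), ack(), hasGaps() and onSendFirstSeqno() are hypothetical names chosen for illustration, acks are assumed to be per message, and an empty window is assumed to count as "no gap".

```java
import java.util.TreeMap;

/** Hypothetical model of A's sender window for one peer; not the JGroups UNICAST class. */
public class SenderWindowSketch {
    static final long FIRST_SEQNO = 1;

    // seqno -> payload of messages that were sent but not yet acked
    private final TreeMap<Long, String> unacked = new TreeMap<>();
    private long nextSeqno = FIRST_SEQNO; // next seqno to be assigned

    /** Assigns the next seqno to a message and keeps it until it is acked. */
    public long send(String payload) {
        long seqno = nextSeqno++;
        unacked.put(seqno, payload);
        return seqno;
    }

    /** Removes exactly one acked message (assumption: acks are not cumulative). */
    public void ack(long seqno) {
        unacked.remove(seqno);
    }

    /**
     * True if the unacked messages do not form one contiguous run
     * from the lowest unacked seqno up to nextSeqno - 1.
     * Assumption: an empty window is treated as having no gap.
     */
    public boolean hasGaps() {
        if (unacked.isEmpty()) return false;
        long lowest = unacked.firstKey();
        return unacked.size() != nextSeqno - lowest;
    }

    /**
     * Reaction to SEND_FIRST_SEQNO as proposed in the comment: if there is a gap,
     * trash the window and send nothing; the next message starts again at #1.
     * Returns true if the window was trashed.
     */
    public boolean onSendFirstSeqno() {
        if (!hasGaps()) return false; // no gap: the lowest unacked message could be resent
        unacked.clear();              // old #4 is dropped, just like the already acked #5
        nextSeqno = FIRST_SEQNO;
        return true;
    }

    public static void main(String[] args) {
        SenderWindowSketch a = new SenderWindowSketch();
        for (int i = 1; i <= 5; i++) a.send("m" + i);    // A has sent #1..#5
        for (long s : new long[] {1, 2, 3, 5}) a.ack(s); // #4 is still unacked
        System.out.println("gap detected: " + a.hasGaps());
        System.out.println("window trashed: " + a.onSendFirstSeqno());
        System.out.println("next seqno: " + a.send("new message"));
    }
}
```

With #4 pending and #6 as the next seqno, the window holds one message where two would be needed for a contiguous run, so the sketch reports a gap and restarts at #1.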
> MERGE: UNICAST can lose messages on merging
> -------------------------------------------
>
> Key: JGRP-952
> URL: https://jira.jboss.org/jira/browse/JGRP-952
> Project: JGroups
> Issue Type: Bug
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 2.6.10, 2.8
>
>
> The following use case loses messages:
> - A sends #5 to B
> - B expects #4 from A, but adds #5 and acks it
> - A receives the ack(#5) and removes #5 from its sender window
> - Now there is a partition such that B trashes its connection window from A, but A keeps its window for B (A: {A,B}, B: {B})
> - The partition heals and A sends #6 to B
> - B asks A for its lowest seqno, A resends #4 (with a conn_id)
> - B creates a receiver window for A with seqno=#4
> - A resends #6
> - B adds #6 to its window, but doesn't deliver it because it is missing #5
> --> However, A will NEVER resend #5 because the ack(#5) from B removed #5 from A's sender window !
> ==> Possible SOLUTION: when A gets the SEND_FIRST_SEQNO and there are (unacked) messages in A's sender window, and they are not in order, then A will trash its connection window and copy the pending messages into the new window (with new seqnos !) before sending #1 (with conn_id)
> ==> This solution might lose the original message #5, but that's better than B never being able to deliver any messages anymore !
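
The stall on B's side in the quoted use case can be reproduced with a small in-order delivery model, sketched here in Java. ReceiverWindowSketch and its methods are hypothetical names for illustration, not the JGroups receiver window; the only behaviour assumed is that messages are delivered strictly in seqno order starting from the seqno the window was created with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-order receiver window; not the JGroups implementation. */
public class ReceiverWindowSketch {
    // messages received out of order, waiting for the gap before them to be filled
    private final TreeMap<Long, String> buffered = new TreeMap<>();
    private long nextToDeliver;

    /** B creates the window with the first seqno announced by A (#4 in the use case). */
    public ReceiverWindowSketch(long firstSeqno) {
        this.nextToDeliver = firstSeqno;
    }

    /** Adds a message and returns everything that can now be delivered in order. */
    public List<String> add(long seqno, String payload) {
        if (seqno >= nextToDeliver) buffered.putIfAbsent(seqno, payload);
        List<String> delivered = new ArrayList<>();
        while (buffered.containsKey(nextToDeliver)) {
            delivered.add(buffered.remove(nextToDeliver));
            nextToDeliver++;
        }
        return delivered;
    }

    public long nextToDeliver() {
        return nextToDeliver;
    }

    public static void main(String[] args) {
        ReceiverWindowSketch b = new ReceiverWindowSketch(4);
        System.out.println("after #4: " + b.add(4, "m4")); // #4 is delivered
        System.out.println("after #6: " + b.add(6, "m6")); // nothing: #5 is missing
        System.out.println("after #7: " + b.add(7, "m7")); // still nothing
        System.out.println("waiting for #" + b.nextToDeliver());
    }
}
```

Because A has already removed #5 from its sender window, nothing will ever fill the slot B is waiting on, so every later message is buffered and never delivered.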