[jboss-jira] [JBoss JIRA] (JGRP-1690) View and digest have to match

Bela Ban (JIRA) jira-events at lists.jboss.org
Wed Sep 11 05:39:04 EDT 2013


Bela Ban created JGRP-1690:
------------------------------

             Summary: View and digest have to match
                 Key: JGRP-1690
                 URL: https://issues.jboss.org/browse/JGRP-1690
             Project: JGroups
          Issue Type: Bug
            Reporter: Bela Ban
            Assignee: Bela Ban
             Fix For: 3.4


In some cases a view and a digest are returned together, e.g. v2={A,B,C} and digest=[25,10,17]. This means the highest delivered seqnos are A=25, B=10 and C=17.

A view and a digest are returned together in:
* Responses to a new joiner: JoinRsp
* Merge view installations: GMS$GmsHeader.INSTALL_MERGE_VIEW
* Merge responses: GMS$GmsHeader.MERGE_RSP (to be verified)

However, in some edge cases we could end up with a digest that doesn't match the view, e.g. digest=[25,10]. There would be no entry for C and, the way this currently works, the resulting digest would have *a 0 seqno for C*!

The above scenario can happen as follows:
* The view is v1={A,B}. A is the coordinator
* C joins
* A broadcasts v2={A,B,C}.
* A installs v2, but doesn't yet set the digest (NAKACK.setDigest())
* D joins
* Meanwhile, C sent 50 messages and STABLE garbage-collected C at 45
* A creates a new view v3={A,B,C,D}
* A gets the digest from NAKACK: [A,B] and adds D (at 0)
* A sends a JOIN-RSP with v3={A,B,C,D} and digest=[A,B,*C*,D] to D
** *C is 0*. The reason is that we create a MutableDigest for v3 with all seqnos set to 0, then iterate through the digest from NAKACK and copy its seqnos. Since that digest has no entry for C, C's seqno stays 0!
* D installs the JOIN-RSP. It thinks the seqno for C is 0. The problem now is that when C sends message #51, D will ask it for retransmission of [1-50], but C can't furnish them as it already purged messages 1-45. This leads to endless retransmissions.
* A (belatedly) sets the digest=[A,B,C] for v2 in NAKACK
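
The scenario above can be reproduced in miniature. The following is only a sketch, not JGroups code: DigestMismatchDemo and buildDigest() are hypothetical names, members are plain strings, and the seqnos for A and B are borrowed from the example at the top of this issue. It shows how starting every member at 0 and then copying NAKACK's stale digest silently leaves C at 0.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the coordinator's digest handling; this is NOT the JGroups API.
final class DigestMismatchDemo {

    // Builds a digest for 'view' the way the issue describes: every member
    // starts at seqno 0, then the seqnos known to NAKACK are copied over.
    static Map<String, Long> buildDigest(List<String> view, Map<String, Long> nakackDigest) {
        Map<String, Long> digest = new LinkedHashMap<>();
        for (String member : view)
            digest.put(member, 0L);
        for (Map.Entry<String, Long> e : nakackDigest.entrySet())
            if (digest.containsKey(e.getKey()))
                digest.put(e.getKey(), e.getValue());
        return digest;
    }

    public static void main(String[] args) {
        List<String> v3 = List.of("A", "B", "C", "D");
        // NAKACK still holds the digest for v1, because setDigest() for v2 has not run yet.
        Map<String, Long> nakack = Map.of("A", 25L, "B", 10L);
        // C ends up at 0 without any error being raised.
        System.out.println(buildDigest(v3, nakack)); // prints {A=25, B=10, C=0, D=0}
    }
}
```

In the output, C and D are indistinguishable, although only D's 0 is legitimate (D is the new joiner).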


SOLUTION:
* Make sure (in the coordinator) that view and digest match. E.g. a MutableDigest could initialize all seqnos to -1 and, after setting all values from the digest retrieved from NAKACK, throw an exception if one of the seqnos is still -1.
** We could retry fetching the digest from NAKACK for a number of tries before giving up
** In the worst case, we wouldn't send a JOIN-RSP to the new joiner, but since the joiner would retry, this is not a problem.
* Alternatively, we could have the client (ClientGmsImpl) check the digest and retry if some of the seqnos are -1.
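
A minimal sketch of the coordinator-side check with retries, under stated assumptions: CheckedDigestBuilder, build() and buildWithRetry() are hypothetical names, not the JGroups API; the digest is modelled as a plain map; the new joiner is assumed to have been added at 0 before the check; and C's seqno of 50 in the second fetch is purely illustrative.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed check; this is NOT the JGroups API.
final class CheckedDigestBuilder {
    static final long UNSET = -1L;

    // Starts every view member at -1, copies the known seqnos, and fails
    // if any member of the view is still unset afterwards.
    static Map<String, Long> build(List<String> view, Map<String, Long> known) {
        Map<String, Long> digest = new LinkedHashMap<>();
        for (String member : view)
            digest.put(member, UNSET);
        for (Map.Entry<String, Long> e : known.entrySet())
            if (digest.containsKey(e.getKey()))
                digest.put(e.getKey(), e.getValue());
        for (Map.Entry<String, Long> e : digest.entrySet())
            if (e.getValue() == UNSET)
                throw new IllegalStateException("no seqno for " + e.getKey() + " in view " + view);
        return digest;
    }

    // Re-fetches the digest up to maxTries times before giving up.
    static Map<String, Long> buildWithRetry(List<String> view,
                                            Supplier<Map<String, Long>> fetchDigest,
                                            int maxTries) {
        if (maxTries < 1)
            throw new IllegalArgumentException("maxTries must be >= 1");
        IllegalStateException last = null;
        for (int i = 0; i < maxTries; i++) {
            try {
                return build(view, fetchDigest.get());
            } catch (IllegalStateException e) {
                last = e; // digest does not cover the view yet; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        List<String> v3 = List.of("A", "B", "C", "D");
        // First fetch: NAKACK has not seen v2 yet (joiner D already added at 0).
        // Second fetch: the digest for v2 has been set, so C is present.
        Iterator<Map<String, Long>> fetches = List.of(
                Map.of("A", 25L, "B", 10L, "D", 0L),
                Map.of("A", 25L, "B", 10L, "C", 50L, "D", 0L)).iterator();
        System.out.println(buildWithRetry(v3, fetches::next, 3)); // prints {A=25, B=10, C=50, D=0}
    }
}
```

If all tries fail, the exception propagates and no JOIN-RSP is sent, which matches the worst case described above: the joiner simply retries.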
