[jboss-jira] [JBoss JIRA] Commented: (JGRP-699) NAKACK: merging of digests is incorrect
Bela Ban (JIRA)
jira-events at lists.jboss.org
Wed Feb 27 07:04:42 EST 2008
[ http://jira.jboss.com/jira/browse/JGRP-699?page=comments#action_12400697 ]
Bela Ban commented on JGRP-699:
-------------------------------
The last statement is incorrect. Consider the following view after shunning:
233
-----
233: 557-559
234: 901-918
234
----
234: 921-923
- 234 apparently did some garbage collection during the time it was a singleton member and adjusted the low seqno to 921 (from 901). It could do this because 233 was *not* in its group, otherwise 233 would have objected to purging seqnos higher than 918 (highest delivered seqno for 234, as seen by 233) !
- The problem here is that we do have overlapping subgroups due to unilateral shunning; this usually doesn't happen. Actually, the mergeDigest() code explicitly states that merging always occurs between disjoint subgroups (wrong assumption !)...
- So, since 234 will only be able to furnish seqnos starting at 921, the merged digest cannot contain seqnos smaller than 921 for 234 ! Otherwise, 233 (whose highest seqno for 234 is 918) will ask 234 for retransmission of seqnos 919-923 when 234::924 is received !
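To make the gap concrete, here is a minimal sketch of the arithmetic only. It is not the NAKACK code: the class and method names are hypothetical, and it assumes that the low seqno of 234's digest (921) is the first seqno 234 can still furnish, as stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of JGroups.
class RetransmitGapSketch {

    /** Seqnos a receiver would ask for when it sees 'received' after 'highestSeen'. */
    static List<Long> missing(long highestSeen, long received) {
        List<Long> gap = new ArrayList<>();
        for (long s = highestSeen + 1; s < received; s++) {
            gap.add(s);
        }
        return gap;
    }

    /** Requested seqnos below the sender's low seqno (assumed already purged). */
    static List<Long> unavailable(List<Long> requested, long senderLow) {
        List<Long> lost = new ArrayList<>();
        for (long s : requested) {
            if (s < senderLow) {
                lost.add(s);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // 233 has seen 234 up to 918 and now receives 234::924
        List<Long> req = missing(918, 924);
        System.out.println("233 requests from 234: " + req);
        // 234 purged everything below 921 while it was a singleton
        System.out.println("234 cannot furnish: " + unavailable(req, 921));
    }
}
```

Under these assumptions the request covers 919 through 923, and 919 and 920 can never be retransmitted, which matches the "message ...::919 not found" error in the issue description.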
SOLUTION:
- Change merging of digests so that a member X is the authoritative source for its own seqnos, e.g. 234's 921-923 digest will not get min'ed or max'ed with other digests. So
A --> A:23, B:10, C:5
B --> A:21, B:12, C:4
C --> A:20, B:12, C:5
will result in
A:23 (from A), B:12 (from B), C:5 (from C).
A will now not request seqnos B:11-12 from B when B sends the next seqno (13), because it adjusted its digest entry to B:12 as well and expects #13 as the next seqno from B.
- Don't merge any digests but simply set them (setDigest()) - investigate
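The first option could be sketched roughly as follows. This is an illustration under assumptions, not the actual mergeDigest() implementation: the class name, the map-based digest representation, and the use of a single seqno per member (as in the A/B/C example) are all hypothetical simplifications.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of JGroups.
class AuthoritativeDigestMerge {

    /**
     * digests maps each reporting member to its digest (member -> seqno).
     * For every member X, the merged digest takes the value X reported
     * about itself; values reported by others are ignored, so nothing
     * is min'ed or max'ed.
     */
    static Map<String, Long> merge(Map<String, Map<String, Long>> digests) {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Long>> e : digests.entrySet()) {
            String reporter = e.getKey();
            Long own = e.getValue().get(reporter);
            if (own != null) {
                merged.put(reporter, own);
            }
        }
        return merged;
    }

    /** Helper that builds one member's digest for A, B and C. */
    static Map<String, Long> digest(long a, long b, long c) {
        Map<String, Long> d = new LinkedHashMap<>();
        d.put("A", a);
        d.put("B", b);
        d.put("C", c);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> digests = new LinkedHashMap<>();
        digests.put("A", digest(23, 10, 5));
        digests.put("B", digest(21, 12, 4));
        digests.put("C", digest(20, 12, 5));
        System.out.println(merge(digests)); // prints {A=23, B=12, C=5}
    }
}
```

A real digest entry carries a low and a high seqno per member; the same rule would apply to the whole entry, so 234's own 921-923 would win over 233's 901-918.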
> NAKACK: merging of digests is incorrect
> ---------------------------------------
>
> Key: JGRP-699
> URL: http://jira.jboss.com/jira/browse/JGRP-699
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.4, 2.4.1, 2.4.1 SP1, 2.4.1 SP2, 2.4.1 SP3, 2.4.1 SP4, 2.4.2
> Reporter: Bela Ban
> Assigned To: Bela Ban
> Fix For: 2.7, 2.6.3, 2.4.3
>
>
> The merge view is:
> 2008-02-20 23:49:30,872 DEBUG [org.jgroups.protocols.pbcast.GMS]
> view=MergeView::[172.16.172.233:19382|3] [172.16.172.233:19382, 172.16.172.234:19382],
> subgroups=[[172.16.172.233:19382|1] [172.16.172.233:19382, 172.16.172.234:19382],
> [172.16.172.234:19382|2] [172.16.172.234:19382]],
> digest=172.16.172.233:19382: [557 : 564(564)], 172.16.172.234:19382: [901 : 923 (923)]
> So, the digest for 234:19382 is 901-923, which means 234 could actually satisfy the retransmit request below. However, when the xmit req arrives:
> 2008-02-20 23:49:32,004 ERROR [org.jgroups.protocols.pbcast.NAKACK]
> (requester=172.16.172.233:19382, local_addr=172.16.172.234:19382)
> message 172.16.172.234:19382::919 not found in retransmission table of
> 172.16.172.233:19382:[557 : 565 (565)]
> 172.16.172.234:19382: [921 : 924 (924) (size=3, missing=0, highest stability=921)]
> , we can see that now the digest for 234:19382 is 921-924.
> In 2.4, the merge algorithm for digests (NAKACK.mergeDigest()) takes the *max* of the low seqnos (MAX(901,921)); in 2.5 and higher we don't touch an existing entry, so the digest for 234:19382 would still be 901-923, and we could therefore satisfy the retransmit request.
> SOLUTION: use the solution implemented in 2.5 and higher: don't set digests for existing entries.