[ https://issues.jboss.org/browse/JGRP-1817?page=com.atlassian.jira.plugin.... ]
Richard Achmatowicz commented on JGRP-1817:
-------------------------------------------
I've been looking at this for some time now.
When channels start up, the discovery phase calls Discovery.findAllMembers(), which results
in an interaction like this:
- channel C sends GET_MBRS_REQ to channels A and B
- these channels respond with GET_MBRS_RSP
- C determines who the members are
{noformat}
Discovery: C calling findInitialMbrs:
Discovery: C calling findMembers: num_expected = 10, view_id = null
Discovery: sending discovery request: view_id = null, data = non-null
C: sending in-line discovery request to 10.16.95.7:27199
C: sending in-line discovery request to 10.16.95.7:27200
TCPPING: A received discovery request from C
Discovery: A processing GET_MBRS_REQ from sender: C
Discovery: A sending discovery response to C
C: sending in-line discovery request to 10.16.95.7:27202
Discovery: C processing GET_MBRS_RSP from A:A, view_id=[A|1], is_server=true, is_coord=true, logical_name=A, physical_addrs=10.16.95.7:27199
Discovery: C called findMembers
417364 [TRACE] GMS: - C: initial_mbrs are A
417364 [DEBUG] GMS: - election results: {A=1}
417364 [DEBUG] GMS: - sending JOIN(C) to A
TCPPING: B received discovery request from C
Discovery: B processing GET_MBRS_REQ from sender: C
Discovery: B sending discovery response to C
Discovery: C processing GET_MBRS_RSP from B:B, view_id=[A|1], is_server=true, is_coord=false, logical_name=B, physical_addrs=10.16.95.7:27200
{noformat}
This seems to work fine.
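The startup exchange above can be modelled in a few lines. This is a simplified sketch, not JGroups code: PingRsp, electionResults() and electCoordinator() are hypothetical names, and it assumes, for illustration only, that each GET_MBRS_RSP contributes one vote for the coordinator it reports. With only A's response processed, it reproduces the logged "election results: {A=1}" and "sending JOIN(C) to A".

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of the startup discovery round in the trace above.
// None of these names are JGroups classes or methods.
final class DiscoverySketch {

    // Simplified GET_MBRS_RSP: who answered, the view it has, and the coordinator it reports.
    record PingRsp(String sender, String viewId, String coord) {}

    // Assumption: each response is one vote for the coordinator it reports.
    static Map<String, Integer> electionResults(List<PingRsp> rsps) {
        Map<String, Integer> votes = new TreeMap<>();
        for (PingRsp rsp : rsps) {
            votes.merge(rsp.coord(), 1, Integer::sum);
        }
        return votes;
    }

    // The joiner sends JOIN to the coordinator with the most votes (null if nobody answered).
    static String electCoordinator(List<PingRsp> rsps) {
        return electionResults(rsps).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        // In the trace, only A's response had been processed when C ran the election.
        List<PingRsp> rsps = List.of(new PingRsp("A", "[A|1]", "A"));
        System.out.println("election results: " + electionResults(rsps)); // {A=1}
        System.out.println("sending JOIN(C) to " + electCoordinator(rsps)); // A
    }
}
```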
However, on the MERGE.sendMergeSolicitation() path, Discovery.findAllViews() is called
instead of Discovery.findAllMembers().
Both use the same underlying method, Discovery.findMembers(), but the behaviour ends up
being completely different. In many cases there is no evidence that the GET_MBRS_REQ
messages arrive at the remote members at all, among other differences.
For example:
{noformat}
==== triggering merge solicitation ====:
Discovery: A calling findAllViews:
Discovery: A calling findMembers: num_expected = 10, view_id = [A|5]
Discovery: sending discovery request: view_id = [A|5], data = null
370387 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27216
A: sending in-line discovery request to 10.16.95.7:27216
370392 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27218
A: sending in-line discovery request to 10.16.95.7:27218
370393 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27217
A: sending in-line discovery request to 10.16.95.7:27217
373394 [TRACE] TCPPING: - A: discovery took 3007 ms: responses: 1 total (1 servers (0 coord), 0 clients); responses received = B
373395 [TRACE] MERGE2: - Discovery results:
[B]: view_id=[A|6] ([A|6] [A, B])
[A]: view_id=[A|5] ([A|5] [A])
373395 [DEBUG] MERGE2: - A found different views : [A|6], [A|5]; sending up MERGE event with merge participants [B, A].
Discovery results:
[B]: coord=A
[A]: coord=A
==== checking views after merge ====:
....................Disabling TRACE debugging for GMS, MERGE2 and Discovery
A's view: [B|7] [B, A]
B's view: [B|7] [B, A]
C's view: [A|7] [A, B, C]
{noformat}
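Note the absence of any messages showing B or C receiving or processing GET_MBRS_REQ,
even though the summary reports a response from B. The request itself also differs from
the startup case: here view_id = [A|5] and data = null, whereas at startup view_id = null
and data = non-null.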
I'm still looking at this. It's a puzzle.
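The merge-time trace reports "discovery took 3007 ms: responses: 1 total" with num_expected = 10. That duration is what one would expect if findMembers() waits until either num_expected responses arrive or a timeout of roughly 3 s expires: with only B answering, the wait always runs to the timeout. A minimal sketch of such a wait, assuming a count-down-latch design (ResponseCollector is a hypothetical name, not the JGroups implementation):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical collector for discovery responses: returns as soon as numExpected
// responses have arrived, or when the timeout expires, whichever comes first.
final class ResponseCollector<T> {
    private final List<T> responses = new CopyOnWriteArrayList<>();
    private final CountDownLatch latch;

    ResponseCollector(int numExpected) {
        this.latch = new CountDownLatch(numExpected);
    }

    // Called by the receiving thread for each response (e.g. a GET_MBRS_RSP).
    void add(T rsp) {
        responses.add(rsp);
        latch.countDown();
    }

    // Blocks for at most timeoutMs and returns whatever has arrived by then.
    List<T> await(long timeoutMs) {
        try {
            latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep interrupt status, return partial results
        }
        return List.copyOf(responses);
    }

    public static void main(String[] args) {
        // As in the merge-time trace: 10 responses expected, only B answers.
        ResponseCollector<String> collector = new ResponseCollector<>(10);
        collector.add("B");
        long start = System.nanoTime();
        List<String> rsps = collector.await(3000);
        long tookMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("discovery took " + tookMs + " ms: responses = " + rsps);
    }
}
```

Because the expected count can never be reached with three members, the elapsed time is governed entirely by the timeout, which matches the ~3000 ms figures in both failing traces.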
OverlappingMergeTest testSameCreatorDifferentIDs fails to create correct merged view
------------------------------------------------------------------------------------
Key: JGRP-1817
URL: https://issues.jboss.org/browse/JGRP-1817
Project: JGroups
Issue Type: Bug
Affects Versions: 3.2.13
Environment: RHEL
Reporter: Richard Achmatowicz
Assignee: Bela Ban
Fix For: 3.2.14
This test does the following:
- creates three channels A, B and C
- injects the following views:
{noformat}
A: {A|5 A}, B:{A|6 A,B}, C:{A|7 A,B,C}
{noformat}
- calls MERGE.sendMergeSolicitation() on channel A to simulate a run of the periodic task
MERGE.findSubgroupsTask, which should find the views of all reachable members, check
whether they differ, and, if they do, prepare a MERGE event and send it up to GMS
- checks that all channels have the final view of size 3
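The check in the third step can be sketched as follows. mergeParticipants() is a hypothetical helper, not MERGE's actual code; it assumes the discovery results have been reduced to a map from each responding member to the view ID it reports, and it treats all responders as merge participants, as the MERGE2 log line "sending up MERGE event with merge participants [B, A]" suggests.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the decision the periodic merge task makes on discovery results:
// if the responders report more than one distinct view ID, a MERGE event is warranted.
final class MergeCheckSketch {

    // Returns the merge participants (all responders) if the views differ, else an empty list.
    static List<String> mergeParticipants(Map<String, String> viewIdBySender) {
        long distinctViews = viewIdBySender.values().stream().distinct().count();
        if (distinctViews < 2) {
            return List.of(); // everyone reports the same view: nothing to merge
        }
        return List.copyOf(viewIdBySender.keySet()); // keeps the map's iteration order
    }

    public static void main(String[] args) {
        // Discovery results as logged by A in the failing run (C did not respond).
        Map<String, String> results = new LinkedHashMap<>();
        results.put("B", "[A|6]");
        results.put("A", "[A|5]");
        System.out.println("merge participants: " + mergeParticipants(results)); // [B, A]
    }
}
```

Note that the check only sees the views that discovery returned, so a member that never answers cannot influence whether, or with whom, the merge happens.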
The test fails intermittently but frequently on RHEL, with the same failure each time:
{noformat}
-------------------------------------------------------------------
GMS: address=A, cluster=OverlappingMergeTest, physical address=10.16.95.7:27215
-------------------------------------------------------------------
-------------------------------------------------------------------
GMS: address=B, cluster=OverlappingMergeTest, physical address=10.16.95.7:27216
-------------------------------------------------------------------
-------------------------------------------------------------------
GMS: address=C, cluster=OverlappingMergeTest, physical address=10.16.95.7:27217
-------------------------------------------------------------------
------------- testSameCreatorDifferentIDs -----------
[A] view=[A|5] [A]
[B] view=[A|6] [A, B]
[C] view=[A|7] [A, B, C]
A's view: [A|5] [A]
B's view: [A|6] [A, B]
C's view: [A|7] [A, B, C]
Enabling TRACE debugging for GMS, MERGE2 and Discovery
==== triggering merge solicitation ====:
212534 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27216
212537 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27218
212538 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27217
215538 [TRACE] TCPPING: - A: discovery took 3004 ms: responses: 1 total (1 servers (0 coord), 0 clients)
215539 [TRACE] MERGE2: - Discovery results:
[B]: view_id=[A|6] ([A|6] [A, B])
[A]: view_id=[A|5] ([A|5] [A])
215539 [DEBUG] MERGE2: - A found different views : [A|5], [A|6]; sending up MERGE event with merge participants [B, A].
Discovery results:
[B]: coord=A
[A]: coord=A
==== checking views after merge ====:
....................Disabling TRACE debugging for GMS, MERGE2 and Discovery
A's view: [A|7] [A, B]
B's view: [A|7] [A, B]
C's view: [A|7] [A, B, C]
{noformat}
Whenever this test fails, it is the discovery phase that fails to find the correct set
of views: instead of finding the views of channels A, B and C, it finds only those of A
and B. Discovery waits about 3 s (3004 ms) and reports a single response, so C's view
never reaches MERGE2.
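The effect of the missing response can be illustrated with a deliberately simplified model, assuming the merged membership is just the union of the members of the subgroup views that discovery returned. The real merge in GMS also deals with coordinators, view IDs and digests, all omitted here, and MergedViewSketch is a hypothetical name. With the views of A and B only, the union is [A, B], which matches the view A and B ended up with; including C's view gives the expected [A, B, C].

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model: the merged membership is the union of the members of
// every subgroup view that discovery returned. Views not discovered cannot contribute.
final class MergedViewSketch {

    static Set<String> mergedMembers(List<List<String>> discoveredViews) {
        Set<String> members = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (List<String> view : discoveredViews) {
            members.addAll(view);
        }
        return members;
    }

    public static void main(String[] args) {
        List<String> viewA = List.of("A");           // [A|5]
        List<String> viewB = List.of("A", "B");      // [A|6]
        List<String> viewC = List.of("A", "B", "C"); // [A|7]
        System.out.println(mergedMembers(List.of(viewA, viewB)));        // [A, B]
        System.out.println(mergedMembers(List.of(viewA, viewB, viewC))); // [A, B, C]
    }
}
```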
Also, the discovery requests are sent to host:port combinations that appear to be offset
by 1. For example, in the case above, the channels listen on 10.16.95.7:27215 (A),
10.16.95.7:27216 (B) and 10.16.95.7:27217 (C), but the pings go out to 10.16.95.7:27216,
10.16.95.7:27217 and 10.16.95.7:27218. Not sure if this is significant, as the targets
still cover channels B and C.
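The "offset by 1" observation can be checked mechanically by comparing the two port sets from the failing run; PortCoverageCheck is only an illustrative helper. The only channel port not pinged is 27215, which is A's own address, and the only extra target, 27218, belongs to none of the three channels, so B (27216) and C (27217) are both targeted.

```java
import java.util.Set;
import java.util.TreeSet;

// Compares the ports A's discovery requests were sent to with the ports the channels
// listen on (values copied from the failing run above).
final class PortCoverageCheck {

    // Elements of a that are not in b, in ascending order.
    static Set<Integer> minus(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> channelPorts = Set.of(27215, 27216, 27217); // A, B, C
        Set<Integer> pingedPorts = Set.of(27216, 27218, 27217);  // targets of A's requests
        System.out.println("not pinged: " + minus(channelPorts, pingedPorts));         // [27215]
        System.out.println("pinged, no channel: " + minus(pingedPorts, channelPorts)); // [27218]
    }
}
```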