[jboss-jira] [JBoss JIRA] (JGRP-1817) OverlappingMergeTest testSameCreatorDifferentIDs fails to create correct merged view

Richard Achmatowicz (JIRA) issues at jboss.org
Wed Apr 2 10:19:12 EDT 2014


    [ https://issues.jboss.org/browse/JGRP-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12958529#comment-12958529 ] 

Richard Achmatowicz commented on JGRP-1817:
-------------------------------------------

I've been looking at this for some time now.
When channels start up, the discovery phase calls Discovery.findAllMembers() which results in an interaction like this: 
- channel C sends GET_MBRS_REQ to channels A and B
- these channels respond with GET_MBRS_RSP
- C determines who the members are

{noformat}
Discovery: C calling findInitialMbrs:
Discovery: C calling findMembers: num_expected = 10, view_id = null
Discovery: sending discovery request: view_id = null, data = non-null
C: sending in-line discovery request to 10.16.95.7:27199
C: sending in-line discovery request to 10.16.95.7:27200
TCPPING: A received discovery request from C
Discovery: A processing GET_MBRS_REQ from sender: C
Discovery: A sending discovery response to C
C: sending in-line discovery request to 10.16.95.7:27202
Discovery: C processing GET_MBRS_RSP from A:A, view_id=[A|1], is_server=true, is_coord=true, logical_name=A, physical_addrs=10.16.95.7:27199
Discovery: C called findMembers
417364 [TRACE] GMS: - C: initial_mbrs are A
417364 [DEBUG] GMS: - election results: {A=1}
417364 [DEBUG] GMS: - sending JOIN(C) to A
TCPPING: B received discovery request from C
Discovery: B processing GET_MBRS_REQ from sender: C
Discovery: B sending discovery response to C
Discovery: C processing GET_MBRS_RSP from B:B, view_id=[A|1], is_server=true, is_coord=false, logical_name=B, physical_addrs=10.16.95.7:27200
{noformat}

This seems to work fine.
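
For reference, the request/response exchange in the trace above can be modelled roughly as follows. This is only a sketch of the message flow: DiscoverySketch, Member, Response, onGetMbrsReq and pickCoordinator are hypothetical names, not JGroups classes or methods, and the real GMS election counts votes rather than taking the first self-declared coordinator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the discovery exchange; none of this is JGroups code.
final class DiscoverySketch {

    /** What a member reports back in its (modelled) GET_MBRS_RSP. */
    static final class Response {
        final String sender;
        final String viewId;
        final boolean isCoord;

        Response(String sender, String viewId, boolean isCoord) {
            this.sender = sender;
            this.viewId = viewId;
            this.isCoord = isCoord;
        }

        @Override
        public String toString() {
            return sender + ": view_id=" + viewId + ", is_coord=" + isCoord;
        }
    }

    /** A running member that answers a (modelled) GET_MBRS_REQ. */
    static final class Member {
        final String name;
        final String viewId;
        final boolean isCoord;

        Member(String name, String viewId, boolean isCoord) {
            this.name = name;
            this.viewId = viewId;
            this.isCoord = isCoord;
        }

        Response onGetMbrsReq(String requester) {
            return new Response(name, viewId, isCoord);
        }
    }

    /** The requester asks every target and collects whatever comes back. */
    static List<Response> findMembers(String requester, List<Member> targets) {
        List<Response> responses = new ArrayList<>();
        for (Member m : targets) {
            responses.add(m.onGetMbrsReq(requester));
        }
        return responses;
    }

    /** Simplification: the first responder that says it is coordinator wins. */
    static String pickCoordinator(List<Response> responses) {
        for (Response r : responses) {
            if (r.isCoord) {
                return r.sender;
            }
        }
        return null; // no coordinator answered
    }

    public static void main(String[] args) {
        // Values taken from the trace: A and B both report view [A|1], A is coordinator.
        List<Member> cluster = List.of(
                new Member("A", "[A|1]", true),
                new Member("B", "[A|1]", false));
        List<Response> responses = findMembers("C", cluster);
        responses.forEach(System.out::println);
        System.out.println("C sends JOIN to " + pickCoordinator(responses));
    }
}
```

In this model every target answers, which matches the join-time trace; the merge-time trace below is the case where some of these answers never show up.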

However, MERGE.sendMergeSolicitation() calls Discovery.findAllViews() instead of Discovery.findAllMembers().
This goes through the same underlying method, Discovery.findMembers(), but the behaviour ends up being completely different. Among other things, in many cases there is no evidence of the GET_MBRS_REQ messages arriving at the remote members.

For example:
{noformat}
==== triggering merge solicitation ====:
Discovery: A calling findAllViews:
Discovery: A calling findMembers: num_expected = 10, view_id = [A|5]
Discovery: sending discovery request: view_id = [A|5], data = null
370387 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27216
A: sending in-line discovery request to 10.16.95.7:27216
370392 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27218
A: sending in-line discovery request to 10.16.95.7:27218
370393 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27217
A: sending in-line discovery request to 10.16.95.7:27217
373394 [TRACE] TCPPING: - A: discovery took 3007 ms: responses: 1 total (1 servers (0 coord), 0 clients); responses received = B
373395 [TRACE] MERGE2: - Discovery results:
[B]: view_id=[A|6] ([A|6] [A, B])
[A]: view_id=[A|5] ([A|5] [A])
373395 [DEBUG] MERGE2: - A found different views : [A|6], [A|5]; sending up MERGE event with merge participants [B, A].
Discovery results:
[B]: coord=A
[A]: coord=A

==== checking views after merge ====:
....................Disabling TRACE debugging for GMS, MERGE2 and Discovery

A's view: [B|7] [B, A]
B's view: [B|7] [B, A]
C's view: [A|7] [A, B, C]
{noformat}

Note the absence of the messages concerning GET_MBRS_REQ.
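
To make the effect of the missing response concrete, here is a minimal sketch of the comparison step, using the values from the trace above. The class and method names are hypothetical and this is not the MERGE2 implementation; it only assumes that a merge is triggered when the collected view IDs differ and that a member which never responds cannot be considered.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; it only illustrates the comparison, not the MERGE2 code.
final class ViewComparisonSketch {

    /** Distinct view IDs among the discovery responses, in arrival order. */
    static Set<String> distinctViews(Map<String, String> viewIdByMember) {
        return new LinkedHashSet<>(viewIdByMember.values());
    }

    /** A merge is only called for when the responders disagree on the view. */
    static boolean mergeNeeded(Map<String, String> viewIdByMember) {
        return distinctViews(viewIdByMember).size() > 1;
    }

    /** Members we expected to hear from but which are absent from the responses. */
    static Set<String> silentMembers(Collection<String> expected,
                                     Map<String, String> viewIdByMember) {
        Set<String> silent = new LinkedHashSet<>(expected);
        silent.removeAll(viewIdByMember.keySet());
        return silent;
    }

    public static void main(String[] args) {
        // Discovery results as logged by MERGE2 on A in the failing run.
        Map<String, String> seen = new LinkedHashMap<>();
        seen.put("B", "[A|6]");
        seen.put("A", "[A|5]");

        System.out.println("distinct views: " + distinctViews(seen));
        System.out.println("merge needed:   " + mergeNeeded(seen));
        System.out.println("no response:    " + silentMembers(List.of("A", "B", "C"), seen));
    }
}
```

With only A and B in the results, a merge is still triggered because their view IDs differ, but C is never part of it, which is consistent with C being left alone in its [A|7] view.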

I'm still looking at this. It's a puzzle.
                
> OverlappingMergeTest testSameCreatorDifferentIDs fails to create correct merged view
> ------------------------------------------------------------------------------------
>
>                 Key: JGRP-1817
>                 URL: https://issues.jboss.org/browse/JGRP-1817
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.2.13
>         Environment: RHEL
>            Reporter: Richard Achmatowicz
>            Assignee: Bela Ban
>             Fix For: 3.2.14
>
>
> This test does the following:
> - creates three channels a,b,c
> - injects views 
> {noformat}
> A: {A|5 A}, B:{A|6 A,B}, C:{A|7 A,B,C} 
> {noformat}
> - calls MERGE.sendMergeSolicitation() on channel A to simulate the periodic task MERGE.findSubgroupsTask, which should find the views of all reachable members, check whether they differ and, if they do, prepare and send a MERGE event up to GMS
> - checks that all channels have the final view of size 3
> The test fails intermittently but frequently on RHEL, with the same failure each time:
> {noformat}
> -------------------------------------------------------------------
> GMS: address=A, cluster=OverlappingMergeTest, physical address=10.16.95.7:27215
> -------------------------------------------------------------------
> -------------------------------------------------------------------
> GMS: address=B, cluster=OverlappingMergeTest, physical address=10.16.95.7:27216
> -------------------------------------------------------------------
> -------------------------------------------------------------------
> GMS: address=C, cluster=OverlappingMergeTest, physical address=10.16.95.7:27217
> -------------------------------------------------------------------
> ------------- testSameCreatorDifferentIDs -----------
> [A] view=[A|5] [A]
> [B] view=[A|6] [A, B]
> [C] view=[A|7] [A, B, C]
> A's view: [A|5] [A]
> B's view: [A|6] [A, B]
> C's view: [A|7] [A, B, C]
> Enabling TRACE debugging for GMS, MERGE2 and Discovery
> ==== triggering merge solicitation ====:
> 212534 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27216
> 212537 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27218
> 212538 [TRACE] TCPPING: - A: sending discovery request to 10.16.95.7:27217
> 215538 [TRACE] TCPPING: - A: discovery took 3004 ms: responses: 1 total (1 servers (0 coord), 0 clients)
> 215539 [TRACE] MERGE2: - Discovery results:
> [B]: view_id=[A|6] ([A|6] [A, B])
> [A]: view_id=[A|5] ([A|5] [A])
> 215539 [DEBUG] MERGE2: - A found different views : [A|5], [A|6]; sending up MERGE event with merge participants [B, A].
> Discovery results:
> [B]: coord=A
> [A]: coord=A
> ==== checking views after merge ====:
> ....................Disabling TRACE debugging for GMS, MERGE2 and Discovery
> A's view: [A|7] [A, B]
> B's view: [A|7] [A, B]
> C's view: [A|7] [A, B, C]
> {noformat}
>  Whenever this test fails, it is the discovery phase which fails to find the correct set of views. Instead of finding views for channels A, B and C, it only finds views for channels A and B.
>  
> Also, the discovery requests are sent to host:port combinations which are offset by 1. For example, in the case above, the host:port combinations of the channels are 10.16.95.7:27215, 10.16.95.7:27216, and 10.16.95.7:27217, but the pings go out to 10.16.95.7:27216, 10.16.95.7:27217, and 10.16.95.7:27218. Not sure if this is significant, as it still covers channels B and C.
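
The port observation can be checked with a small set difference over the ports quoted in the description. This is a sketch only; PingCoverageSketch and minus are hypothetical names, and the port lists are copied by hand from the log rather than read from TCPPING.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check over the ports quoted in the issue description.
final class PingCoverageSketch {

    /** Elements of 'from' that do not appear in 'remove', sorted ascending. */
    static Set<Integer> minus(Collection<Integer> from, Collection<Integer> remove) {
        Set<Integer> result = new TreeSet<>(from);
        result.removeAll(remove);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> channelPorts = List.of(27215, 27216, 27217); // A, B, C
        List<Integer> pingedPorts = List.of(27216, 27218, 27217);  // targets of A's requests

        System.out.println("channel ports never pinged:   " + minus(channelPorts, pingedPorts));
        System.out.println("pinged ports with no channel: " + minus(pingedPorts, channelPorts));
    }
}
```

The only channel port left unpinged is 27215, which belongs to A, the sender itself, and the only pinged port without a channel is 27218. So the offset on its own does not explain why C's response is missing.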


