[jboss-jira] [JBoss JIRA] (JGRP-1977) More redundant initial join logic to avoid becoming a fake coordinator

Bela Ban (JIRA) issues at jboss.org
Fri Nov 13 05:53:00 EST 2015


    [ https://issues.jboss.org/browse/JGRP-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128664#comment-13128664 ] 

Bela Ban commented on JGRP-1977:
--------------------------------

Comments on the PR. Summary: I don't like this, as discovery logic and join logic should be completely separated.

> More redundant initial join logic to avoid becoming a fake coordinator
> ----------------------------------------------------------------------
>
>                 Key: JGRP-1977
>                 URL: https://issues.jboss.org/browse/JGRP-1977
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Osamu Nagano
>            Assignee: Bela Ban
>             Fix For: 3.6.7
>
>
> If the very first JGroups discovery packet is lost, the current GMS join logic never recovers from the loss.  The node becomes a standalone coordinator and only merges with the cluster after several minutes.
> This can happen if a new node resides in another network segment and a switch between the segments needs some time to establish a new multicast route.  Currently, there is not enough time between the IGMP join (by {{MulticastSocket#joinGroup()}}) and the JGroups discovery packet, so the latter is lost in such a network environment.  Because the number of nodes can be very large, configuring a static route in the switch is not reasonable.
> Specifically, in method {{org.jgroups.protocols.pbcast.ClientGmsImpl#joinInternal()}}, the part that calls {{gms.getDownProtocol().down(Event.FIND_INITIAL_MBRS_EVT)}} is outside the retry loop governed by GMS.max_join_attempts and GMS.join_timeout.
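
For illustration only, the change requested in the quoted description (repeating discovery inside the retry loop rather than running it once) could be sketched roughly as below. This is not the JGroups code and not the patch from the PR: the class JoinRetrySketch, the method findCoordinator and the Supplier that stands in for one discovery round are all hypothetical, and maxAttempts and timeoutMs only loosely mirror GMS.max_join_attempts and GMS.join_timeout.

```java
import java.util.List;
import java.util.function.Supplier;

public class JoinRetrySketch {

    /**
     * Hypothetical helper: runs one discovery round per attempt and returns
     * the first responder, or null if every round came back empty.
     * 'discovery' stands in for a single discovery request; it is an
     * assumption for this sketch, not a JGroups API.
     */
    public static String findCoordinator(Supplier<List<String>> discovery,
                                         int maxAttempts, long timeoutMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Discovery is repeated on every attempt, so a lost first
            // packet is retried instead of being silently dropped.
            List<String> responses = discovery.get();
            if (!responses.isEmpty()) {
                return responses.get(0);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(timeoutMs); // wait before the next round
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        // Nobody answered: the caller would fall back to becoming coordinator.
        return null;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<String> none = List.of();
        List<String> found = List.of("node-A");
        // Simulates a network that drops only the first discovery packet.
        Supplier<List<String>> lossy = () -> ++calls[0] == 1 ? none : found;
        System.out.println(findCoordinator(lossy, 3, 10)); // prints "node-A"
    }
}
```

With a single discovery round outside the loop, the simulated node above would see no responses and become a standalone coordinator; with the round inside the loop, the second attempt finds node-A.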



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)

