[jboss-jira] [JBoss JIRA] (JGRP-1977) More redundant initial join logic to avoid becoming a fake coordinator

Wed Nov 11 03:55:00 EST 2015

     [ https://issues.jboss.org/browse/JGRP-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1977:
---------------------------
    Fix Version/s: 3.6.7

I'll take a look. I don't want to change this logic, though, as GMS should not know what kind of transport is present (TCP or UDP).
If a change is needed, this should rather be in {{PING}} or {{UDP}} itself.

> More redundant initial join logic to avoid becoming a fake coordinator
> ----------------------------------------------------------------------
>
>                 Key: JGRP-1977
>                 URL: https://issues.jboss.org/browse/JGRP-1977
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Osamu Nagano
>            Assignee: Bela Ban
>             Fix For: 3.6.7
>
>
> If the very first JGroups discovery packet is lost, the current GMS join logic never recovers from it.  The node becomes a standalone coordinator and merges only after several minutes.
> This can happen if a new node resides in another network segment and a switch between the segments needs some time to establish a new multicast route.  Currently, there is not enough time between the IGMP join (by {{MulticastSocket#joinGroup()}}) and the JGroups discovery packet, so the latter is lost in such a network environment.  Because the number of nodes can be very large, configuring static routes in the switch is not practical.
> Specifically, in method {{org.jgroups.protocols.pbcast.ClientGmsImpl#joinInternal()}}, the {{gms.getDownProtocol().down(Event.FIND_INITIAL_MBRS_EVT)}} part is outside the retry loop governed by GMS.max_join_attempts and GMS.join_timeout.
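
The difference the reporter describes can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions and is not the JGroups implementation: the {{Discovery}} interface, {{lossyNetwork()}} and both join methods are hypothetical names invented for this example, and {{maxJoinAttempts}} is merely modeled on GMS.max_join_attempts. It shows that when discovery runs once before the retry loop, a single lost packet leaves the node as its own coordinator, whereas repeating discovery inside the loop lets a later attempt find the real one.

```java
import java.util.Collections;
import java.util.List;

public class JoinRetrySketch {

    /** Hypothetical stand-in for one discovery round; returns the coordinators that answered. */
    interface Discovery {
        List<String> findInitialMembers();
    }

    /** Simulated network that silently drops the first `lost` discovery requests. */
    static Discovery lossyNetwork(final int lost, final String coordinator) {
        final int[] sent = {0};
        return () -> {
            sent[0]++;
            return sent[0] <= lost
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(coordinator);
        };
    }

    /** Discovery happens once, before the loop: the retries only re-check a stale, empty result. */
    static String joinDiscoveryOutsideLoop(Discovery d, int maxJoinAttempts, String self) {
        List<String> rsps = d.findInitialMembers();
        for (int i = 0; i < maxJoinAttempts; i++) {
            if (!rsps.isEmpty()) {
                return rsps.get(0);
            }
        }
        return self; // nobody answered: the node elects itself coordinator
    }

    /** Discovery is repeated on every attempt, so a lost first packet can be recovered. */
    static String joinDiscoveryInsideLoop(Discovery d, int maxJoinAttempts, String self) {
        for (int i = 0; i < maxJoinAttempts; i++) {
            List<String> rsps = d.findInitialMembers();
            if (!rsps.isEmpty()) {
                return rsps.get(0);
            }
        }
        return self;
    }

    public static void main(String[] args) {
        // Only the very first discovery packet is lost; "A" is the real coordinator.
        System.out.println(joinDiscoveryOutsideLoop(lossyNetwork(1, "A"), 3, "B")); // prints "B"
        System.out.println(joinDiscoveryInsideLoop(lossyNetwork(1, "A"), 3, "B"));  // prints "A"
    }
}
```

In the first call node B ends up as a standalone ("fake") coordinator even though A is reachable from the second packet onward. In the second call B finds A on its second attempt.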
