[JBoss JIRA] (JGRP-2237) The single node in the cluster not become a coordinator after coordinator leave.

Wednesday, 29 November 2017

    [
https://issues.jboss.org/browse/JGRP-2237?page=com.atlassian.jira.plugin....
] 

kfir avraham edited comment on JGRP-2237 at 11/29/17 7:01 AM:
--------------------------------------------------------------

I attached logs from server_A and server_B, and the conf file.

In this case (after upgrading to version 4.0.8) it looks better, but for some reason, after
a few restarts, server_A doesn't send a join message to the other machine and creates a
new view with 'is_coord-true'.

Any idea what could be the reason for that?

BTW, we must use ' port_range="0" ' because we want to use only
this port, and it worked perfectly in version 3.6.11 (we upgraded because of a
security issue).
When I set ' port_range="5" ', they did not discover each other from the
beginning.
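
For readers without the attachments, a minimal sketch of what a fixed-port TCP/TCPPING
setup of this kind might look like. This is not the attached conf.txt: the addresses and
port are taken from the trace log in this comment, and setting port_range on both TCP and
TCPPING is an assumption, since the comment does not say which protocol carries the
attribute.

```xml
<!-- Hypothetical fragment for illustration only; not the reporter's actual configuration -->
<config xmlns="urn:org:jgroups">
    <!-- Transport: port_range="0" means bind to bind_port only, with no fallback to higher ports -->
    <TCP bind_addr="10.63.16.3"
         bind_port="8102"
         port_range="0"/>
    <!-- Discovery: port_range="0" means probe only the listed port on each initial host -->
    <TCPPING initial_hosts="10.63.16.3[8102],10.63.16.13[8102]"
             port_range="0"/>
    <!-- Remaining protocols (failure detection, NAKACK2, UNICAST3, STABLE, GMS, ...) omitted -->
</config>
```

With a non-zero port_range, the transport may bind to a port above bind_port when the first
one is taken, and discovery probes that many additional ports per host, so the two settings
need to agree on both machines.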

*+Log in trace mode:+*
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [DEBUG] - thread pool
min/max/keep-alive: 2/30/60000 use_fork_join=false, internal pool: 0/4/30000 (2 cores
available)
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [TRACE] -
null: set max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.UNICAST3] [main] [TRACE] - null: set
max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.STABLE] [main] [TRACE] -
clm-tlv-spih31-6939: stable task started
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCPPING] [main] [TRACE] -
clm-tlv-spih31-6939: sending discovery request to 10.63.16.13:8102
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [TRACE] -
clm-tlv-spih31-6939: sending msg to 10.63.16.13:8102, src=clm-tlv-spih31-6939, headers are
TCPPING: [GET_MBRS_REQ cluster=HACluster initial_discovery=true], TP:
[cluster_name=HACluster]
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP]
[TQ-Bundler-7,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*connecting to
10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [Connection.Receiver
[10.63.16.3:50534 - 10.63.16.13:8102]-9,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102:
+*removed connection to 10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP]
[TcpServer.Acceptor[8102]-5,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*accepted
connection from 10.63.16.13:8102*+
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [TRACE] -
clm-tlv-spih31-6939: *+no members discovered after 1029 ms:+* creating cluster as first
member
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [DEBUG] - 
[clm-tlv-spih31-6939 setDigest()]
existing digest:  []
new digest:       clm-tlv-spih31-6939: [0 (0)]
resulting digest: clm-tlv-spih31-6939: [0 (0)]
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [DEBUG] -
clm-tlv-spih31-6939: installing view [clm-tlv-spih31-6939|0] (1) [clm-tlv-spih31-6939]

...
 The single node in the cluster not become a coordinator after
coordinator leave.
 --------------------------------------------------------------------------------

                 Key: JGRP-2237
                 URL: https://issues.jboss.org/browse/JGRP-2237
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 4.0.2, 4.0.8
            Reporter: kfir avraham
            Assignee: Bela Ban
            Priority: Minor
         Attachments: Server_A.txt, Server_B.txt, conf.txt, test.xml

 I have a cluster with 2 members. Sometimes, when the first node (the coordinator) leaves
the cluster, the second one does not become the coordinator.
 When the first one rejoins, it cannot determine the coordinator and selects a new one from
the node list.

--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
