[jboss-jira] [JBoss JIRA] (JGRP-2237) The single node in the cluster not become a coordinator after coordinator leave.

Wed Nov 29 07:02:00 EST 2017

    [ https://issues.jboss.org/browse/JGRP-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495116#comment-13495116 ] 

kfir avraham edited comment on JGRP-2237 at 11/29/17 7:01 AM:
--------------------------------------------------------------

I attached logs from server_A and server_B, and the conf file.

In this case (after upgrading to version 4.0.8) it looks better, but for some reason, after a few restarts, server_A doesn't send a join message to the other machine and instead creates a new view with 'is_coord-true'.

Any idea what could be the reason for that?

BTW, we must use ' port_range="0" ' because we want to use only this one port, and it worked perfectly in version 3.6.11 (we upgraded because of a security issue).
When I set ' port_range="5" ', the nodes did not discover each other from the beginning.
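
For reference, the port settings described above would look roughly like the fragment below in a JGroups 4.x TCP stack. This is only a sketch and not the attached conf file: the addresses and port 8102 are taken from the trace log, and setting port_range on both TCP and TCPPING is an assumption, since the comment does not say which protocol the attribute was set on. The rest of the stack is omitted.

```xml
<!-- Sketch only, not the attached conf.txt. Addresses and port are taken from the trace log. -->
<config xmlns="urn:org:jgroups">
    <!-- port_range="0" on the transport: bind to 8102 only, with no fallback to higher ports -->
    <TCP bind_port="8102"
         port_range="0"/>
    <!-- Static discovery: port_range="0" probes only the listed port on each host -->
    <TCPPING initial_hosts="10.63.16.3[8102],10.63.16.13[8102]"
             port_range="0"/>
    <!-- Remaining protocols (failure detection, NAKACK2, UNICAST3, STABLE, GMS, ...) omitted -->
</config>
```

With port_range="5" on TCPPING, each listed host would instead be probed on ports 8102 through 8107.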

*+Log in trace mode:+*
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [DEBUG] - thread pool min/max/keep-alive: 2/30/60000 use_fork_join=false, internal pool: 0/4/30000 (2 cores available)
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [TRACE] - null: set max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.UNICAST3] [main] [TRACE] - null: set max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.STABLE] [main] [TRACE] - clm-tlv-spih31-6939: stable task started
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCPPING] [main] [TRACE] - clm-tlv-spih31-6939: sending discovery request to 10.63.16.13:8102
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [TRACE] - clm-tlv-spih31-6939: sending msg to 10.63.16.13:8102, src=clm-tlv-spih31-6939, headers are TCPPING: [GET_MBRS_REQ cluster=HACluster initial_discovery=true], TP: [cluster_name=HACluster]
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [TQ-Bundler-7,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*connecting to 10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [Connection.Receiver [10.63.16.3:50534 - 10.63.16.13:8102]-9,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*removed connection to 10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [TcpServer.Acceptor[8102]-5,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*accepted connection from 10.63.16.13:8102*+
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [TRACE] - clm-tlv-spih31-6939: *+no members discovered after 1029 ms:+* creating cluster as first member
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [DEBUG] - 
[clm-tlv-spih31-6939 setDigest()]
existing digest:  []
new digest:       clm-tlv-spih31-6939: [0 (0)]
resulting digest: clm-tlv-spih31-6939: [0 (0)]
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [DEBUG] - clm-tlv-spih31-6939: installing view [clm-tlv-spih31-6939|0] (1) [clm-tlv-spih31-6939]

> The single node in the cluster not become a coordinator after coordinator leave.
> --------------------------------------------------------------------------------
>
>                 Key: JGRP-2237
>                 URL: https://issues.jboss.org/browse/JGRP-2237
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0.2, 4.0.8
>            Reporter: kfir avraham
>            Assignee: Bela Ban
>            Priority: Minor
>         Attachments: Server_A.txt, Server_B.txt, conf.txt, test.xml
>
>
> I have a cluster with 2 members. Sometimes, when the first node (the coordinator) leaves the cluster, the second one does not become the coordinator.
> When the first one rejoins, it cannot determine the coordinator and selects a new one from the node list.

--
This message was sent by Atlassian JIRA
(v7.5.0#75005)