[jboss-jira] [JBoss JIRA] (JGRP-2237) The single node in the cluster not become a coordinator after coordinator leave.
kfir avraham (JIRA)
issues at jboss.org
Wed Nov 29 07:02:00 EST 2017
[ https://issues.jboss.org/browse/JGRP-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495116#comment-13495116 ]
kfir avraham edited comment on JGRP-2237 at 11/29/17 7:01 AM:
--------------------------------------------------------------
I attached logs from server_A and server_B, and the conf file.
In this case (after upgrading to version 4.0.8) it looks better, but for some reason, after a few restarts, server_A doesn't send a join message to the other machine and creates a new view with 'is_coord-true'.
Any idea what could be the reason for that?
BTW, we must use ' port_range="0" ' because we want to use only this one port, and it worked perfectly in version 3.6.11 (we upgraded because of a security issue).
When I set ' port_range="5" ', the nodes did not discover each other from the start.
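
For readers without access to the attached conf.txt, here is a minimal sketch of where the port_range setting discussed above would typically sit in a JGroups 4.x TCP stack. It is not the reporter's actual configuration: the addresses and port 8102 are taken from the trace log in this comment, and everything else is an assumption. Note that port_range exists on both the transport (TCP) and the discovery protocol (TCPPING), and the comment does not say which of the two was changed.

```xml
<!-- Illustrative fragment only; NOT the attached conf.txt.
     Hosts and port 8102 come from the trace log; the rest is assumed.
     A real stack needs further protocols (failure detection, NAKACK2,
     UNICAST3, STABLE, GMS, ...). -->
<config xmlns="urn:org:jgroups">
    <!-- Transport: port_range="0" means bind to bind_port only,
         with no fallback to higher ports if 8102 is taken. -->
    <TCP bind_port="8102"
         port_range="0"/>

    <!-- Discovery: port_range="0" means probe only the exact port listed
         for each host; a value of 5 would also probe the next 5 ports. -->
    <TCPPING initial_hosts="10.63.16.3[8102],10.63.16.13[8102]"
             port_range="0"/>
</config>
```

With both values at 0, each node listens on and probes exactly one port per host, which matches the "use only this port" requirement stated above.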
*+Log in trace mode:+*
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [DEBUG] - thread pool min/max/keep-alive: 2/30/60000 use_fork_join=false, internal pool: 0/4/30000 (2 cores available)
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [TRACE] - null: set max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.UNICAST3] [main] [TRACE] - null: set max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.STABLE] [main] [TRACE] - clm-tlv-spih31-6939: stable task started
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCPPING] [main] [TRACE] - clm-tlv-spih31-6939: sending discovery request to 10.63.16.13:8102
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [TRACE] - clm-tlv-spih31-6939: sending msg to 10.63.16.13:8102, src=clm-tlv-spih31-6939, headers are TCPPING: [GET_MBRS_REQ cluster=HACluster initial_discovery=true], TP: [cluster_name=HACluster]
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [TQ-Bundler-7,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*connecting to 10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [Connection.Receiver [10.63.16.3:50534 - 10.63.16.13:8102]-9,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*removed connection to 10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [TcpServer.Acceptor[8102]-5,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*accepted connection from 10.63.16.13:8102*+
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [TRACE] - clm-tlv-spih31-6939: *+no members discovered after 1029 ms:+* creating cluster as first member
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [DEBUG] -
[clm-tlv-spih31-6939 setDigest()]
existing digest: []
new digest: clm-tlv-spih31-6939: [0 (0)]
resulting digest: clm-tlv-spih31-6939: [0 (0)]
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [DEBUG] - clm-tlv-spih31-6939: installing view [clm-tlv-spih31-6939|0] (1) [clm-tlv-spih31-6939]
> The single node in the cluster not become a coordinator after coordinator leave.
> --------------------------------------------------------------------------------
>
> Key: JGRP-2237
> URL: https://issues.jboss.org/browse/JGRP-2237
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 4.0.2, 4.0.8
> Reporter: kfir avraham
> Assignee: Bela Ban
> Priority: Minor
> Attachments: Server_A.txt, Server_B.txt, conf.txt, test.xml
>
>
> I have a cluster with 2 members. Sometimes, when the first node (the coordinator) leaves the cluster, the second one does not become the coordinator.
> When the first one rejoins, it cannot determine the coordinator and selects a new one from the node list.