[
https://issues.jboss.org/browse/JGRP-2237?page=com.atlassian.jira.plugin....
]
kfir avraham edited comment on JGRP-2237 at 11/29/17 7:01 AM:
--------------------------------------------------------------
I attached logs from server_A and server_B, and the conf file.
In this case (after upgrading to version 4.0.8) it looks better, but for some reason, after
a few restarts, server_A doesn't send a join message to the other machine and creates a
new view with 'is_coord-true'.
Any idea what could be the reason for that?
BTW, we must use ' port_range="0" ', because we want to use only
this port, and it worked perfectly in version 3.6.11 (we upgraded because of a
security issue).
When I set ' port_range="5" ', they do not discover each other from the
beginning.
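For readers without the attached conf file, the fragment below is a minimal sketch of where a ' port_range ' setting can appear in a JGroups XML stack. It is not the attached configuration: the addresses and port 8102 are taken from the trace log, while the choice of attributes and the host list are assumptions made for illustration. Both TCP and TCPPING accept a port_range attribute, and the comment above does not say which of the two was changed.

```xml
<!-- Illustrative fragment only; NOT the attached conf.txt / test.xml. -->
<!-- Addresses and port 8102 come from the trace log; the rest is assumed. -->
<config xmlns="urn:org:jgroups">
    <!-- Transport: with port_range="0" only bind_port itself is tried. -->
    <TCP bind_addr="10.63.16.3"
         bind_port="8102"
         port_range="0"/>

    <!-- Discovery: port_range here is the number of additional ports
         probed for each entry in initial_hosts (0 = only the listed port). -->
    <TCPPING initial_hosts="10.63.16.3[8102],10.63.16.13[8102]"
             port_range="0"/>

    <!-- Remaining protocols (NAKACK2, UNICAST3, STABLE, GMS, ...) omitted. -->
</config>
```

With port_range="5" on TCPPING, discovery would also probe ports 8103 to 8107 on each listed host, so the two settings are worth checking separately when comparing the 3.6.11 and 4.0.8 behaviour.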
*+Log in trace mode:+*
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [DEBUG] - thread pool
min/max/keep-alive: 2/30/60000 use_fork_join=false, internal pool: 0/4/30000 (2 cores
available)
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [TRACE] -
null: set max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.UNICAST3] [main] [TRACE] - null: set
max_xmit_req_size from 0 to 247600
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.pbcast.STABLE] [main] [TRACE] -
clm-tlv-spih31-6939: stable task started
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCPPING] [main] [TRACE] -
clm-tlv-spih31-6939: sending discovery request to 10.63.16.13:8102
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [main] [TRACE] -
clm-tlv-spih31-6939: sending msg to 10.63.16.13:8102, src=clm-tlv-spih31-6939, headers are
TCPPING: [GET_MBRS_REQ cluster=HACluster initial_discovery=true], TP:
[cluster_name=HACluster]
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP]
[TQ-Bundler-7,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*connecting to
10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP] [Connection.Receiver
[10.63.16.3:50534 - 10.63.16.13:8102]-9,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102:
+*removed connection to 10.63.16.13:8102*+
Nov-28-2017 23:31:36 GMT-12:00 [org.jgroups.protocols.TCP]
[TcpServer.Acceptor[8102]-5,clm-tlv-spih31-6939] [TRACE] - 10.63.16.3:8102: +*accepted
connection from 10.63.16.13:8102*+
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [TRACE] -
clm-tlv-spih31-6939: *+no members discovered after 1029 ms:+* creating cluster as first
member
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.NAKACK2] [main] [DEBUG] -
[clm-tlv-spih31-6939 setDigest()]
existing digest: []
new digest: clm-tlv-spih31-6939: [0 (0)]
resulting digest: clm-tlv-spih31-6939: [0 (0)]
Nov-28-2017 23:31:37 GMT-12:00 [org.jgroups.protocols.pbcast.GMS] [main] [DEBUG] -
clm-tlv-spih31-6939: installing view [clm-tlv-spih31-6939|0] (1) [clm-tlv-spih31-6939]
The single node in the cluster does not become coordinator after the
coordinator leaves.
--------------------------------------------------------------------------------
Key: JGRP-2237
URL:
https://issues.jboss.org/browse/JGRP-2237
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.2, 4.0.8
Reporter: kfir avraham
Assignee: Bela Ban
Priority: Minor
Attachments: Server_A.txt, Server_B.txt, conf.txt, test.xml
I have a cluster with 2 members. Sometimes, when the first node (the coordinator) leaves the
cluster, the second one does not become coordinator.
When the first one rejoins, it cannot determine the coordinator and selects a new one from
the node list.
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)