[jboss-jira] [JBoss JIRA] (JGRP-2209) Members leaving the cluster
Swathi Kumar (JIRA)
issues at jboss.org
Mon Aug 7 18:31:00 EDT 2017
Swathi Kumar created JGRP-2209:
----------------------------------
Summary: Members leaving the cluster
Key: JGRP-2209
URL: https://issues.jboss.org/browse/JGRP-2209
Project: JGroups
Issue Type: Bug
Affects Versions: 3.0
Environment: Linux
Reporter: Swathi Kumar
Assignee: Bela Ban
We recently upgraded the jgroups jars from version 2_5_2/jgroups-all.jar to 3_4_0/jgroups-3.4.0.Alpha2.jar.
Since the upgrade, our clusters have been unstable.
Members leave the cluster for a short time (around 5-6 minutes) and then rejoin on their own.
We initially suspected a network issue and asked the network team to investigate further.
However, the network logs show that the network plays no part in members leaving the cluster. The machines on which the nodes/members run are healthy, and the network is fast and healthy too.
We have not been able to determine the root cause of members leaving the clusters.
Please note that we have multiple clusters configured (about 5-6), and we see the same problem on all of them.
We request you to kindly help us in resolving this issue.
Our application uses the following JGroups configuration properties to create 3 channels (for security reasons, a dummy host name is used here):
jgroups_cluster.property_string=TCP(bind_addr=host_name_A;bind_port=34061):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
jgroups_cluster.distribution_property_string=TCP(bind_port=34060;thread_pool_rejection_policy=run):TCPPING(initial_hosts=host_name_A[34060],host_name_A[44060],host_name_A[54060];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
jgroups_cluster.lock.protocolStack=TCP(bind_addr=host_name_A;bind_port=34062):TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
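Each of these stack strings is a colon-separated list of protocols, each optionally followed by semicolon-separated parameters in parentheses. Because the strings are long and hard to proofread, a small sanity check can help when reviewing them. The following is only a rough sketch and is not part of JGroups: the class `StackStringCheck` and its methods are hypothetical names, and it assumes that parameter values never contain parentheses themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a JGroups class: lists the protocols in a
// legacy "PROTO(a=b;c=d):PROTO2:..." stack string and rejects strings
// whose parentheses do not pair up.
class StackStringCheck {

    // Splits the stack string on ':' characters that sit outside parentheses.
    public static List<String> splitProtocols(String stack) {
        List<String> protocols = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < stack.length(); i++) {
            char c = stack.charAt(i);
            if (c == '(') {
                depth++;
                // Assumption: parameter values contain no parentheses, so a
                // second '(' means an earlier ')' is most likely missing.
                if (depth > 1) {
                    throw new IllegalArgumentException("nested '(' at index " + i);
                }
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("unexpected ')' at index " + i);
                }
            } else if (c == ':' && depth == 0) {
                protocols.add(stack.substring(start, i));
                start = i + 1;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("missing ')' at end of string");
        }
        protocols.add(stack.substring(start));
        return protocols;
    }

    // Returns the protocol name without its parameter list.
    public static String name(String protocol) {
        int paren = protocol.indexOf('(');
        return (paren < 0 ? protocol : protocol.substring(0, paren)).trim();
    }

    public static void main(String[] args) {
        // Shortened example string; a real property value can be pasted instead.
        String stack = "TCP(bind_port=34060):TCPPING(timeout=5000):FD_SOCK:FD(timeout=5000;max_tries=48)";
        for (String protocol : splitProtocols(stack)) {
            System.out.println(name(protocol));
        }
    }
}
```

Running each property value through such a check prints the protocol order of the stack and fails early on an unbalanced parenthesis, which makes it easier to compare the three channel configurations.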
Please let us know if you need any logs from our end.
Please treat this as a priority: the issue occurs daily and is affecting our business.
Many thanks in advance.
Regards
Swathi BN
(IBM)
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)