Bela Ban closed JGRP-2209.
--------------------------
Resolution: Out of Date
Hi Swathi,
as discussed on the call, the findings/suggestions were:
* UNICAST3 was missing from the config; use FD_ALL/FD_SOCK for failure detection
* Old config: start from the XML config {{tcp.xml}} shipped with the version of JGroups you use and modify it as needed
* Switch to XML-style config
* Upgrade to the latest 3.4.x, certainly not an alpha version!
* Even better: switch to 3.6.x if you can
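The XML-style suggestion can be sketched as a small converter from the legacy colon-separated property string to `<config>` elements. This is a minimal sketch under stated assumptions: the class name `PropsToXml` is hypothetical, it copies attribute names verbatim without checking that they still exist in 3.4.x/3.6.x (some 2.5-era attributes may have been renamed or removed), and it omits the `urn:org:jgroups` namespace header that the shipped XML files declare. Starting from the shipped `tcp.xml` remains the safer route; a converter like this only helps carry over custom values such as ports and host lists.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: converts a legacy JGroups property string
 * ("PROTO(k=v;k2=v2):PROTO2:...") into XML-style <config> elements.
 * Attribute names are copied verbatim and NOT validated against any JGroups version.
 */
public class PropsToXml {

    /** Splits on ':' only when outside a protocol's parentheses. */
    static List<String> splitProtocols(String props) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;
        for (char c : props.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (depth < 0) throw new IllegalArgumentException("Unbalanced ')' in: " + props);
            if (c == ':' && depth == 0) {
                parts.add(cur.toString().trim());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        if (depth != 0) throw new IllegalArgumentException("Unclosed '(' in: " + props);
        if (cur.length() > 0) parts.add(cur.toString().trim());
        return parts;
    }

    /** Minimal escaping so values are safe inside double-quoted XML attributes. */
    static String escape(String v) {
        return v.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** "NAME(k1=v1;k2=v2)" becomes <NAME k1="v1" k2="v2"/>; "NAME" becomes <NAME/>. */
    static String toElement(String proto) {
        int open = proto.indexOf('(');
        if (open < 0) return "<" + proto + "/>";
        String name = proto.substring(0, open).trim();
        String body = proto.substring(open + 1, proto.lastIndexOf(')'));
        StringBuilder sb = new StringBuilder("<").append(name);
        for (String kv : body.split(";")) {
            if (kv.isBlank()) continue;
            int eq = kv.indexOf('=');
            if (eq < 1) throw new IllegalArgumentException("Expected key=value, got: " + kv);
            sb.append(' ').append(kv.substring(0, eq).trim())
              .append("=\"").append(escape(kv.substring(eq + 1).trim())).append('"');
        }
        return sb.append("/>").toString();
    }

    /** Namespace attributes are omitted here; copy the header from the shipped tcp.xml. */
    static String toXml(String props) {
        StringBuilder sb = new StringBuilder("<config>\n");
        for (String p : splitProtocols(props)) {
            if (!p.isEmpty()) sb.append("    ").append(toElement(p)).append('\n');
        }
        return sb.append("</config>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("TCP(bind_port=34060):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK"));
    }
}
```

For that input, `main` prints a `<config>` block containing `<TCP bind_port="34060"/>`, `<MERGE2 min_interval="3000" max_interval="5000"/>` and `<FD_SOCK/>`, one per line.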
I'm closing this issue. We can always create an issue if a bug is found.
Cheers
Members leaving the cluster
---------------------------
Key: JGRP-2209
URL: https://issues.jboss.org/browse/JGRP-2209
Project: JGroups
Issue Type: Bug
Affects Versions: 3.0
Environment: Linux
Reporter: Swathi Kumar
Assignee: Bela Ban
We recently upgraded the JGroups jars from 2_5_2/jgroups-all.jar to
3_4_0/jgroups-3.4.0.Alpha2.jar.
Since the upgrade, our clusters have been unstable.
Members leave the cluster for short periods (around 5-6 minutes) and then rejoin
on their own.
We initially suspected a network issue and asked our network team to
investigate.
However, the network logs make it evident that the network plays no role in members
leaving the cluster. The boxes running the nodes/members are healthy, and the network
is fast and healthy too.
We have not been able to determine the root cause of members leaving the clusters.
Please note that we have multiple clusters configured (around 5-6), and all of them
show the same problem.
We would appreciate your help in resolving this issue.
Our application uses the following JGroups configuration properties to create 3 channels
(for security reasons, the host name has been replaced with a dummy name):
jgroups_cluster.property_string=TCP(bind_addr=host_name_A;bind_port=34061):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
jgroups_cluster.distribution_property_string=TCP(bind_port=34060;thread_pool_rejection_policy=run):TCPPING(initial_hosts=host_name_A[34060],host_name_A[44060],host_name_A[54060];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
jgroups_cluster.lock.protocolStack=TCP(bind_addr=host_name_A;bind_port=34062):TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
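The first finding in the resolution can be checked mechanically against stacks like the three above. The sketch below (hypothetical class `StackCheck`, plain JDK, no JGroups dependency) splits a property string at top-level colons, lists the protocols, and reports a missing UNICAST3 or a missing FD_ALL/FD_SOCK. It also prints a rough FD suspicion time computed as `timeout * max_tries`; that formula is an approximation assumed here, not a value reported by JGroups. With the FD values used in all three stacks (`timeout=5000;max_tries=48`) it comes to about 240 s, whereas FD_ALL in the first and third stacks uses `timeout=20000`. Strings with unbalanced parentheses are rejected, which helps catch copy/paste slips in long one-line stacks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical checker for legacy JGroups property strings.
 * The FD estimate (timeout * max_tries) is an assumed approximation.
 */
public class StackCheck {

    /** Parses "NAME(k=v;...)" entries separated by top-level ':' into name -> params (in order). */
    static Map<String, Map<String, String>> parse(String props) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        int depth = 0, start = 0;
        for (int i = 0; i <= props.length(); i++) {
            // A virtual ':' at the end flushes the last protocol entry.
            char c = i < props.length() ? props.charAt(i) : ':';
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ':' && depth == 0) {
                addProtocol(stack, props.substring(start, Math.min(i, props.length())).trim());
                start = i + 1;
            }
            if (depth < 0) throw new IllegalArgumentException("Unbalanced ')' at index " + i);
        }
        if (depth != 0) throw new IllegalArgumentException("Unclosed '(' in property string");
        return stack;
    }

    static void addProtocol(Map<String, Map<String, String>> stack, String entry) {
        if (entry.isEmpty()) return;
        int open = entry.indexOf('(');
        String name = open < 0 ? entry : entry.substring(0, open).trim();
        Map<String, String> params = new LinkedHashMap<>();
        if (open >= 0) {
            for (String kv : entry.substring(open + 1, entry.lastIndexOf(')')).split(";")) {
                int eq = kv.indexOf('=');
                if (eq > 0) params.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
            }
        }
        stack.put(name, params);
    }

    /** Applies the checks discussed in the issue and returns human-readable findings. */
    static List<String> findings(String props) {
        Map<String, Map<String, String>> stack = parse(props);
        List<String> out = new ArrayList<>();
        if (!stack.containsKey("UNICAST3")) out.add("UNICAST3 is missing");
        if (!stack.containsKey("FD_ALL") && !stack.containsKey("FD_SOCK")) {
            out.add("neither FD_ALL nor FD_SOCK is configured");
        }
        Map<String, String> fd = stack.get("FD");
        if (fd != null && fd.containsKey("timeout") && fd.containsKey("max_tries")) {
            long ms = Long.parseLong(fd.get("timeout")) * Long.parseLong(fd.get("max_tries"));
            out.add("FD suspects a silent member after roughly " + ms / 1000 + " s");
        }
        return out;
    }

    public static void main(String[] args) {
        String props = "TCP(bind_port=34060):MERGE2(min_interval=3000;max_interval=5000)"
                + ":FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500)";
        System.out.println(parse(props).keySet());
        findings(props).forEach(f -> System.out.println("- " + f));
    }
}
```

For the sample string in `main`, the protocol list is `[TCP, MERGE2, FD_SOCK, FD, VERIFY_SUSPECT]`, followed by the UNICAST3 finding and the FD estimate of roughly 240 s.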
Please let us know if you need any logs from our end.
Please treat this as a priority: the issue occurs on a daily basis and our business is
at stake.
Many thanks in advance.
Regards
Swathi BN
(IBM)