[jboss-jira] [JBoss JIRA] (JGRP-2209) Members leaving the cluster

Fri Aug 18 01:42:00 EDT 2017

     [ https://issues.jboss.org/browse/JGRP-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban closed JGRP-2209.
--------------------------
    Resolution: Out of Date

Hi Swathi,

as discussed on the call, the findings/suggestions were:
* UNICAST3 was missing from the config
* For failure detection, use FD_ALL/FD_SOCK
* The config is old: start from the XML config {{tcp.xml}} shipped with the version of JGroups you use and make modifications to it
* Switch to XML-style config
* Upgrade to the latest 3.4.x, certainly not an alpha version!
* Even better: switch to 3.6.x if you can
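
To illustrate the suggestions above, here is a minimal sketch of what the first reported stack might look like in XML form. The hosts, ports and the TCPPING, MERGE2, FD_ALL, VERIFY_SUSPECT, STABLE and GMS values are carried over from the property string in the report. Using pbcast.NAKACK2 in place of pbcast.NAKACK, dropping plain FD, and leaving out the other protocols found in the shipped {{tcp.xml}} (flow control, fragmentation) are assumptions made for illustration. This has not been tested against any specific release, so the shipped {{tcp.xml}} remains the starting point to trust.

```xml
<!-- Sketch only: adapt from the tcp.xml shipped with your JGroups version. -->
<config xmlns="urn:org:jgroups">
    <!-- Transport: address and port taken from the reported property string -->
    <TCP bind_addr="host_name_A" bind_port="34061"/>
    <TCPPING initial_hosts="host_name_A[34061],host_name_A[44061],host_name_A[54061]"
             port_range="1" timeout="5000" num_initial_members="2"/>
    <MERGE2 min_interval="3000" max_interval="5000"/>
    <!-- Failure detection: FD_SOCK plus FD_ALL; dropping plain FD is an assumption -->
    <FD_SOCK/>
    <FD_ALL interval="5000" timeout="20000"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <!-- Assumption: NAKACK2 replaces the older NAKACK used in the report -->
    <pbcast.NAKACK2 use_mcast_xmit="false" discard_delivered_msgs="true"/>
    <!-- Reliable unicast, which the reported stacks did not have -->
    <UNICAST3/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="5000"/>
</config>
```

A file like this would be passed to the channel by name instead of the colon-separated property string.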

I'm closing this issue. We can always create a new issue if a bug is found.
Cheers

> Members leaving the cluster
> ---------------------------
>
>                 Key: JGRP-2209
>                 URL: https://issues.jboss.org/browse/JGRP-2209
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.0
>         Environment: Linux
>            Reporter: Swathi Kumar
>            Assignee: Bela Ban
>
> We recently upgraded the jgroups jars from version 2_5_2/jgroups-all.jar to 3_4_0/jgroups-3.4.0.Alpha2.jar.
> With the upgrade we see our clusters are not stable.
> The members leave the cluster for a short time (around 5-6 minutes) and rejoin on their own.
> We initially suspected a network issue and involved the network team to investigate further.
> But after reviewing the network logs, it is evident that the network plays no role in members leaving the cluster. The boxes on which the nodes/members are running are healthy, and the network is fast and healthy too.
> We are not able to determine the root cause for the members leaving the clusters.
> Please note that we have multiple clusters configured (about 5-6) and we are experiencing the same problem on all of them.
> We request you to kindly help us in resolving this issue.
> We have the following jgroups config properties in our application to create 3 channels (for security reasons we have used a dummy host name here):
> jgroups_cluster.property_string=TCP(bind_addr=host_name_A;bind_port=34061):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> jgroups_cluster.distribution_property_string=TCP(bind_port= 34060;thread_pool_rejection_policy=run):TCPPING(initial_hosts=host_name_A[34060],host_name_A[44060],host_name_A[54060];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> jgroups_cluster.lock.protocolStack=TCP(bind_addr=host_name_A;bind_port=34062:TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> Please let us know if you need any logs from our end.
> Kindly consider this on priority as our business is at stake with this issue happening on a daily basis.
> Many thanks in advance.
> Regards
> Swathi BN
> (IBM)

--
This message was sent by Atlassian JIRA
(v7.2.3#72005)