[jboss-jira] [JBoss JIRA] (JGRP-2209) Members leaving the cluster

Tue Aug 8 03:38:00 EDT 2017

    [ https://issues.jboss.org/browse/JGRP-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445443#comment-13445443 ] 

Bela Ban commented on JGRP-2209:
--------------------------------

Hi Swathi,

please don't use JIRA to report problems; this is for real bugs or new features...
Having said that, I have a few questions:
* Why didn't you upgrade to 3.6.x (the latest)? Why to an *alpha* version?
* Is this reproducible?
* You have the weird combination of FD_ALL and FD; this makes no sense! I suggest FD_ALL and FD_SOCK instead
* Why did you remove UNICAST3? This will give you dropped / duplicate or unordered point-to-point messages
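
The two configuration points above (FD combined with FD_ALL, and the missing UNICAST3) can be checked mechanically on a colon-separated property string. The following is only a rough sketch in Python, not part of JGroups: the helper names `protocol_names` and `check_stack` are made up for illustration, and the example string is a shortened form of the reported configuration.

```python
def protocol_names(stack: str) -> list[str]:
    """Split a colon-separated property string into bare protocol names."""
    parts, depth, current = [], 0, []
    for ch in stack:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        # Only a colon outside "(...)" separates two protocols.
        if ch == ":" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    # Drop the "(...)" property list, then any package prefix such as "pbcast.".
    return [p.split("(", 1)[0].strip().rsplit(".", 1)[-1] for p in parts]


def check_stack(stack: str) -> list[str]:
    """Return warnings for the two issues raised in the comment (hypothetical helper)."""
    names = set(protocol_names(stack))
    warnings = []
    if {"FD", "FD_ALL"} <= names:
        warnings.append("FD and FD_ALL are both configured; consider FD_ALL with FD_SOCK")
    if "UNICAST3" not in names:
        warnings.append("UNICAST3 is missing; point-to-point messages may be "
                        "dropped, duplicated or unordered")
    return warnings


# Shortened form of the reported stack, for illustration only.
EXAMPLE = ("TCP(bind_port=34061):TCPPING(port_range=1):MERGE2:"
           "FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):"
           "VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(discard_delivered_msgs=true):"
           "pbcast.STABLE:pbcast.GMS(join_timeout=5000)")

for warning in check_stack(EXAMPLE):
    print(warning)
```

The splitter assumes parentheses are balanced; a property string with a missing closing parenthesis would be treated as one long protocol entry from that point on.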

If you want to, we can have an interactive (google hangout with screen sharing) session to get you back on track. Ping me at belaban at mailbox dot org if you're interested.

> Members leaving the cluster
> ---------------------------
>
>                 Key: JGRP-2209
>                 URL: https://issues.jboss.org/browse/JGRP-2209
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.0
>         Environment: Linux
>            Reporter: Swathi Kumar
>            Assignee: Bela Ban
>
> We recently upgraded the jgroups jars from version 2_5_2/jgroups-all.jar to 3_4_0/jgroups-3.4.0.Alpha2.jar.
> With the upgrade we see our clusters are not stable.
> The members leave the cluster for a short time (around 5-6m) and rejoin on their own.
> We initially suspected a network issue and involved the network team to investigate further.
> But after reviewing the network logs, it is very much evident that the network has no role to play in members leaving the cluster. The boxes on which the nodes/members are running are healthy and fine and the network is very fast and healthy too.
> We are not able to determine the root cause for the members leaving the clusters.
> Please note, we have multiple clusters configured (roughly 5-6) and we are experiencing the same problem on all of them.
> We request you to kindly help us in resolving this issue.
> We have the below jgroups config properties in our application to create 3 channels (for security reasons have used a dummy host name here) :-
> jgroups_cluster.property_string=TCP(bind_addr=host_name_A;bind_port=34061):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> jgroups_cluster.distribution_property_string=TCP(bind_port= 34060;thread_pool_rejection_policy=run):TCPPING(initial_hosts=host_name_A[34060],host_name_A[44060],host_name_A[54060];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> jgroups_cluster.lock.protocolStack=TCP(bind_addr=host_name_A;bind_port=34062:TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> Please let us know if you need any logs from our end.
> Kindly consider this on priority as our business is at stake with this issue happening on a daily basis.
> Many thanks in advance.
> Regards
> Swathi BN
> (IBM)
