[jboss-jira] [JBoss JIRA] (JGRP-2195) [JGRP00012] discarded message from different cluster with JGroups Upgrade

Thu Jun 15 01:54:00 EDT 2017

    [ https://issues.jboss.org/browse/JGRP-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421507#comment-13421507 ] 

Bela Ban commented on JGRP-2195:
--------------------------------

Why didn't you upgrade to a more recent version of JGroups, e.g. 4.0 or at least 3.6.x?

Anyway, you should leave more space between the ports of different clusters: bind_ports of 34060, 34061 and 34062 with a port_range of 1 are likely to overlap, because discovery for the cluster at 34060 also probes 34061, which belongs to a different cluster. Here's a better config:

ClusterA:
TCP bind_port=30000, 30001, 30002 // 3 members
TCPPING: initial_hosts=xxx[30000] port_range=2

ClusterB:
TCP bind_port=40000, 40001, 40002 // 3 members
TCPPING: initial_hosts=xxx[40000] port_range=2

ClusterC:
TCP bind_port=50000, 50001, 50002 // 3 members
TCPPING: initial_hosts=xxx[50000] port_range=2
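
To see why the spacing matters, the probing can be modelled in a few lines of Java. This is only a sketch and does not use JGroups itself: the class PortOverlapSketch and its methods are hypothetical names, and it assumes that an initial_hosts entry host[p] with port_range=r is probed on ports p through p+r. It also assumes that the other two nodes of the reported "cluster" channel are bound to 44061 and 54061, as the initial_hosts list in the report suggests.

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch only: models which ports discovery would probe; it does not call JGroups.
class PortOverlapSketch {

    // Assumption: each initial_hosts entry host[p] with port_range=r is probed on ports p..p+r.
    static Set<Integer> probedPorts(int[] basePorts, int portRange) {
        Set<Integer> ports = new TreeSet<>();
        for (int base : basePorts) {
            for (int offset = 0; offset <= portRange; offset++) {
                ports.add(base + offset);
            }
        }
        return ports;
    }

    // Ports that one cluster probes but that members of another cluster are bound to.
    static Set<Integer> overlap(Set<Integer> probed, int[] otherClusterBindPorts) {
        Set<Integer> hits = new TreeSet<>();
        for (int port : otherClusterBindPorts) {
            if (probed.contains(port)) {
                hits.add(port);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Reported setup: the distribution channel lists 34060/44060/54060 with port_range=1,
        // while the cluster channel is assumed to be bound to 34061/44061/54061.
        Set<Integer> reported = probedPorts(new int[]{34060, 44060, 54060}, 1);
        System.out.println("reported overlap: " + overlap(reported, new int[]{34061, 44061, 54061}));

        // Suggested setup: ClusterA probes 30000..30002, ClusterB members are bound to 40000..40002.
        Set<Integer> suggested = probedPorts(new int[]{30000}, 2);
        System.out.println("suggested overlap: " + overlap(suggested, new int[]{40000, 40001, 40002}));
    }
}
```

Under those assumptions, the reported setup yields an overlap of [34061, 44061, 54061], while the suggested setup yields an empty set, so discovery requests from one cluster would no longer reach members of another.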

> [JGRP00012] discarded message from different cluster with JGroups Upgrade
> -------------------------------------------------------------------------
>
>                 Key: JGRP-2195
>                 URL: https://issues.jboss.org/browse/JGRP-2195
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.4
>         Environment: All OS(Linux, AIX, Windows, Solaris)
>            Reporter: Swathi Kumar
>            Assignee: Bela Ban
>
> Greetings Team.
> We recently upgraded the jgroups jars from version 2_5_2/jgroups-all.jar to 3_4_0/jgroups-3.4.0.Alpha2.jar.
> Since the upgrade we are seeing *[JGRP00012] discarded message from different cluster* messages every other second on all the nodes in the cluster.
> Note also that this issue started to occur only when we switched the transport from UDP to TCP. If we switch back to UDP, we no longer see these *WARN* messages.
> We no longer support UDP in our application, so going back to UDP is not an option.
> We have several hundred customers in the field who are using our product with this upgraded jgroups jar and who have started to raise tickets against our product.
> We are clueless as to why the upgrade is producing such an enormous number of WARN messages - is there an issue with this version of the jgroups jar?
> Sample WARN messages are shown below:
> [2017-06-13 11:56:38.117] ALL 000000000000 GLOBAL_SCOPE 141694 [OOB-1,Sterling_NodeInfo_group,dublr005vm-24633] WARN org.jgroups.protocols.TCP  - [JGRP00012] discarded message from different cluster Sterling_NodeInfo_group_WFC (our cluster is Sterling_NodeInfo_group). Sender was dublr005vm-2060
> [2017-06-13 11:56:41.72] ALL 000000000000 GLOBAL_SCOPE 145297 [OOB-1,Sterling_NodeInfo_group_WFC,dublr005vm-2060] WARN org.jgroups.protocols.TCP  - [JGRP00012] discarded message from different cluster Sterling_NodeInfo_group (our cluster is Sterling_NodeInfo_group_WFC). Sender was dublr005vm-24633
> We have the following jgroups config properties in our application to create 3 channels (for security reasons, a dummy host name is used here):
> jgroups_cluster.property_string=TCP(bind_addr=host_name_A;bind_port=34061):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> jgroups_cluster.distribution_property_string=TCP(bind_port= 34060;thread_pool_rejection_policy=run):TCPPING(initial_hosts=host_name_A[34060],host_name_A[44060],host_name_A[54060];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> jgroups_cluster.lock.protocolStack=TCP(bind_addr=host_name_A;bind_port=34062;):TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> Test considerations:
> 1. For in-house testing, I have created a 3-node cluster.
> 2. All 3 nodes reside on the same box.
> If you need any further information, please let me know.
> Regards
> Swathi BN
