Bela Ban commented on JGRP-2195:
--------------------------------
Why didn't you upgrade to a more recent version of JGroups, e.g. 4.0 or at least
3.6.x?
Anyway, you should leave more space between the ports of different clusters: bind_ports of
34060, 34061 and 34062 with a port_range of 1 are likely to overlap. Here is a better
config:
ClusterA:
TCP bind_port=30000, 30001, 30002 // 3 members
TCPPING: initial_hosts=xxx[30000] port_range=2
ClusterB:
TCP bind_port=40000, 40001, 40002 // 3 members
TCPPING: initial_hosts=xxx[40000] port_range=2
ClusterC:
TCP bind_port=50000, 50001, 50002 // 3 members
TCPPING: initial_hosts=xxx[50000] port_range=2
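The overlap can be checked with a little arithmetic. The following is only a sketch, not JGroups code: it assumes TCPPING probes each port listed in initial_hosts plus port_range additional consecutive ports. The function names and the labels "distribution", "cluster" and "lock" are made up for illustration, and only the first initial_hosts entry of each reported stack is modelled.

```python
def probed_ports(initial_port, port_range):
    """Ports contacted for one initial_hosts entry (assumed semantics):
    the listed port plus port_range additional consecutive ports."""
    return set(range(initial_port, initial_port + port_range + 1))


def cross_cluster_hits(clusters):
    """Return (prober, target, shared_ports) for every cluster whose
    discovery probes reach a port that another cluster binds to."""
    hits = []
    for name, cfg in clusters.items():
        probed = set()
        for port in cfg["initial_ports"]:
            probed |= probed_ports(port, cfg["port_range"])
        for other, other_cfg in clusters.items():
            if other == name:
                continue
            shared = sorted(probed & set(other_cfg["bind_ports"]))
            if shared:
                hits.append((name, other, shared))
    return hits


# Reported layout: adjacent bind ports on one box, port_range=1.
reported = {
    "distribution": {"bind_ports": [34060], "initial_ports": [34060], "port_range": 1},
    "cluster": {"bind_ports": [34061], "initial_ports": [34061], "port_range": 1},
    "lock": {"bind_ports": [34062], "initial_ports": [34062], "port_range": 1},
}

# Suggested layout: each cluster gets its own block of ports, port_range=2.
suggested = {
    "ClusterA": {"bind_ports": [30000, 30001, 30002], "initial_ports": [30000], "port_range": 2},
    "ClusterB": {"bind_ports": [40000, 40001, 40002], "initial_ports": [40000], "port_range": 2},
    "ClusterC": {"bind_ports": [50000, 50001, 50002], "initial_ports": [50000], "port_range": 2},
}

print(cross_cluster_hits(reported))
print(cross_cluster_hits(suggested))
```

Under that assumption, the reported layout lets one cluster's discovery reach a port owned by a neighbouring cluster, while the suggested layout keeps every probe inside its own cluster's block of ports.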
[JGRP00012] discarded message from different cluster with JGroups Upgrade
-------------------------------------------------------------------------
Key: JGRP-2195
URL: https://issues.jboss.org/browse/JGRP-2195
Project: JGroups
Issue Type: Bug
Affects Versions: 3.4
Environment: All OS(Linux, AIX, Windows, Solaris)
Reporter: Swathi Kumar
Assignee: Bela Ban
Greetings Team.
We recently upgraded the JGroups jar from version 2_5_2/jgroups-all.jar to
3_4_0/jgroups-3.4.0.Alpha2.jar.
Since the upgrade we are seeing *[JGRP00012] discarded message from different cluster*
messages every other second on all the nodes in the cluster.
Note that this issue started to occur only when we switched the protocol from UDP to TCP.
If we switch back to UDP, we no longer see these *WARN* messages.
We no longer support UDP in our application, so going back to UDP is not an option.
Several hundred customers in the field are using our product with this upgraded JGroups
jar and have started to raise tickets against our product.
We do not understand why the upgrade produces such a large number of WARN messages. Is
there an issue with this version of the JGroups jar?
A sample WARN message is shown below:
[2017-06-13 11:56:38.117] ALL 000000000000 GLOBAL_SCOPE 141694
[OOB-1,Sterling_NodeInfo_group,dublr005vm-24633] WARN org.jgroups.protocols.TCP -
[JGRP00012] discarded message from different cluster Sterling_NodeInfo_group_WFC (our
cluster is Sterling_NodeInfo_group). Sender was dublr005vm-2060
[2017-06-13 11:56:41.72] ALL 000000000000 GLOBAL_SCOPE 145297
[OOB-1,Sterling_NodeInfo_group_WFC,dublr005vm-2060] WARN org.jgroups.protocols.TCP -
[JGRP00012] discarded message from different cluster Sterling_NodeInfo_group (our cluster
is Sterling_NodeInfo_group_WFC). Sender was dublr005vm-24633
Our application uses the following JGroups config properties to create 3 channels (for
security reasons, a dummy host name is used here):
jgroups_cluster.property_string=TCP(bind_addr=host_name_A;bind_port=34061):TCPPING(initial_hosts=host_name_A[34061],host_name_A[44061],host_name_A[54061];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
jgroups_cluster.distribution_property_string=TCP(bind_port=34060;thread_pool_rejection_policy=run):TCPPING(initial_hosts=host_name_A[34060],host_name_A[44060],host_name_A[54060];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
jgroups_cluster.lock.protocolStack=TCP(bind_addr=host_name_A;bind_port=34062;):TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];port_range=1;timeout=5000;num_initial_members=2):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
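To compare the port settings of the three stacks without reading the long strings by eye, the relevant values can be pulled out with a small script. This is a rough sketch, not a JGroups utility: parse_stack and tcp_settings are hypothetical helper names, and the parser assumes that layers are separated by ":" and properties by ";", with no ":" inside a property value (true for the strings above, but not for every possible config). The sample string is a shortened copy of the lock stack.

```python
import re


def parse_stack(stack):
    """Split a protocol stack string into (protocol, {property: value}) pairs."""
    layers = []
    for layer in stack.split(":"):
        match = re.fullmatch(r"([\w.]+)(?:\((.*)\))?", layer.strip())
        if match is None:
            raise ValueError(f"unrecognised layer: {layer!r}")
        name, body = match.group(1), match.group(2) or ""
        # Skip empty items so a trailing ';' (as in 'bind_port=34062;') is tolerated.
        props = dict(item.split("=", 1) for item in body.split(";") if item)
        layers.append((name, props))
    return layers


def tcp_settings(stack):
    """Extract the bind port, the ports listed in initial_hosts, and port_range."""
    layers = dict(parse_stack(stack))
    ping = layers["TCPPING"]
    return {
        "bind_port": int(layers["TCP"]["bind_port"]),
        "initial_ports": [int(p) for p in re.findall(r"\[(\d+)\]", ping["initial_hosts"])],
        "port_range": int(ping["port_range"]),
    }


# Shortened copy of the lock stack; the remaining layers are omitted for brevity.
lock_stack = (
    "TCP(bind_addr=host_name_A;bind_port=34062;):"
    "TCPPING(initial_hosts=host_name_A[34062],host_name_A[44062],host_name_A[54062];"
    "port_range=1;timeout=5000;num_initial_members=2):"
    "MERGE2(min_interval=3000;max_interval=5000):"
    "pbcast.GMS(print_local_addr=true;join_timeout=5000)"
)

print(tcp_settings(lock_stack))
```

Running the same helper over all three property strings gives the bind ports and discovery ports side by side, which makes it easier to see how close the channels sit to each other.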
Test considerations:
1. For in-house testing, I created a 3-node cluster.
2. All 3 nodes reside on the same box.
If you need any further information please let me know.
Regards
Swathi BN