[jboss-jira] [JBoss JIRA] (JGRP-2206) Property strings are correct but JGROUPS is not recognizing other nodes
Bela Ban (JIRA)
issues at jboss.org
Mon Jul 24 15:41:00 EDT 2017
[ https://issues.jboss.org/browse/JGRP-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439493#comment-13439493 ]
Bela Ban commented on JGRP-2206:
--------------------------------
This is your problem: {{ Setting jgroups.bind_addr = localhost}}. If you grep for "physical address", you'll get
{noformat}
[belasmac] /Users/bela/Downloads$ grep physical node*
node1.noapp.log.D20170724.T215702:GMS: address=P02HBNFW9657-64048, cluster=Sterling_NodeInfo_group, physical address=127.0.0.1:5061
node1.noapp.log.D20170724.T215702:GMS: address=P02HBNFW9657-2450, cluster=Sterling_NodeInfo_group_WFC, physical address=127.0.0.1:5060
node2.noapp.log.D20170724.T215702:GMS: address=P02HBNFW9657-64048, cluster=Sterling_NodeInfo_group, physical address=127.0.0.1:5061
node2.noapp.log.D20170724.T215702:GMS: address=P02HBNFW9657-2450, cluster=Sterling_NodeInfo_group_WFC, physical address=127.0.0.1:5060
node3.noapp.log.D20170724.T220020:GMS: address=P02HBNFW6872-9702, cluster=Sterling_NodeInfo_group, physical address=127.0.0.1:5061
node3.noapp.log.D20170724.T220020:GMS: address=P02HBNFW6872-37964, cluster=Sterling_NodeInfo_group_WFC, physical address=127.0.0.1:5060
node4.noapp.log.D20170724.T215955:GMS: address=P02HBNFW9137-63139, cluster=Sterling_NodeInfo_group, physical address=127.0.0.1:5061
node4.noapp.log.D20170724.T215955:GMS: address=P02HBNFW9137-4273, cluster=Sterling_NodeInfo_group_WFC, physical address=127.0.0.1:5060
{noformat}
As you can see, every member binds to {{localhost}}, which is {{127.0.0.1}}, so nodes on different hosts can't communicate with each other.
You need to set {{jgroups.bind_addr}} to a routable IP address.
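To make the check above repeatable, here is a minimal sketch (not part of JGroups) that pulls the host out of a GMS "physical address=" log line and reports whether it is a loopback address. The class name {{BindAddrCheck}} and its two helper methods are hypothetical, and the pattern assumes IPv4 {{host:port}} addresses as seen in the logs above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a JGroups class: flags members bound to loopback.
class BindAddrCheck {
    // Matches "physical address=<host>:<port>"; assumes IPv4 or a hostname.
    private static final Pattern PHYSICAL =
        Pattern.compile("physical address=([^:\\s]+):(\\d+)");

    /** Returns the host part of a GMS log line, or null if there is none. */
    public static String physicalHost(String logLine) {
        Matcher m = PHYSICAL.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    /** True if the host resolves to a loopback address such as 127.0.0.1. */
    public static boolean isLoopback(String host) {
        try {
            // IP literals are parsed directly; hostnames trigger a lookup.
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) {
        String line = "GMS: address=P02HBNFW9657-64048, "
            + "cluster=Sterling_NodeInfo_group, physical address=127.0.0.1:5061";
        String host = physicalHost(line);
        System.out.println(host + " loopback? " + isLoopback(host));
        System.out.println("10.38.46.27 loopback? " + isLoopback("10.38.46.27"));
    }
}
```

Any member for which this reports a loopback address can only be reached from its own host, which matches what the logs show for all four nodes.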
Note that your config is also missing {{UNICAST2}}, which will cause lossy point-to-point communication.
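As a rough illustration only, the tail of the {{distributed_property_string}} could look like the following with {{UNICAST2}} added. The placement between {{pbcast.NAKACK}} and {{pbcast.STABLE}} and the use of default attributes are assumptions based on the sample stacks shipped with JGroups 3.x, and the leading "..." stands for the unchanged first part of the string.

{noformat}
...:pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):UNICAST2:pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
{noformat}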
> Property strings are correct but JGROUPS is not recognizing other nodes
> -----------------------------------------------------------------------
>
> Key: JGRP-2206
> URL: https://issues.jboss.org/browse/JGRP-2206
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.4
> Environment: With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options
> OS: Windows Server 2008 R2 6.1,amd64
> Java version: 1.7.0,pwa6470sr9fp10-20150708_01 (SR9 FP10),IBM Corporation
> Reporter: Swathi Kumar
> Assignee: Bela Ban
> Priority: Blocker
> Attachments: VisibilityIssue.zip
>
>
> Our customer has a four node cluster which we believe is correctly defined yet the nodes are not communicating with each other.
> All nodes are on VMWare. None of the hostnames are virtual (in that they are all directly attached to an IP and are not managed by load balancers, etc).
>
> The nodes are located in separate data centers (2 in each) and jgroups is operating over tcp, rather than udp multicast.
> NOTE: The issue occurs only in the customer's environment (we are not able to reproduce this issue in our lab).
> We are attaching our logs (noapp.log.<timestamp>) with JGROUPS debugging enabled.
> *Node1 Property strings*:
> [2017-07-24 21:58:30.867] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.property_string. Receivied this property: TCP(bind_addr=10.38.46.27;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.46.27[5061],10.38.46.28[5061],10.38.175.30[5061],10.38.175.32[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 21:58:30.867] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.property_string. Using this property: TCP(bind_addr=10.38.46.27;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.46.27[5061],10.38.46.28[5061],10.38.175.30[5061],10.38.175.32[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 21:58:30.867] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.distributed_property_string. Receivied this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.46.27[5060],10.38.46.28[5060],10.38.175.30[5060],10.38.175.32[5060];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> [2017-07-24 21:58:30.867] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.distributed_property_string. Using this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.46.27[5060],10.38.46.28[5060],10.38.175.30[5060],10.38.175.32[5060];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> *Node2 Property strings*:
> [2017-07-24 22:01:01.666] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.property_string. Receivied this property: TCP(bind_addr=10.38.46.28;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.46.28[5061],10.38.46.27[5061],10.38.175.30[5061],10.38.175.32[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 22:01:01.666] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.property_string. Using this property: TCP(bind_addr=10.38.46.28;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.46.28[5061],10.38.46.27[5061],10.38.175.30[5061],10.38.175.32[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 22:01:01.666] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.distributed_property_string. Receivied this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.46.28[5060],10.38.46.27[5060],10.38.175.30[5060],10.38.175.32[5060];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> [2017-07-24 22:01:01.666] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.distributed_property_string. Using this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.46.28[5060],10.38.46.27[5060],10.38.175.30[5060],10.38.175.32[5060];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> *Node3 Property strings*:
> [2017-07-24 22:02:01.411] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.property_string. Receivied this property: TCP(bind_addr=10.38.175.30;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.175.30[5061],10.38.46.27[5061],10.38.46.28[5061],10.38.175.32[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 22:02:01.411] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.property_string. Using this property: TCP(bind_addr=10.38.175.30;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.175.30[5061],10.38.46.27[5061],10.38.46.28[5061],10.38.175.32[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 22:02:01.411] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.distributed_property_string. Receivied this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.175.30[5060],10.38.46.27[5060],10.38.46.28[5060],10.38.175.32[5060];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> [2017-07-24 22:02:01.411] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.distributed_property_string. Using this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.175.30[5060],10.38.46.27[5060],10.38.46.28[5060],10.38.175.32[5060];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> *Node4 Property strings*:
> [2017-07-24 22:01:14.365] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.property_string. Receivied this property: TCP(bind_addr=10.38.175.32;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.175.32[5061],10.38.46.27[5061],10.38.46.28[5061],10.38.175.30[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 22:01:14.365] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.property_string. Using this property: TCP(bind_addr=10.38.175.32;bind_port=5061;level=ERROR):TCPPING(initial_hosts=10.38.175.32[5061],10.38.46.27[5061],10.38.46.28[5061],10.38.175.30[5061];port_range=0;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_ALL(interval=5000;timeout=20000):FD(timeout=5000;max_tries=110):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=100,200,300,600,1200,2400,4800;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(print_local_addr=true;join_timeout=5000)
> [2017-07-24 22:01:14.365] ALL 000000000000 GLOBAL_SCOPE Initializing jgroups_cluster.distributed_property_string. Receivied this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.175.32[5060],10.38.46.27[5060],10.38.46.28[5060],10.38.175.30[5060];port_range=1;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48;):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
> [2017-07-24 22:01:14.365] ALL 000000000000 GLOBAL_SCOPE Done initializing jgroups_cluster.distributed_property_string. Using this property: TCP(bind_port=5060;thread_pool_rejection_policy=run;level=ERROR):TCPPING(initial_hosts=10.38.175.32[5060],10.38.46.27[5060],10.38.46.28[5060],10.38.175.30[5060];port_range=1;timeout=5000;num_initial_members=4):MERGE2(min_interval=3000;max_interval=5000):FD_SOCK:FD(timeout=5000;max_tries=48):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(retransmit_timeout=3000;discard_delivered_msgs=true):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=20000;max_bytes=0):pbcast.GMS(join_timeout=5000;print_local_addr=true)
>