Bela Ban resolved JGRP-364.
---------------------------
Resolution: Cannot Reproduce Bug
Works perfectly for me. Could not reproduce this on Fedora 7 with the 2.6.22.6 kernel.
When using TCP_NIO, starting two nodes at the same time causes one of
the nodes not to join group
-------------------------------------------------------------------------------------------------
Key: JGRP-364
URL: http://jira.jboss.com/jira/browse/JGRP-364
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4
Environment: linux 2.6 kernel x86_64 running java 1.5.0_06
Reporter: Matthew Todd
Assigned To: Scott Marlow
Fix For: 2.6
Attachments: test.xml, test1.bat, test2.bat, test2.xml, test3.bat, test3.xml
I am testing a JGroups TCP_NIO configuration using the Draw demo. If I start my 3 nodes
one by one, everything works fine. However, if I start node 1 and then start nodes 2 and 3
in parallel, only node 2 works. Node 3 is isolated, does not see the other nodes, and logs
the following message:
org.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying
I am starting the Draw demo like this:
java -cp jgroups-all.jar:commons-logging.jar:concurrent.jar:jmxri.jar
org.jgroups.demos.Draw -props test.xml
Here is the configuration for one of my nodes:
<config>
    <TCP_NIO
        bind_addr="192.158.70.200"
        recv_buf_size="20000000"
        send_buf_size="640000"
        loopback="false"
        discard_incompatible_packets="true"
        max_bundle_size="64000"
        max_bundle_timeout="30"
        use_incoming_packet_handler="true"
        use_outgoing_packet_handler="true"
        down_thread="false" up_thread="false"
        enable_bundling="true"
        start_port="7800"
        end_port="7800"
        use_send_queues="false"
        sock_conn_timeout="300" skip_suspected_members="true"
    />
    <MPING timeout="2000" num_initial_members="3"
        mcast_addr="229.6.7.8"
        bind_addr="192.158.70.200" down_thread="false"
        up_thread="false"/>
    <MERGE2 max_interval="100000"
        down_thread="false" up_thread="false"
        min_interval="20000"/>
    <FD_SOCK down_thread="false" up_thread="false"/>
    <VERIFY_SUSPECT timeout="1500" down_thread="false"
        up_thread="false"/>
    <pbcast.NAKACK max_xmit_size="60000"
        use_mcast_xmit="false" gc_lag="0"
        retransmit_timeout="300,600,1200,2400,4800"
        down_thread="true" up_thread="true"
        discard_delivered_msgs="true"/>
    <pbcast.STABLE stability_delay="1000"
        desired_avg_gossip="50000"
        down_thread="false" up_thread="false"
        max_bytes="400000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
        down_thread="true" up_thread="true"
        join_retry_timeout="2000" shun="true"
        view_bundling="true"/>
    <!-- <FC max_credits="2000000" down_thread="false"
        up_thread="false"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60000" down_thread="false"
        up_thread="false"/> -->
    <pbcast.STATE_TRANSFER/>
    <!-- <pbcast.FLUSH down_thread="false"
        up_thread="false"/>-->
</config>
Nodes 2 and 3 have the same configuration, except that the port they bind to has been changed.
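For illustration, the TCP_NIO element for node 3 might look like the sketch below. The
port 7802 is inferred from the join warning quoted earlier (join(192.158.70.200:7802)),
not taken from the attached test3.xml, so treat it as an assumption. All other attributes
are copied unchanged from the node 1 configuration.

```xml
<!-- Sketch of node 3's transport only; start_port/end_port are assumed from the log line -->
<TCP_NIO
    bind_addr="192.158.70.200"
    recv_buf_size="20000000"
    send_buf_size="640000"
    loopback="false"
    discard_incompatible_packets="true"
    max_bundle_size="64000"
    max_bundle_timeout="30"
    use_incoming_packet_handler="true"
    use_outgoing_packet_handler="true"
    down_thread="false" up_thread="false"
    enable_bundling="true"
    start_port="7802"
    end_port="7802"
    use_send_queues="false"
    sock_conn_timeout="300" skip_suspected_members="true"
/>
```

Because start_port and end_port are equal, each node is pinned to a single port, which
is why every node on the same host needs its own value.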
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: