[jboss-jira] [JBoss JIRA] Created: (JGRP-364) When using TCP_NIO, starting two nodes at the same time causes one of the nodes not to join group

Tue Nov 21 03:52:41 EST 2006

When using TCP_NIO, starting two nodes at the same time causes one of the nodes not to join group
-------------------------------------------------------------------------------------------------

                 Key: JGRP-364
                 URL: http://jira.jboss.com/jira/browse/JGRP-364
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.4
         Environment: linux 2.6 kernel x86_64 running java 1.5.0_06
            Reporter: Matthew Todd
         Assigned To: Bela Ban

I am testing a JGroups TCP_NIO configuration using the Draw demo. If I start my 3 nodes one by one, everything works fine. However, if I start node 1 and then start nodes 2 and 3 in parallel, only node 2 joins the group. Node 3 is isolated, does not see the other nodes, and logs the following message:

org.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying

I am starting the Draw demo like this:

java -cp jgroups-all.jar:commons-logging.jar:concurrent.jar:jmxri.jar org.jgroups.demos.Draw -props test.xml
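
The parallel start of nodes 2 and 3 could be scripted roughly as follows. This is only a sketch: the function names `start_node` and `start_parallel` and the overridable `JAVA` variable are made up for illustration, and the classpath is copied from the command above.

```shell
#!/bin/sh
# Sketch: start two Draw nodes at (nearly) the same time.
# JAVA can be overridden; it defaults to the java found on the PATH.
JAVA="${JAVA:-java}"
CP="jgroups-all.jar:commons-logging.jar:concurrent.jar:jmxri.jar"

# Launch one Draw node in the background with the given config file.
start_node() {
    "$JAVA" -cp "$CP" org.jgroups.demos.Draw -props "$1" &
}

# Launch two nodes back to back, then block until both have exited.
start_parallel() {
    start_node "$1"
    start_node "$2"
    wait
}
```

With node 1 already running, `start_parallel test2.xml test3.xml` would then start the other two nodes together. The file names test2.xml and test3.xml are hypothetical; only test.xml is named in this report.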

Here is the configuration for one of my nodes:

<config>
    <TCP_NIO
            bind_addr="192.158.70.200"
            recv_buf_size="20000000"
            send_buf_size="640000"
            loopback="false"
            discard_incompatible_packets="true"
            max_bundle_size="64000"
            max_bundle_timeout="30"
            use_incoming_packet_handler="true"
            use_outgoing_packet_handler="true"
            down_thread="false" up_thread="false"
            enable_bundling="true"
            start_port="7800"
            end_port="7800"
            use_send_queues="false"
            sock_conn_timeout="300" skip_suspected_members="true"/>
    <MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
           bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>
    <MERGE2 max_interval="100000"
            down_thread="false" up_thread="false" min_interval="20000"/>
    <FD_SOCK down_thread="false" up_thread="false"/>

    <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
    <pbcast.NAKACK max_xmit_size="60000"
                   use_mcast_xmit="false" gc_lag="0"
                   retransmit_timeout="300,600,1200,2400,4800"
                   down_thread="true" up_thread="true"
                   discard_delivered_msgs="true"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   down_thread="false" up_thread="false"
                   max_bytes="400000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                down_thread="true" up_thread="true"
                join_retry_timeout="2000" shun="true"
                view_bundling="true"/>
    <!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
    <pbcast.STATE_TRANSFER/>
    <!-- <pbcast.FLUSH down_thread="false" up_thread="false"/> -->
</config>

Nodes 2 and 3 use the same configuration, except that the port they bind to is different.
