[jboss-jira] [JBoss JIRA] Commented: (JGRP-1233) TCPPING fails to connect with different server because sender thread is interrupted too soon?

Thu Sep 16 10:18:28 EDT 2010

    [ https://jira.jboss.org/browse/JGRP-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12551290#action_12551290 ] 

Karthik Abram commented on JGRP-1233:
-------------------------------------

JGroups configuration:

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-2.8.xsd">
        <!-- TCP: Core transport protocol. This specifies TCP as the protocol for actual messaging.
             Each member will attempt to bind to port 9043 (the bind_port below) on all network
             interfaces and open a listening socket on that port. If the port is already in use,
             it will try subsequent ports, up to a max of 5. -->
    <TCP bind_addr="0.0.0.0" bind_port="9043" loopback="true"
         recv_buf_size="${tcp.recv_buf_size:20M}"
         send_buf_size="${tcp.send_buf_size:640K}"
         discard_incompatible_packets="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"
         timer.num_threads="4"

         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="10"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard" />
         <!-- TCPPING: This is the discovery protocol. It is set up to use a list of well-known hosts,
              at least one of which is guaranteed to be alive, to establish the initial network. For
              ReportSource, since the critical pieces are the WebSphere columns, we will list each
              of the WebSphere columns here with their default bind_port address. In the case
              where no WebSphere node is available, the batch processes will simply create a cluster
              that will join the WebSphere node when available (via MERGE2 protocol below). -->
    <TCPPING timeout="3000"
             initial_hosts="10.0.7.28[9043]"
             port_range="1"
             num_initial_members="1"/>
        <!-- MERGE2: Merges members back together when conditions cause a split cluster. -->
    <MERGE2 max_interval="10000"
            min_interval="5000"/>
        <!-- Failure detection protocol using socket status -->
    <FD_SOCK/>
        <!-- Failure detection protocol using "ping" message -->
    <FD timeout="3000"
        max_tries="3"/>
        <!-- Suspect (i.e., member is suspected to have left abruptly) verification -->
    <VERIFY_SUSPECT timeout="1500"/>
        <!-- These are for reliable communication (NAKACK), Group membership services (GMS), etc. -->
    <pbcast.NAKACK  use_mcast_xmit="false" gc_lag="50"
                   retransmit_timeout="600,1200,2400,4800"
                   discard_delivered_msgs="true" />
    <UNICAST timeout="300,600,1200" />
    <pbcast.STABLE stability_delay="1000"
                   desired_avg_gossip="20000"
                   max_bytes="10K"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000" />
    <FC max_credits="2M"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60K" />
    <pbcast.STREAMING_STATE_TRANSFER/>

<!--    <AUTH auth_class="org.jgroups.auth.MD5Token"-->
<!--        auth_value="testme"-->
<!--        token_hash="MD5"/>-->
</config>
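
One variation that may be worth testing, offered as an untested sketch rather than a confirmed fix: the TRACE line in the report shows the local address as 0.0.0.0:9043, which suggests (this is an assumption) that the wildcard bind_addr ends up being used as the member's own address. The fragment below binds to one concrete interface instead, using the same ${property:default} substitution as the buffer sizes above. The property name jgroups.bind_addr and the 127.0.0.1 fallback are placeholders, not values from this setup; each machine would pass its own routable address. Only the affected elements are shown, and the remaining TCP attributes and protocols would stay as in the config above.

```xml
<!-- Sketch only: bind to one concrete interface instead of the 0.0.0.0 wildcard.
     jgroups.bind_addr and the 127.0.0.1 default are placeholders (assumptions). -->
<TCP bind_addr="${jgroups.bind_addr:127.0.0.1}" bind_port="9043" loopback="true"
     sock_conn_timeout="300" />
<!-- Discovery is unchanged: initial_hosts stays the address the other machine can reach. -->
<TCPPING timeout="3000"
         initial_hosts="10.0.7.28[9043]"
         port_range="1"
         num_initial_members="1"/>
```

If the log line on the second machine then shows a concrete address instead of 0.0.0.0, that would at least confirm or rule out the bind address as a factor.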

> TCPPING fails to connect with different server because sender thread is interrupted too soon?
> ---------------------------------------------------------------------------------------------
>
>                 Key: JGRP-1233
>                 URL: https://jira.jboss.org/browse/JGRP-1233
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.10
>            Reporter: Karthik Abram
>            Assignee: Bela Ban
>
> The configuration file I'm using is in the comment below. The log output is below that. The system works just fine if I run two instances of the program on the same machine. However, when I run it on different machines (only 1 port is open between them), I see connection-established, and then I get this in the log:
> TRACE TCPConnectionMap$TCPConnection - TCPConnection.Sender thread terminated at 0.0.0.0:9043
> Looking at the code, it seems the sender thread gets interrupted before it gets data from the other program?
> We cannot use MPING for discovery in our environment, and this is a major issue for us.
