[jboss-dev-forums] [Design of Clustering on JBoss (Clusters/JBoss)] - Re: JBPAPP-863 -- FC blocks during slow failure detection

Mon Jul 7 18:57:55 EDT 2008

Vladimir,

Below is what I think he's using as the config (the standard, commented-out TCP config in the jboss-web-cluster sar). Dominik, please post your config if this isn't it.

<TCP bind_addr="thishost" start_port="7810" loopback="true"
     tcp_nodelay="true"
     recv_buf_size="20000000"
     send_buf_size="640000"
     discard_incompatible_packets="true"
     enable_bundling="true"
     max_bundle_size="64000"
     max_bundle_timeout="30"
     use_incoming_packet_handler="true"
     use_outgoing_packet_handler="false"
     down_thread="false" up_thread="false"
     use_send_queues="false"
     sock_conn_timeout="300"
     skip_suspected_members="true"/>
<TCPPING initial_hosts="thishost[7810],otherhost[7810]" port_range="3"
         timeout="3000"
         down_thread="false" up_thread="false"
         num_initial_members="3"/>
<MERGE2 max_interval="100000"
        down_thread="false" up_thread="false" min_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<FD timeout="10000" max_tries="5" down_thread="false" up_thread="false" shun="true"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK max_xmit_size="60000"
               use_mcast_xmit="false" gc_lag="0"
               retransmit_timeout="300,600,1200,2400,4800"
               down_thread="false" up_thread="false"
               discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
               down_thread="false" up_thread="false"
               max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
            down_thread="false" up_thread="false"
            join_retry_timeout="2000" shun="true"
            view_bundling="true"/>
<FC max_credits="2000000" down_thread="false" up_thread="false"
    min_threshold="0.10"/>
<FRAG2 frag_size="60000" down_thread="false" up_thread="false"/>
<pbcast.STATE_TRANSFER down_thread="false" up_thread="false" use_flush="false"/>

There's a stack3.txt attached to the JIRA, and it doesn't show a problem with any of the JGroups threads, e.g. a blocked IncomingPacketHandler that would prevent a view message from propagating.  But that stack trace wasn't taken at the same time as the logs, and I'm not sure when it was taken.  It might have been taken before the node failure should have been detected.
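
For a rough idea of when the failure "should have been detected" with the FD, VERIFY_SUSPECT and FC values in the config, here is a back-of-the-envelope sketch. It rests on two assumptions of mine, not on anything confirmed in the JIRA: that FD only suspects a member after max_tries heartbeats have each gone unanswered for timeout ms, and that VERIFY_SUSPECT then adds its own timeout before the member is excluded. FD_SOCK may well notice a crashed process sooner, so treat this as a slow-path estimate only. The function names are made up for illustration.

```python
# Values copied from the FD, VERIFY_SUSPECT and FC elements of the config.
FD_TIMEOUT_MS = 10_000
FD_MAX_TRIES = 5
VERIFY_SUSPECT_TIMEOUT_MS = 1_500
FC_MAX_CREDITS = 2_000_000
FC_MIN_THRESHOLD = 0.10


def estimated_detection_ms(fd_timeout_ms, fd_max_tries, verify_timeout_ms):
    """Slow-path estimate of the time until a dead member is excluded.

    Assumption: FD waits fd_timeout_ms for each of fd_max_tries heartbeats
    before suspecting, then VERIFY_SUSPECT waits verify_timeout_ms more.
    """
    return fd_timeout_ms * fd_max_tries + verify_timeout_ms


def fc_replenish_threshold(max_credits, min_threshold):
    """Credit level (in bytes) treated as the low-water mark for FC.

    Assumption: min_threshold is a fraction of max_credits.
    """
    return round(max_credits * min_threshold)


if __name__ == "__main__":
    detect = estimated_detection_ms(
        FD_TIMEOUT_MS, FD_MAX_TRIES, VERIFY_SUSPECT_TIMEOUT_MS
    )
    print("estimated slow-path detection: %d ms" % detect)  # 51500 ms
    print(
        "FC low-water mark: %d bytes"
        % fc_replenish_threshold(FC_MAX_CREDITS, FC_MIN_THRESHOLD)
    )  # 200000 bytes
```

If those assumptions hold, a sender that has used up its 2000000 bytes of credit toward the dead node could sit blocked in FC for something like 50 seconds before a new view removes that node, so a stack trace taken inside that window would not be expected to show the view change yet.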

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4162968#4162968
