[jboss-user] [Clustering/JBoss] - Initial state transfer failed: Channel.getState() returned f

Thu Sep 25 20:28:42 EDT 2008

We have a cluster with two nodes (JBoss 4.0.5 CP10) deployed on Solaris 10 with JRE150_10, SPARC edition. We get the following error when the second node starts up. Below is the console output from the second node. We are puzzled: the cluster initially forms properly, but when the second node tries to get the state from the main node, it fails and leaves the group "StgGtasPartition4".

  | 2008-09-25 20:10:55,970 INFO  [FRAG2] frag_size=60000, overhead=0, new frag_size=60000
  | 2008-09-25 20:10:56,121 INFO  [UDP] unicast sockets will use interface 150.105.49.84
  | 2008-09-25 20:10:56,129 INFO  [UDP] socket information:
  | local_addr=lsgridtalkp02:56637, mcast_addr=228.1.3.88:45577, bind_addr=/150.105.49.84, ttl=2
  | sock: bound to 150.105.49.84:56637, receive buffer size=2097152, send buffer size=640000
  | mcast_recv_sock: bound to 150.105.49.84:45577, send buffer size=640000, receive buffer size=2097152
  | mcast_send_sock: bound to 150.105.49.84:56638, send buffer size=640000, receive buffer size=2097152
  | 2008-09-25 20:10:56,130 INFO  [STDOUT] 
  | -------------------------------------------------------
  | GMS: address is lsgridtalkp02:56637
  | -------------------------------------------------------
  | 2008-09-25 20:10:58,447 INFO  [TreeCache] viewAccepted(): [lsgridtalkp01:54552|3] [lsgridtalkp01:54552, lsgridtalkp02:56637]
  | 2008-09-25 20:10:58,451 INFO  [TreeCache] TreeCache local address is lsgridtalkp02:56637
  | 2008-09-25 20:11:06,647 INFO  [TreeCache] received the state (size=1024 bytes)
  | 2008-09-25 20:11:06,708 INFO  [TreeCache] state was retrieved successfully (in 8255 milliseconds)
  | 2008-09-25 20:11:06,709 INFO  [TreeCache] parseConfig(): PojoCacheConfig is empty
  | 2008-09-25 20:11:07,815 INFO  [StgGtasPartition4] Initializing
  | 2008-09-25 20:11:07,986 INFO  [UDP] unicast sockets will use interface 150.105.49.84
  | 2008-09-25 20:11:07,990 INFO  [UDP] socket information:
  | local_addr=lsgridtalkp02:56651 (additional data: 19 bytes), mcast_addr=228.1.3.88:45566, bind_addr=/150.105.49.84, ttl=8
  | sock: bound to 150.105.49.84:56651, receive buffer size=2000000, send buffer size=640000
  | mcast_recv_sock: bound to 150.105.49.84:45566, send buffer size=640000, receive buffer size=2000000
  | mcast_send_sock: bound to 150.105.49.84:56652, send buffer size=640000, receive buffer size=2000000
  | 2008-09-25 20:11:07,992 INFO  [STDOUT] 
  | -------------------------------------------------------
  | GMS: address is lsgridtalkp02:56651 (additional data: 19 bytes)
  | -------------------------------------------------------
  | 2008-09-25 20:11:10,493 INFO  [StgGtasPartition4] Number of cluster members: 2
  | 2008-09-25 20:11:10,494 INFO  [StgGtasPartition4] New cluster view for partition StgGtasPartition4: 3 ([150.105.49.79:31099, 150.105.49.84:31099] delta: 0)
  | 2008-09-25 20:11:10,494 INFO  [StgGtasPartition4] Other members: 1
  | 2008-09-25 20:11:10,495 INFO  [StgGtasPartition4] Fetching state (will wait for 30000 milliseconds):
  | 2008-09-25 20:11:56,732 WARN  [ServiceController] Problem starting service jboss:service=StgGtasPartition4
  | java.lang.IllegalStateException: Initial state transfer failed: Channel.getState() returned false
  | 	at org.jboss.ha.framework.server.HAPartitionImpl.fetchState(HAPartitionImpl.java:351)
  | 	at org.jboss.ha.framework.server.HAPartitionImpl.startPartition(HAPartitionImpl.java:280)
  | 	at org.jboss.ha.framework.server.ClusterPartition.startService(ClusterPartition.java:341)
  | 

We tried increasing the state transfer timeout to 60 s, but the same error still occurred. Our "cluster-service.xml" is shown below:

  | <mbean code="org.jboss.ha.framework.server.ClusterPartition"
  |       name="jboss:service=${jboss.partition.name:DefaultPartition}">
  |          
  |       <!-- Name of the partition being built -->
  |       <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
  | 
  |       <!-- The address used to determine the node name -->
  |       <attribute name="NodeAddress">${jboss.bind.address}</attribute>
  | 
  |       <!-- Determine if deadlock detection is enabled -->
  |       <attribute name="DeadlockDetection">False</attribute>
  |      
  |       <!-- Max time (in ms) to wait for state transfer to complete. Increase for large states -->
  |       <attribute name="StateTransferTimeout">30000</attribute>
  | 
  |       <!-- The JGroups protocol configuration -->
  |       <attribute name="PartitionConfig">
  |          <!--
  |          The default UDP stack:
  |          - If you have a multihomed machine, set the UDP protocol's bind_addr attribute to the
  |          appropriate NIC IP address, e.g bind_addr="192.168.0.2".
  |          - On Windows machines, because of the media sense feature being broken with multicast
  |          (even after disabling media sense) set the UDP protocol's loopback attribute to true
  |          -->
  |          <Config>
  |             <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}" mcast_port="45566"
  |                ip_ttl="${jgroups.mcast.ip_ttl:8}" ip_mcast="true"
  |                mcast_recv_buf_size="2000000" mcast_send_buf_size="640000"
  |                ucast_recv_buf_size="2000000" ucast_send_buf_size="640000"
  |                loopback="false" bind_addr="150.105.49.79"/>
  |             <PING timeout="2000" num_initial_members="3"
  |                up_thread="true" down_thread="true"/>
  |             <MERGE2 min_interval="10000" max_interval="20000"/>
  |             <FD_SOCK down_thread="false" up_thread="false"/>
  |             <FD shun="true" up_thread="true" down_thread="true"
  |                timeout="10000" max_tries="5"/>
  |             <VERIFY_SUSPECT timeout="3000" num_msgs="3"
  |                up_thread="true" down_thread="true"/>
  |             <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
  |                max_xmit_size="8192"
  |                up_thread="true" down_thread="true"/>
  |             <UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10"
  |                down_thread="true"/>
  |             <pbcast.STABLE desired_avg_gossip="20000" max_bytes="400000"
  |                up_thread="true" down_thread="true"/>
  |             <FRAG frag_size="8192"
  |                down_thread="true" up_thread="true"/>
  |             <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
  |                shun="true" print_local_addr="true"/>
  |             <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
  |          </Config>
  |       </attribute>
  |       <depends>jboss:service=Naming</depends>
  |    </mbean>
  | 
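For reference, the 60 s timeout attempt mentioned above would correspond to a change like the fragment below. This is only a sketch, assuming the change was made to the StateTransferTimeout attribute of the same ClusterPartition mbean; the value 60000 is simply 60 s expressed in milliseconds, and the rest of the mbean is assumed to stay as shown.

```xml
<!-- Sketch only: assumes the 60 s attempt was made on this attribute
     of the ClusterPartition mbean; all other attributes unchanged. -->
<!-- Max time (in ms) to wait for state transfer to complete -->
<attribute name="StateTransferTimeout">60000</attribute>
```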

The following post shows the DEBUG stack traces from the main node and the second node while the second node is starting up.

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4178942#4178942
