[jboss-user] [Clustering/JBoss] - Re: Order of node start up works one way, but not the other

PhilipWu do-not-reply at jboss.com
Tue Mar 6 14:26:26 EST 2007


Another experiment:

I tried updating Jboss lib to use the latest stable JGroups 2.4.0

This time Node B shows a lightly different trace. 

anonymous wrote : 2007-03-06 14:20:09,055 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel connected; closing channel -- In
  | itial state transfer failed: Channel.getState() returned false
  | 

Looks like there was an exception in Node B, so node B sent a LEAVE request to Node A. Here's a more detailed trace of Node B:

anonymous wrote : 2007-03-06 14:19:38,990 DEBUG [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [192.168.1.100:32772, 192.168.1.105:32825]
  | 2007-03-06 14:19:38,991 DEBUG [org.jgroups.protocols.FD] suspected_mbrs: [], after adjustment: []
  | 2007-03-06 14:19:38,993 DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest()=192.168.1.100:32772, pingable_mbrs=[192.168.1.100:32772, 192
  | .168.1.105:32825]
  | 2007-03-06 14:19:38,994 DEBUG [org.jgroups.protocols.FD_SOCK] ping_dest=192.168.1.100:32772, ping_sock=Socket[addr=/192.168.1.100,port=54423,l
  | ocalport=47344], cache={192.168.1.100:32772=192.168.1.100:54423}
  | 2007-03-06 14:19:39,047 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] ViewAccepted: initial members set
  | 2007-03-06 14:19:39,047 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Starting channel
  | 2007-03-06 14:19:39,048 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] get nodeName
  | 2007-03-06 14:19:39,048 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Get current members
  | 2007-03-06 14:19:39,048 INFO  [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Number of cluster members: 2
  | 2007-03-06 14:19:39,049 INFO  [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Other members: 1
  | 2007-03-06 14:19:39,049 INFO  [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Fetching state (will wait for 30000 milliseconds):
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] GET_STATE: asking 192.168.1.100:32772 for state
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] passing down a SUSPEND_STABLE event
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
  | 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume task started, max_suspend_time=33000
  | 2007-03-06 14:19:43,168 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StableTask at d7c6bf
  | 2007-03-06 14:19:43,208 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 392814
  | 2007-03-06 14:19:46,920 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StableTask at 7673a2
  | 2007-03-06 14:19:49,000 DEBUG [org.jgroups.util.TimeScheduler] Running task true
  | 2007-03-06 14:19:49,000 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:19:49,001 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
  | 2007-03-06 14:19:55,487 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at a193fe
  | 2007-03-06 14:19:55,559 DEBUG [org.jgroups.util.TimeScheduler] Running task true
  | 2007-03-06 14:19:55,559 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
  | 2007-03-06 14:19:55,599 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 1fb2ef9
  | 2007-03-06 14:19:55,633 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32770
  | 2007-03-06 14:19:59,007 DEBUG [org.jgroups.util.TimeScheduler] Running task true
  | 2007-03-06 14:19:59,007 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:19:59,008 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
  | 2007-03-06 14:20:09,015 DEBUG [org.jgroups.util.TimeScheduler] Running task true
  | 2007-03-06 14:20:09,015 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:20:09,016 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
  | 2007-03-06 14:20:09,055 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel connected; closing channel -- In
  | itial state transfer failed: Channel.getState() returned false
  | 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
  | 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.GMS] sending LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
  | 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:20:13,538 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StabilitySendTask at 8d41f2
  | 2007-03-06 14:20:13,578 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 1e492d8
  | 2007-03-06 14:20:14,062 DEBUG [org.jgroups.protocols.pbcast.GMS] sending LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
  | 2007-03-06 14:20:14,371 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
  | 2007-03-06 14:20:14,978 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
  | 2007-03-06 14:20:15,494 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 164de59
  | 2007-03-06 14:20:15,566 DEBUG [org.jgroups.util.TimeScheduler] Running task true
  | 2007-03-06 14:20:15,566 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
  | 2007-03-06 14:20:15,602 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 141b736
  | 2007-03-06 14:20:15,635 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32770
  | 2007-03-06 14:20:16,186 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
  | 2007-03-06 14:20:18,594 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
  | 2007-03-06 14:20:19,022 DEBUG [org.jgroups.util.TimeScheduler] Running task true
  | 2007-03-06 14:20:19,023 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
  | 2007-03-06 14:20:19,024 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
  | 2007-03-06 14:20:19,070 DEBUG [org.jgroups.protocols.pbcast.GMS] sending LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
  | 2007-03-06 14:20:19,378 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
  | 2007-03-06 14:20:19,986 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
  | 2007-03-06 14:20:21,194 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
  | 2007-03-06 14:20:23,402 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
  | 2007-03-06 14:20:23,602 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
  | 2007-03-06 14:20:24,074 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.1.105:32825 changed role to org.jgroups.protocols.pbcast.ClientGmsImp
  | l
  | 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] socket to 192.168.1.100:32772 was reset
  | 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] pinger thread terminated
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] closing sockets and stopping threads
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast receive socket closed
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast send socket closed
  | 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast thread terminated
  | 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] socket closed
  | 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver socket is closed, exception=java.net.SocketException: Socket closed
  | 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver thread terminated
  | 2007-03-06 14:20:24,082 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Starting failed jboss:service=wubrothers
  | java.lang.IllegalStateException: Initial state transfer failed: Channel.getState() returned false
  |         at org.jboss.ha.framework.server.HAPartitionImpl.fetchState(HAPartitionImpl.java:351)
  | 


Node A still shows no exceptions. Here's the trace for node A that received the LEAVE request:

anonymous wrote : 2007-03-06 14:04:45,864 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.105:32825
  | 2007-03-06 14:04:45,920 DEBUG [org.jgroups.protocols.pbcast.GMS] received LEAVE_REQ for 192.168.1.105:32825 from 192.168.1.105:32825
  | 2007-03-06 14:04:45,973 DEBUG [org.jgroups.protocols.pbcast.GMS] new=[], suspected=[], leaving=[192.168.1.105:32825], new view: [192.168.1.100
  | :32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]
  | 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=192.168.1.100:32772] view is [192.168.1.100:32772|2] [192.168.1.1
  | 00:32772]
  | 

Any other ideas?

Phil

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4025545#4025545

Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4025545



More information about the jboss-user mailing list