[jboss-user] [Clustering/JBoss] - Re: Order of node start up works one way, but not the other
PhilipWu
do-not-reply at jboss.com
Tue Mar 6 14:26:26 EST 2007
Another experiment:
I tried updating the JBoss lib to use the latest stable JGroups, 2.4.0.
This time Node B shows a slightly different trace.
anonymous wrote : 2007-03-06 14:20:09,055 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel connected; closing channel -- Initial state transfer failed: Channel.getState() returned false
|
It looks like there was an exception on Node B, so Node B sent a LEAVE request to Node A. Here's a more detailed trace of Node B:
anonymous wrote : 2007-03-06 14:19:38,990 DEBUG [org.jgroups.protocols.FD_SOCK] VIEW_CHANGE received: [192.168.1.100:32772, 192.168.1.105:32825]
| 2007-03-06 14:19:38,991 DEBUG [org.jgroups.protocols.FD] suspected_mbrs: [], after adjustment: []
| 2007-03-06 14:19:38,993 DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest()=192.168.1.100:32772, pingable_mbrs=[192.168.1.100:32772, 192.168.1.105:32825]
| 2007-03-06 14:19:38,994 DEBUG [org.jgroups.protocols.FD_SOCK] ping_dest=192.168.1.100:32772, ping_sock=Socket[addr=/192.168.1.100,port=54423,localport=47344], cache={192.168.1.100:32772=192.168.1.100:54423}
| 2007-03-06 14:19:39,047 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] ViewAccepted: initial members set
| 2007-03-06 14:19:39,047 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Starting channel
| 2007-03-06 14:19:39,048 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] get nodeName
| 2007-03-06 14:19:39,048 DEBUG [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Get current members
| 2007-03-06 14:19:39,048 INFO [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Number of cluster members: 2
| 2007-03-06 14:19:39,049 INFO [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Other members: 1
| 2007-03-06 14:19:39,049 INFO [org.jboss.ha.framework.interfaces.HAPartition.wubrothers] Fetching state (will wait for 30000 milliseconds):
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] GET_STATE: asking 192.168.1.100:32772 for state
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STATE_TRANSFER] passing down a SUSPEND_STABLE event
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE] suspending message garbage collection
| 2007-03-06 14:19:39,050 DEBUG [org.jgroups.protocols.pbcast.STABLE] resume task started, max_suspend_time=33000
| 2007-03-06 14:19:43,168 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StableTask at d7c6bf
| 2007-03-06 14:19:43,208 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 392814
| 2007-03-06 14:19:46,920 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StableTask at 7673a2
| 2007-03-06 14:19:49,000 DEBUG [org.jgroups.util.TimeScheduler] Running task true
| 2007-03-06 14:19:49,000 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:19:49,001 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
| 2007-03-06 14:19:55,487 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at a193fe
| 2007-03-06 14:19:55,559 DEBUG [org.jgroups.util.TimeScheduler] Running task true
| 2007-03-06 14:19:55,559 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
| 2007-03-06 14:19:55,599 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 1fb2ef9
| 2007-03-06 14:19:55,633 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32770
| 2007-03-06 14:19:59,007 DEBUG [org.jgroups.util.TimeScheduler] Running task true
| 2007-03-06 14:19:59,007 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:19:59,008 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
| 2007-03-06 14:20:09,015 DEBUG [org.jgroups.util.TimeScheduler] Running task true
| 2007-03-06 14:20:09,015 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:20:09,016 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
| 2007-03-06 14:20:09,055 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Caught exception after channel connected; closing channel -- Initial state transfer failed: Channel.getState() returned false
| 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
| 2007-03-06 14:20:09,056 DEBUG [org.jgroups.protocols.pbcast.GMS] sending LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
| 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:20:13,538 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.pbcast.STABLE$StabilitySendTask at 8d41f2
| 2007-03-06 14:20:13,578 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 1e492d8
| 2007-03-06 14:20:14,062 DEBUG [org.jgroups.protocols.pbcast.GMS] sending LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
| 2007-03-06 14:20:14,371 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
| 2007-03-06 14:20:14,978 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
| 2007-03-06 14:20:15,494 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 164de59
| 2007-03-06 14:20:15,566 DEBUG [org.jgroups.util.TimeScheduler] Running task true
| 2007-03-06 14:20:15,566 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32770 (own address=192.168.1.105:32823)
| 2007-03-06 14:20:15,602 DEBUG [org.jgroups.util.TimeScheduler] Running task org.jgroups.protocols.TP$Bundler$BundlingTimer at 141b736
| 2007-03-06 14:20:15,635 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32770
| 2007-03-06 14:20:16,186 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
| 2007-03-06 14:20:18,594 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
| 2007-03-06 14:20:19,022 DEBUG [org.jgroups.util.TimeScheduler] Running task true
| 2007-03-06 14:20:19,023 DEBUG [org.jgroups.protocols.FD] sending are-you-alive msg to 192.168.1.100:32772 (own address=192.168.1.105:32825)
| 2007-03-06 14:20:19,024 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.100:32772
| 2007-03-06 14:20:19,070 DEBUG [org.jgroups.protocols.pbcast.GMS] sending LEAVE request to 192.168.1.100:32772 (local_addr=192.168.1.105:32825)
| 2007-03-06 14:20:19,378 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
| 2007-03-06 14:20:19,986 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
| 2007-03-06 14:20:21,194 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
| 2007-03-06 14:20:23,402 DEBUG [org.jgroups.util.TimeScheduler] Running task 6-6
| 2007-03-06 14:20:23,602 DEBUG [org.jgroups.util.TimeScheduler] Running task 7-7
| 2007-03-06 14:20:24,074 DEBUG [org.jgroups.protocols.pbcast.GMS] 192.168.1.105:32825 changed role to org.jgroups.protocols.pbcast.ClientGmsImpl
| 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] socket to 192.168.1.100:32772 was reset
| 2007-03-06 14:20:24,078 DEBUG [org.jgroups.protocols.FD_SOCK] pinger thread terminated
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] closing sockets and stopping threads
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast receive socket closed
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast send socket closed
| 2007-03-06 14:20:24,079 DEBUG [org.jgroups.protocols.UDP] multicast thread terminated
| 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] socket closed
| 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver socket is closed, exception=java.net.SocketException: Socket closed
| 2007-03-06 14:20:24,080 DEBUG [org.jgroups.protocols.UDP] unicast receiver thread terminated
| 2007-03-06 14:20:24,082 DEBUG [org.jboss.ha.framework.server.ClusterPartition] Starting failed jboss:service=wubrothers
| java.lang.IllegalStateException: Initial state transfer failed: Channel.getState() returned false
| at org.jboss.ha.framework.server.HAPartitionImpl.fetchState(HAPartitionImpl.java:351)
|
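As a quick sanity check on the timing, here is a minimal sketch that measures the gap between the "Fetching state" line and the failure line in the Node B trace above. The helper names (parse_ts, elapsed_ms) are made up for illustration, and it assumes every log line starts with a 23-character timestamp in the format shown in the trace.

```python
from datetime import datetime, timedelta

# Timestamp layout used at the start of each log line, e.g. "2007-03-06 14:19:39,049"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line: str) -> datetime:
    """Parse the leading 23-character timestamp of a log line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


def elapsed_ms(start_line: str, end_line: str) -> int:
    """Whole milliseconds between the timestamps of two log lines."""
    return (parse_ts(end_line) - parse_ts(start_line)) // timedelta(milliseconds=1)


# Two lines copied from the Node B trace (messages shortened, timestamps unchanged).
fetch_start = "2007-03-06 14:19:39,049 INFO Fetching state (will wait for 30000 milliseconds):"
fetch_failed = "2007-03-06 14:20:09,055 DEBUG Caught exception after channel connected; closing channel"

print(elapsed_ms(fetch_start, fetch_failed))  # 30006
```

The gap is almost exactly the 30000 ms wait announced in the "Fetching state" line, which suggests the state request to Node A simply timed out rather than being rejected early.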
Node A still shows no exceptions. Here's the trace for Node A, which received the LEAVE request:
anonymous wrote : 2007-03-06 14:04:45,864 DEBUG [org.jgroups.protocols.FD] received ack from 192.168.1.105:32825
| 2007-03-06 14:04:45,920 DEBUG [org.jgroups.protocols.pbcast.GMS] received LEAVE_REQ for 192.168.1.105:32825 from 192.168.1.105:32825
| 2007-03-06 14:04:45,973 DEBUG [org.jgroups.protocols.pbcast.GMS] new=[], suspected=[], leaving=[192.168.1.105:32825], new view: [192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]
| 2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS] [local_addr=192.168.1.100:32772] view is [192.168.1.100:32772|2] [192.168.1.100:32772]
|
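One thing to keep in mind when lining up the two traces: the Node A timestamps are about 15 minutes behind the Node B ones. The sketch below estimates the offset from the lines where each node logs view [192.168.1.100:32772|2]. It assumes both traces come from the same run and that the view reaches both nodes at roughly the same moment, so the result is only approximate. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Timestamp layout used at the start of each log line
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line: str) -> datetime:
    """Parse the leading 23-character timestamp of a log line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


def clock_offset(line_b: str, line_a: str) -> timedelta:
    """Apparent offset of Node B's clock relative to Node A's for one shared event."""
    return parse_ts(line_b) - parse_ts(line_a)


# The same view change as logged on each node (timestamps copied from the traces).
view_on_b = "2007-03-06 14:20:09,113 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]"
view_on_a = "2007-03-06 14:04:45,975 DEBUG [org.jgroups.protocols.pbcast.GMS] view=[192.168.1.100:32772|2] [192.168.1.100:32772]"

offset = clock_offset(view_on_b, view_on_a)
print(offset)  # 0:15:23.138000
```

Adding that offset to the Node A timestamps puts its LEAVE_REQ line (14:04:45,920) at about 14:20:09 on Node B's clock, which matches the first LEAVE request Node B sent.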
Any other ideas?
Phil
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4025545#4025545