[jboss-user] [Clustering/JBoss] - handleJoin(192.168.1.9:32902) failed, retrying

snedapunk do-not-reply at jboss.com
Tue Jul 3 04:16:49 EDT 2007


Hello,
Every so often we start getting this error ?handleJoin(192.168.1.9:32
915) failed, retrying
?  after a restart of a machine in the production cluster.  While in this state, no connections can be made and sometimes a global restart is needed.  (Jboss4.0.3SP1)

Using 4.0.5GA I was able to reproduce this in a virtual environment by starting virtual machine 1 then starting virtual machine 2.   Then stopping machine 1 the coordinator and restarting virtual 1.  However in4.0.5GA it seems to only do this for approx 5 minutes and then it finally reconnects.  In the production environment, it appears as if it never reconnects even after hours of being in this state.  See last code for state of production.

Any tips on how to prevent this would be greatly apprectiated.  I supplied some org.jgroups debug output while in this state. 

I think I might have read a bug about this such that getUniqueName is not unique when restarting since uses JNDI port??

2007-07-03 05:03:49,733 DEBUG [org.jgroups.protocols.UDP] creating sockets and starting threads
  | 2007-07-03 05:03:49,733 INFO  [org.jgroups.protocols.UDP] unicast sockets will use interface 192.168.1.9
  | 2007-07-03 05:03:49,737 INFO  [org.jgroups.protocols.UDP] socket information:
  | local_addr=192.168.1.9:32915, mcast_addr=230.1.2.7:45577, bind_addr=/192.168.1.9, ttl=2
  | sock: bound to 192.168.1.9:32915, receive buffer size=131071, send buffer size=131071
  | mcast_recv_sock: bound to 192.168.1.9:45577, send buffer size=131071, receive buffer size=131071
  | mcast_send_sock: bound to 192.168.1.9:32916, send buffer size=131071, receive buffer size=131071
  | 2007-07-03 05:03:49,830 INFO  [STDOUT] 
  | -------------------------------------------------------
  | GMS: address is 192.168.1.9:32915
  | -------------------------------------------------------
  | 2007-07-03 05:03:49,833 DEBUG [org.jgroups.protocols.UDP] created unicast receiver thread
  | 2007-07-03 05:03:49,906 DEBUG [org.jgroups.protocols.FD_SOCK] determinePingDest()=null, pingable_mbrs=[]
  | 2007-07-03 05:03:49,907 DEBUG [org.jgroups.protocols.FD_SOCK] pinger thread terminated
  | 2007-07-03 05:03:49,937 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 230.1.2.7:45577: 1 msgs
  | 
  | 2007-07-03 05:03:51,929 DEBUG [org.jgroups.protocols.PING] initial mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.9:32908]]
  | 2007-07-03 05:03:51,929 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] initial_mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.9:32908]]
  | 2007-07-03 05:03:51,929 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] sending handleJoin(192.168.1.9:32915) to 192.168.1.9:32908
  | 2007-07-03 05:03:51,933 DEBUG [org.jgroups.protocols.pbcast.STABLE] stable task started; num_gossip_runs=3, max_gossip_runs=3
  | 2007-07-03 05:03:51,983 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:03:52,293 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:03:52,897 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:03:54,101 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:03:54,953 WARN  [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(192.168.1.9:32915) failed, retrying
  | 2007-07-03 05:03:56,510 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:03:56,993 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 230.1.2.7:45577: 1 msgs
  | 
  | 2007-07-03 05:03:58,966 DEBUG [org.jgroups.protocols.PING] initial mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.9:32908]]
  | 2007-07-03 05:03:58,966 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] initial_mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.9:32908]]
  | 2007-07-03 05:03:58,966 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] sending handleJoin(192.168.1.9:32915) to 192.168.1.9:32908
  | 
  | *********************************************************************A bunuch of these and then finally we get .
  | 
  | 2007-07-03 05:05:44,188 DEBUG [org.jgroups.protocols.PING] initial mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.9:32908]]
  | 2007-07-03 05:05:44,188 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] initial_mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.9:32908]]
  | 2007-07-03 05:05:44,188 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] sending handleJoin(192.168.1.9:32915) to 192.168.1.9:32908
  | 2007-07-03 05:05:44,224 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:44,260 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:44,436 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:44,528 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:44,648 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:45,133 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:45,336 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:45,540 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:45,740 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:45,920 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:46,120 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:46,308 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 2 msgs
  | 
  | 2007-07-03 05:05:46,504 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:46,696 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:46,893 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:47,088 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:47,192 WARN  [org.jgroups.protocols.pbcast.ClientGmsImpl] handleJoin(192.168.1.9:32915) failed, retrying
  | 2007-07-03 05:05:47,280 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:47,476 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:47,582 DEBUG [org.jgroups.protocols.FD] [SUSPECT] suspect hdr is [FD: SUSPECT (suspected_mbrs=[192.168.1.9:32908], from=192.168.1.2:1155)]
  | 2007-07-03 05:05:47,616 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:47,668 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:47,864 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:48,040 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:48,252 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:48,740 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:48,940 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:49,086 ERROR [org.jgroups.protocols.pbcast.ClientGmsImpl] suspect() should not be invoked on an instance of org.jgroups.protocols.pbcast.ClientGmsImpl
  | 2007-07-03 05:05:49,087 DEBUG [org.jgroups.blocks.RequestCorrelator] suspect=192.168.1.9:32908
  | 2007-07-03 05:05:49,136 DEBUG [org.jgroups.protocols.pbcast.NAKACK] message was discarded (not yet server)
  | 2007-07-03 05:05:49,144 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:49,232 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 230.1.2.7:45577: 1 msgs
  | 
  | 2007-07-03 05:05:49,344 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:49,524 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:49,724 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:49,912 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:50,109 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:50,300 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:50,496 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:50,692 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:50,884 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:51,080 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.9:32908: 1 msgs
  | 
  | 2007-07-03 05:05:51,200 DEBUG [org.jgroups.protocols.PING] initial mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.2:1155]]
  | 2007-07-03 05:05:51,200 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] initial_mbrs are [[own_addr=192.168.1.2:1155, coord_addr=192.168.1.2:1155]]
  | 2007-07-03 05:05:51,200 DEBUG [org.jgroups.protocols.pbcast.ClientGmsImpl] sending handleJoin(192.168.1.9:32915) to 192.168.1.2:1155
  | 2007-07-03 05:05:51,236 DEBUG [org.jgroups.protocols.UDP] 
  | sending msgs:
  | 192.168.1.2:1155: 1 msgs
  | 
  | 2007-07-03 05:05:51,272 DEBUG [org.jgroups.protocols.UDP] 
  | 
Here is the output of the other machine.  Looks like the unique name is not unique.

  | 01:29:19,705 INFO  [Server] JBoss (MX MicroKernel) [4.0.5.GA (build: CVSTag=Branch_4_0 date=200702201759)] Started in 52s:15ms
  | 01:33:08,987 INFO  [UI] New cluster view for partition UI (id: 2, delta: -1) : [127.0.0.2:1099]
  | 01:33:09,063 INFO  [UI] I am (127.0.0.2:1099) received membershipChanged event:
  | 01:33:09,075 INFO  [UI] Dead members: 0 ([])
  | 01:33:09,076 INFO  [UI] New Members : 0 ([])
  | 01:33:09,076 INFO  [UI] All Members : 1 ([127.0.0.2:1099])
  | 01:33:11,800 INFO  [TomcatDeployer] deploy, ctxPath=/jbossmq-httpil, warUrl=.../deploy-hasingleton/jms/jbossmq-httpil.sar/jbossmq-httpil.war/
  | 01:33:12,814 INFO  [A] Bound to JNDI name: queue/A
  | 01:33:12,825 INFO  [B] Bound to JNDI name: queue/B
  | 01:33:12,834 INFO  [C] Bound to JNDI name: queue/C
  | 01:33:12,843 INFO  [D] Bound to JNDI name: queue/D
  | 01:33:12,850 INFO  [ex] Bound to JNDI name: queue/ex
  | 01:33:12,891 INFO  [testTopic] Bound to JNDI name: topic/testTopic
  | 01:33:12,907 INFO  [securedTopic] Bound to JNDI name: topic/securedTopic
  | 01:33:12,917 INFO  [testDurableTopic] Bound to JNDI name: topic/testDurableTopic
  | 01:33:12,936 INFO  [testQueue] Bound to JNDI name: queue/testQueue
  | 01:33:13,069 INFO  [UILServerILService] JBossMQ UIL service available at : /0.0.0.0:8093
  | 01:33:13,210 INFO  [DLQ] Bound to JNDI name: queue/DLQ
  | 01:35:35,247 INFO  [TreeCache] viewAccepted(): [192.168.1.2:1155|2] [192.168.1.2:1155]
  | 01:35:37,367 INFO  [TreeCache] viewAccepted(): [192.168.1.2:1155|3] [192.168.1.2:1155, 192.168.1.9:32915]
  | 01:35:37,503 INFO  [TreeCache] locking the subtree at / to transfer state
  | 01:35:37,517 INFO  [StateTransferGenerator_140] returning the state for tree rooted in /(1024 bytes)
  | 01:35:40,765 INFO  [UI] New cluster view for partition UI (id: 3, delta: 1) : [127.0.0.2:1099, 127.0.0.2:1099]
  | 01:35:40,767 INFO  [UI] I am (127.0.0.2:1099) received membershipChanged event:
  | 01:35:40,768 INFO  [UI] Dead members: 0 ([])
  | 01:35:40,768 INFO  [UI] New Members : 0 ([])
  | 01:35:40,769 INFO  [UI] All Members : 2 ([127.0.0.2:1099, 127.0.0.2:1099]) 
On this production one, notice TreeCache shows 5 members but HAPartition.UI is only showing 4.  The missing one vrx15 is now in this state such that it cannot accept incoming connections, and cannot join the cluster.  It is only able to show handleJoin(something here) failed, retrying

  |  [org.jboss.cache.TreeCache] viewAccepted(): new members: [vrx13-in:39016, vrx12-in:42073, vrx14-in:58902, vrx15-in:41415, vrx10:57939] 
  | 2007-06-27 18:30:49,682 INFO   [org.jboss.ha.framework.interfaces.HAPartition.UI] New cluster view for partition UI: 159 ([192.168.120.112:1099, 192.168.120.143:1099, 192.168.120.144:1099, 192.168.120.110:1099] delta: -1)
  | 2007-06-27 18:30:49,683 INFO   [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.UI] I am (192.168.120.110:1099) received membershipChanged event:
  | 2007-06-27 18:30:49,683 INFO   [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.UI] Dead members: 1 ([192.168.120.111:1099])
  | 2007-06-27 18:30:49,683 INFO   [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.UI] New Members : 0 ([])
  | 2007-06-27 18:30:49,683 INFO   [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.UI] All Members : 4 ([ 192.168.120.112:1099, 192.168.120.143:1099, 192.168.120.144:1099, 192.168.120.110:1099])
  |  
  |  
  | 
  | 


View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4059810#4059810

Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4059810



More information about the jboss-user mailing list