[jboss-user] [Clustering/JBoss] - Re: Problems with JBoss clustering

agohar do-not-reply at jboss.com
Thu Aug 28 12:09:46 EDT 2008


Hi,

To narrow down the problem, I put fresh copies of jboss-4.2.2 on 3 test servers on the same network, all with the same cluster configuration. The issue is the same: when the third server joins the cluster, it is very slow. This time, though, I got some WARN messages in the logs of the other two servers. Here is what I tried:

- Started Server A (bind_addr = 10.100.54.14).
- Started Server B (bind_addr = 10.100.54.135). It joins the cluster and everything looks fine.
- Started Server C (bind_addr = 10.100.54.12). It does join the cluster, but it is very slow.

Here are the logs on C, where startup stalls for a long time (posting the relevant portion only):


  | -------------------------------------------------------
  | GMS: address is 10.100.54.12:34566
  | -------------------------------------------------------
  | 16:22:35,096 WARN  [GMS] join(10.100.54.12:34566) sent to 10.100.54.14:40469 timed out, retrying
  | 16:22:39,129 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566]
  | 16:22:42,160 ERROR [FD_SOCK] received null cache; retrying
  | 16:22:45,668 ERROR [FD_SOCK] received null cache; retrying
  | 16:22:49,176 ERROR [FD_SOCK] received null cache; retrying
  | 16:22:49,686 INFO  [TreeCache] TreeCache local address is 10.100.54.12:34566
  | 16:22:49,699 INFO  [TreeCache] received the state (size=1024 bytes)
  | 16:22:49,740 INFO  [TreeCache] state was retrieved successfully (in 54 milliseconds)
  | 16:22:49,740 INFO  [TreeCache] parseConfig(): PojoCacheConfig is empty
  | 16:22:49,949 INFO  [STDOUT] no object for null
  | 16:22:49,958 INFO  [STDOUT] no object for null
  | 16:22:50,011 INFO  [STDOUT] no object for null
  | 16:22:50,053 INFO  [STDOUT] no object for {urn:jboss:bean-deployer}supplyType
  | 16:22:50,075 INFO  [STDOUT] no object for {urn:jboss:bean-deployer}dependsType
  | 16:22:53,624 INFO  [NativeServerConfig] JBoss Web Services - Native
  | 16:22:53,624 INFO  [NativeServerConfig] jbossws-native-2.0.1.SP2 (build=200710210837)
  | 16:22:54,978 INFO  [SnmpAgentService] SNMP agent going active
  | 16:22:55,627 INFO  [DefaultPartition] Initializing
  | 16:22:55,714 INFO  [STDOUT]
  | -------------------------------------------------------
  | GMS: address is 10.100.54.12:34571
  | -------------------------------------------------------
  | 16:23:02,800 ERROR [FD_SOCK] received null cache; retrying
  | 16:23:06,308 ERROR [FD_SOCK] received null cache; retrying
  | 16:23:09,816 ERROR [FD_SOCK] received null cache; retrying
  | 16:23:10,323 INFO  [DefaultPartition] Number of cluster members: 3
  | 16:23:10,323 INFO  [DefaultPartition] Other members: 2
  | 16:23:10,323 INFO  [DefaultPartition] Fetching state (will wait for 30000 milliseconds):
  | 16:23:10,374 INFO  [DefaultPartition] state was retrieved successfully (in 50 milliseconds)
  | 16:24:10,483 INFO  [HANamingService] Started ha-jndi bootstrap jnpPort=1100, backlog=50, bindAddress=/0.0.0.0
  | 16:24:10,497 INFO  [DetachedHANamingService$AutomaticDiscovery] Listening on /0.0.0.0:1102, group=230.0.0.4, HA-JNDI address=10.100.54.12:1100
  | 
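
To see where the startup on C actually pauses, one option is to measure the time between consecutive log lines. The following is only a rough sketch: `largest_gaps` is a hypothetical helper (not part of JBoss or JGroups), and it assumes every timestamp has the `HH:MM:SS,mmm` form shown above and that the excerpt does not cross midnight.

```python
import re
from datetime import datetime

# Leading "HH:MM:SS,mmm" timestamp, tolerating the "  | " prefix used in this post.
TS = re.compile(r"^\W*(\d{2}:\d{2}:\d{2},\d{3})\s")

def largest_gaps(lines, top=3):
    """Return the `top` largest pauses as (seconds, line_before, line_after)."""
    stamped = []
    for line in lines:
        m = TS.match(line)
        if m:  # lines without a timestamp (banners, blanks) are skipped
            when = datetime.strptime(m.group(1), "%H:%M:%S,%f")
            stamped.append((when, line.strip(" |\t")))
    gaps = [
        ((b[0] - a[0]).total_seconds(), a[1], b[1])
        for a, b in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]

# Three lines copied from the Server C excerpt above.
sample = [
    "  | 16:23:10,323 INFO  [DefaultPartition] Fetching state (will wait for 30000 milliseconds):",
    "  | 16:23:10,374 INFO  [DefaultPartition] state was retrieved successfully (in 50 milliseconds)",
    "  | 16:24:10,483 INFO  [HANamingService] Started ha-jndi bootstrap jnpPort=1100, backlog=50, bindAddress=/0.0.0.0",
]

for seconds, before, after in largest_gaps(sample, top=1):
    print(f"{seconds:.3f}s pause after: {before}")
```

Running it over the whole excerpt instead of the three sample lines ranks every pause, which makes it easier to say which service startup is waiting.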

I can see these warnings in Server A's logs:

  | 16:22:32,150 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566]
  | 16:22:37,157 WARN  [GMS] failed to collect all ACKs (2) for view [10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566] after 5000ms, missing ACKs from [10.100.54.135:45846] (received=[10.100.54.14:40469]), local_addr=10.100.54.14:40469
  | 16:22:39,106 WARN  [GMS] 10.100.54.12:34566 already present; returning existing view [10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566]
  | 16:22:49,694 INFO  [TreeCache] locking the subtree at / to transfer state
  | 16:22:49,694 INFO  [StateTransferGenerator_140] returning the state for tree rooted in /(1024 bytes)
  | 16:22:57,784 INFO  [DefaultPartition] New cluster view for partition DefaultPartition (id: 2, delta: 1) : [10.100.54.14:1099, 10.100.54.135:1099, 10.100.54.12:1099]
  | 16:22:57,784 INFO  [DefaultPartition] I am (10.100.54.14:1099) received membershipChanged event:
  | 16:22:57,784 INFO  [DefaultPartition] Dead members: 0 ([])
  | 16:22:57,784 INFO  [DefaultPartition] New Members : 1 ([10.100.54.12:1099])
  | 16:22:57,784 INFO  [DefaultPartition] All Members : 3 ([10.100.54.14:1099, 10.100.54.135:1099, 10.100.54.12:1099])
  | 16:22:59,790 WARN  [GMS] failed to collect all ACKs (2) for view [10.100.54.14:40472|2] [10.100.54.14:40472, 10.100.54.135:45849, 10.100.54.12:34571] after 2000ms, missing ACKs from [10.100.54.135:45849] (received=[10.100.54.14:40472]), local_addr=10.100.54.14:40472
  | 16:26:13,214 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40474|1] [10.100.54.14:40474, 10.100.54.12:34573]
  | 16:26:26,091 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40476|1] [10.100.54.14:40476, 10.100.54.12:34575]
  | 
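
The GMS warnings above name the member whose ACK never arrived. As a small sketch for pulling that out of a longer log, the hypothetical helper below (`missing_acks`, with a regex and field names of my own choosing) extracts the view, the timeout, and the missing members. It assumes the warning text has exactly the wording shown in this excerpt.

```python
import re

# Pattern for the "failed to collect all ACKs" warning as it appears above.
# "count" is simply the number printed in parentheses after "ACKs".
ACK_WARN = re.compile(
    r"failed to collect all ACKs \((?P<count>\d+)\) "
    r"for view \[(?P<view>[^\]]+)\].*? "
    r"after (?P<timeout>\d+)ms, "
    r"missing ACKs from \[(?P<missing>[^\]]*)\]"
)

def missing_acks(lines):
    """Return one dict per ACK warning: view, timeout_ms, missing members."""
    found = []
    for line in lines:
        m = ACK_WARN.search(line)
        if m:
            members = [x.strip() for x in m.group("missing").split(",") if x.strip()]
            found.append({
                "view": m.group("view"),
                "timeout_ms": int(m.group("timeout")),
                "missing": members,
            })
    return found

# The two warnings copied from the Server A excerpt above.
sample = [
    "16:22:37,157 WARN  [GMS] failed to collect all ACKs (2) for view [10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566] after 5000ms, missing ACKs from [10.100.54.135:45846] (received=[10.100.54.14:40469]), local_addr=10.100.54.14:40469",
    "16:22:59,790 WARN  [GMS] failed to collect all ACKs (2) for view [10.100.54.14:40472|2] [10.100.54.14:40472, 10.100.54.135:45849, 10.100.54.12:34571] after 2000ms, missing ACKs from [10.100.54.135:45849] (received=[10.100.54.14:40472]), local_addr=10.100.54.14:40472",
]

for warning in missing_acks(sample):
    print(warning["view"], warning["timeout_ms"], warning["missing"])
```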

And these are the warnings on Server B:

  | 16:22:40,007 WARN  [NAKACK] 10.100.54.135:45846] discarded message from non-member 10.100.54.12:34566, my view is [10.100.54.14:40469|1] [10.100.54.14:40469, 10.100.54.135:45846]
  | 16:23:05,714 WARN  [NAKACK] 10.100.54.135:45849] discarded message from non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] [10.100.54.14:40472, 10.100.54.135:45849]
  | 16:23:10,452 WARN  [NAKACK] 10.100.54.135:45849] discarded message from non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] [10.100.54.14:40472, 10.100.54.135:45849]
  | 16:24:10,502 WARN  [NAKACK] 10.100.54.135:45849] discarded message from non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] [10.100.54.14:40472, 10.100.54.135:45849]
  | 16:24:45,147 WARN  [NAKACK] 10.100.54.135:45846] discarded message from non-member 10.100.54.12:34566, my view is [10.100.54.14:40469|1] [10.100.54.14:40469, 10.100.54.135:45846]
  | 16:25:09,831 WARN  [NAKACK] 10.100.54.135:45849] discarded message from non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] [10.100.54.14:40472, 10.100.54.135:45849]
  | 16:25:10,504 WARN  [NAKACK] 10.100.54.135:45849] discarded message from non-member 10.100.54.12:34571, my view is [10.100.54.14:40472|1] [10.100.54.14:40472, 10.100.54.135:45849]
  | 
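
The view that B prints in these warnings can be compared with the view A accepted for the same channel. The sketch below does that for one pair of lines; `parse_view` and `compare_views` are hypothetical helpers, and the regex assumes views are always printed as `[coordinator|id] [member, member, ...]`, as in the excerpts in this post.

```python
import re

# A view as printed in these logs: "[coord_ip:port|view_id] [member, member, ...]".
VIEW = re.compile(
    r"\[(?P<coord>[\d.]+:\d+)\|(?P<id>\d+)\]\s*\[(?P<members>[^\]]*)\]"
)

def parse_view(line):
    """Return (coordinator, view_id, members) for the first view in `line`, or None."""
    m = VIEW.search(line)
    if m is None:
        return None
    members = [x.strip() for x in m.group("members").split(",") if x.strip()]
    return m.group("coord"), int(m.group("id")), members

def compare_views(reference_line, other_line):
    """Return (view_id_difference, members only in the reference view)."""
    _, ref_id, ref_members = parse_view(reference_line)
    _, other_id, other_members = parse_view(other_line)
    only_in_reference = [m for m in ref_members if m not in other_members]
    return ref_id - other_id, only_in_reference

# One line from Server A and one from Server B, copied from the excerpts in this post.
line_a = "16:22:32,150 INFO  [TreeCache] viewAccepted(): [10.100.54.14:40469|2] [10.100.54.14:40469, 10.100.54.135:45846, 10.100.54.12:34566]"
line_b = "16:22:40,007 WARN  [NAKACK] 10.100.54.135:45846] discarded message from non-member 10.100.54.12:34566, my view is [10.100.54.14:40469|1] [10.100.54.14:40469, 10.100.54.135:45846]"

behind, absent = compare_views(line_a, line_b)
print(f"B is {behind} view(s) behind A; members B does not know yet: {absent}")
```

For this pair the comparison reports B one view behind A, with 10.100.54.12:34566 absent from B's view, which matches the "non-member" wording in B's warnings.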

Please note that another cluster with 5 servers is already running on the same network, and it works fine. I want to run both of these clusters in parallel.

Any clue?

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4173128#4173128
