[jboss-user] [Clustering/JBoss] - Cluster node stops processing when other node goes down

rana24 do-not-reply at jboss.com
Tue Jan 6 07:00:32 EST 2009


Hi,
We are running JBoss AS 4.2.3 with JBoss Messaging (JBM) 1.4.0 SP3, with 2 members forming the cluster. All nodes receive a high number of messages; each message starts an application that itself relies heavily on messaging for its business logic.
Our problem: when one cluster node goes down because of an OutOfMemoryError, the other cluster node also stops processing. It seems the nodes try to establish some communication and it fails.
Is this expected behaviour? Ideally, the other node should keep processing, right?

I hope I am asking the right group of people.

Please find below the console log from a 3-node cluster.

Thank you in advance. 



  | 
  | //10.31.3.22/AC70Mod16.pdf
  | 15:28:14,071 WARN  [FD] I was suspected by 10.31.2.85:1516; ignoring the SUSPECT
  |  message and sending back a HEARTBEAT_ACK
  | 15:28:14,071 WARN  [FD] I was suspected by 10.31.2.85:1509; ignoring the SUSPECT
  |  message and sending back a HEARTBEAT_ACK
  | 15:28:14,071 WARN  [FD] I was suspected by 10.31.2.85:1521; ignoring the SUSPECT
  |  message and sending back a HEARTBEAT_ACK
  | 15:28:14,086 WARN  [FD] I was suspected by 10.31.2.85:1503; ignoring the SUSPECT
  |  message and sending back a HEARTBEAT_ACK
  | 15:28:18,681 WARN  [GMS] I (10.31.4.242:1755) am not a member of view [10.31.2.8
  | 5:1521|13] [10.31.2.85:1521, 10.31.2.11:3017], shunning myself and leaving the g
  | roup (prev_members are [10.31.2.85:1521 10.31.2.11:1253 10.31.4.242:1755 10.31.2
  | .11:3017 ], current view is [10.31.2.85:1521|12] [10.31.2.85:1521, 10.31.4.242:1
  | 755, 10.31.2.11:3017])
  | 15:28:18,681 WARN  [GMS] I (10.31.4.242:1752) am not a member of view [10.31.2.8
  | 5:1509|13] [10.31.2.85:1509, 10.31.2.11:3025], shunning myself and leaving the g
  | roup (prev_members are [10.31.2.85:1509 10.31.2.11:1254 10.31.4.242:1752 10.31.2
  | .11:3025 ], current view is [10.31.2.85:1509|12] [10.31.2.85:1509, 10.31.4.242:1
  | 752, 10.31.2.11:3025])
  | 15:28:18,681 WARN  [GMS] I (10.31.4.242:1754) am not a member of view [10.31.2.8
  | 5:1503|13] [10.31.2.85:1503, 10.31.2.11:3023], shunning myself and leaving the g
  | roup (prev_members are [10.31.2.85:1503 10.31.2.11:1252 10.31.4.242:1754 10.31.2
  | .11:3023 ], current view is [10.31.2.85:1503|12] [10.31.2.85:1503, 10.31.4.242:1
  | 754, 10.31.2.11:3023])
  | 15:28:18,696 WARN  [GMS] I (10.31.4.242:1753) am not a member of view [10.31.2.8
  | 5:1516|13] [10.31.2.85:1516, 10.31.2.11:3027], shunning myself and leaving the g
  | roup (prev_members are [10.31.2.85:1516 10.31.2.11:1255 10.31.4.242:1753 10.31.2
  | .11:3027 ], current view is [10.31.2.85:1516|12] [10.31.2.85:1516, 10.31.4.242:1
  | 753, 10.31.2.11:3027])
  | 15:28:19,493 INFO  [STDOUT]
  | -------------------------------------------------------
  | GMS: address is 10.31.4.242:4655
  | -------------------------------------------------------
  | 15:28:19,493 INFO  [STDOUT]
  | -------------------------------------------------------
  | GMS: address is 10.31.4.242:4654
  | -------------------------------------------------------
  | 15:28:19,509 INFO  [STDOUT]
  | -------------------------------------------------------
  | GMS: address is 10.31.4.242:4652
  | -------------------------------------------------------
  | 15:28:19,509 INFO  [STDOUT]
  | -------------------------------------------------------
  | GMS: address is 10.31.4.242:4653
  | -------------------------------------------------------
  | 15:28:21,525 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4654|0] [10.31.4.242
  | :4654]
  | 15:28:21,525 INFO  [EXA] New cluster view for partition EXA (id: 0, delta: -2) :
  |  [10.31.4.242:1099]
  | 15:28:21,525 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4653|0] [10.31.4.242
  | :4653]
  | 15:28:21,525 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4655|0] [10.31.4.242
  | :4655]
  | 15:28:21,525 INFO  [EXA] I am (10.31.4.242:1099) received membershipChanged even
  | t:
  | 15:28:21,525 INFO  [EXA] Dead members: 2 ([10.31.2.85:1099, 10.31.2.11:1099])
  | 15:28:21,525 INFO  [EXA] New Members : 0 ([])
  | 15:28:21,525 INFO  [EXA] All Members : 1 ([10.31.4.242:1099])
  | 15:28:44,060 INFO  [TreeCache] viewAccepted(): MergeView::[10.31.2.11:3017|14] [
  | 10.31.2.11:3017, 10.31.2.85:1521, 10.31.4.242:4653], subgroups=[[10.31.2.85:1521
  | |13] [10.31.2.85:1521, 10.31.2.11:3017], [10.31.4.242:4653|0] [10.31.4.242:4653]
  | ]
  | 15:28:46,701 INFO  [TreeCache] viewAccepted(): MergeView::[10.31.2.11:3027|14] [
  | 10.31.2.11:3027, 10.31.2.85:1516, 10.31.4.242:4655], subgroups=[[10.31.2.85:1516
  | |13] [10.31.2.85:1516, 10.31.2.11:3027], [10.31.4.242:4655|0] [10.31.4.242:4655]
  | ]
  | 15:28:49,436 WARN  [NAKACK] 10.31.4.242:4652] discarded message from non-member
  | 10.31.2.85:1509, my view is [10.31.4.242:4652|0] [10.31.4.242:4652]
  | 15:28:52,343 WARN  [NAKACK] 10.31.4.242:4654] discarded message from non-member
  | 10.31.2.85:1503, my view is [10.31.4.242:4654|0] [10.31.4.242:4654]
  | 15:28:52,343 INFO  [TreeCache] viewAccepted(): MergeView::[10.31.2.11:3023|14] [
  | 10.31.2.11:3023, 10.31.2.85:1503, 10.31.4.242:4654], subgroups=[[10.31.2.85:1503
  | |13] [10.31.2.85:1503, 10.31.2.11:3023], [10.31.4.242:4654|0] [10.31.4.242:4654]
  | ]
  | 15:28:59,078 INFO  [EXA] New cluster view for partition EXA: 14 ([10.31.2.11:109
  | 9, 10.31.2.85:1099, 10.31.4.242:1099] delta: 2)
  | 15:28:59,078 INFO  [EXA] Merging partitions...
  | 15:28:59,078 INFO  [EXA] Dead members: 0
  | 15:28:59,094 INFO  [EXA] Originating groups: [[10.31.2.85:1509|13] [10.31.2.85:1
  | 509, 10.31.2.11:3025], [10.31.4.242:4652|0] [10.31.4.242:4652]]
  | 15:29:57,932 INFO  [TreeCache] viewAccepted(): [10.31.2.85:1521|15] [10.31.2.85:
  | 1521, 10.31.4.242:4653]
  | 15:30:03,901 INFO  [TreeCache] viewAccepted(): [10.31.2.85:1516|15] [10.31.2.85:
  | 1516, 10.31.4.242:4655]
  | 15:30:10,606 INFO  [EXA] Suspected member: 10.31.2.11:3025
  | 15:30:13,028 INFO  [TreeCache] viewAccepted(): [10.31.2.85:1503|15] [10.31.2.85:
  | 1503, 10.31.4.242:4654]
  | 15:30:15,747 INFO  [EXA] New cluster view for partition EXA: 15 ([10.31.2.85:109
  | 9, 10.31.4.242:1099] delta: -1)
  | 15:30:15,747 INFO  [EXA] I am (10.31.4.242:1099) received membershipChanged even
  | t:
  | 15:30:15,747 INFO  [EXA] Dead members: 1 ([10.31.2.11:1099])
  | 15:30:15,747 INFO  [EXA] New Members : 0 ([])
  | 15:30:15,747 INFO  [EXA] All Members : 2 ([10.31.2.85:1099, 10.31.4.242:1099])
  | 15:32:38,349 WARN  [NAKACK] 10.31.4.242:4653] discarded message from non-member
  | 10.31.2.11:3017, my view is [10.31.2.85:1521|15] [10.31.2.85:1521, 10.31.4.242:4
  | 653]
  | 15:32:38,349 WARN  [NAKACK] 10.31.4.242:4652] discarded message from non-member
  | 10.31.2.11:3025, my view is [10.31.2.85:1509|15] [10.31.2.85:1509, 10.31.4.242:4
  | 652]
  | 15:32:38,396 WARN  [NAKACK] 10.31.4.242:4654] discarded message from non-member
  | 10.31.2.11:3023, my view is [10.31.2.85:1503|15] [10.31.2.85:1503, 10.31.4.242:4
  | 654]
  | 15:32:38,412 WARN  [NAKACK] 10.31.4.242:4655] discarded message from non-member
  | 10.31.2.11:3027, my view is [10.31.2.85:1516|15] [10.31.2.85:1516, 10.31.4.242:4
  | 655]
  | 15:32:39,834 INFO  [EXA] Suspected member: 10.31.2.85:1509
  | 15:32:39,974 INFO  [EXA] New cluster view for partition EXA (id: 16, delta: -1)
  | : [10.31.4.242:1099]
  | 15:32:39,990 INFO  [EXA] I am (10.31.4.242:1099) received membershipChanged even
  | t:
  | 15:32:39,990 INFO  [EXA] Dead members: 1 ([10.31.2.85:1099])
  | 15:32:39,990 INFO  [EXA] New Members : 0 ([])
  | 15:32:39,990 INFO  [EXA] All Members : 1 ([10.31.4.242:1099])
  | 15:32:39,990 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4653|16] [10.31.4.24
  | 2:4653]
  | 15:32:40,021 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4654|16] [10.31.4.24
  | 2:4654]
  | 15:32:40,037 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4655|16] [10.31.4.24
  | 2:4655]
  | 15:32:43,506 WARN  [NAKACK] 10.31.4.242:4655] discarded message from non-member
  | 10.31.2.85:1516, my view is [10.31.4.242:4655|16] [10.31.4.242:4655]
  | 15:32:52,633 INFO  [EXA] New cluster view for partition EXA (id: 17, delta: 1) :
  |  [10.31.4.242:1099, 10.31.2.11:1099]
  | 15:32:52,633 INFO  [EXA] I am (10.31.4.242:1099) received membershipChanged even
  | t:
  | 15:32:52,633 INFO  [EXA] Dead members: 0 ([])
  | 15:32:52,633 INFO  [EXA] New Members : 1 ([10.31.2.11:1099])
  | 15:32:52,633 INFO  [EXA] All Members : 2 ([10.31.4.242:1099, 10.31.2.11:1099])
  | 15:32:52,680 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4655|17] [10.31.4.24
  | 2:4655, 10.31.2.11:4851]
  | 15:32:52,680 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4654|17] [10.31.4.24
  | 2:4654, 10.31.2.11:4857]
  | 15:32:56,477 INFO  [TreeCache] locking the subtree at / to transfer state
  | 15:32:56,555 WARN  [NAKACK] 10.31.4.242:4653] discarded message from non-member
  | 10.31.2.11:4852, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
  | 15:32:58,477 INFO  [StateTransferGenerator_140] returning the state for tree roo
  | ted in /(1024 bytes)
  | 15:32:58,477 WARN  [NAKACK] 10.31.4.242:4653] discarded message from non-member
  | 10.31.2.11:4852, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
  | 15:33:07,604 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4654|18] [10.31.4.24
  | 2:4654, 10.31.2.11:4857, 10.31.2.85:13405]
  | 15:33:07,635 INFO  [TreeCache] viewAccepted(): [10.31.4.242:4655|18] [10.31.4.24
  | 2:4655, 10.31.2.11:4851, 10.31.2.85:13406]
  | 15:33:09,792 WARN  [NAKACK] 10.31.4.242:4653] discarded message from non-member
  | 10.31.2.11:4852, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
  | 15:33:12,574 WARN  [GMS] failed to collect all ACKs (2) for view [10.31.4.242:46
  | 54|18] [10.31.4.242:4654, 10.31.2.11:4857, 10.31.2.85:13405] after 5000ms, missi
  | ng ACKs from [10.31.4.242:4654, 10.31.2.11:4857] (received=[]), local_addr=10.31
  | .4.242:4654
  | 15:33:12,605 WARN  [GMS] failed to collect all ACKs (2) for view [10.31.4.242:46
  | 55|18] [10.31.4.242:4655, 10.31.2.11:4851, 10.31.2.85:13406] after 5000ms, missi
  | ng ACKs from [10.31.4.242:4655, 10.31.2.11:4851] (received=[]), local_addr=10.31
  | .4.242:4655
  | 15:33:23,372 WARN  [NAKACK] 10.31.4.242:4653] discarded message from non-member
  | 10.31.2.85:13407, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
  | 15:33:52,877 INFO  [TreeCache] viewAccepted(): MergeView::[10.31.2.11:4857|19] [
  | 10.31.2.11:4857, 10.31.2.85:13405, 10.31.4.242:4654], subgroups=[[10.31.2.85:134
  | 05|0] [10.31.2.85:13405], [10.31.4.242:4654|18] [10.31.4.242:4654, 10.31.2.11:48
  | 57, 10.31.2.85:13405]]
  | 15:33:55,706 INFO  [TreeCache] viewAccepted(): MergeView::[10.31.2.11:4851|19] [
  | 10.31.2.11:4851, 10.31.2.85:13406, 10.31.4.242:4655], subgroups=[[10.31.2.85:134
  | 06|0] [10.31.2.85:13406], [10.31.4.242:4655|18] [10.31.4.242:4655, 10.31.2.11:48
  | 51, 10.31.2.85:13406]]
  | 15:33:57,831 INFO  [TreeCache] viewAccepted(): MergeView::[10.31.2.11:4852|17] [
  | 10.31.2.11:4852, 10.31.2.85:13407, 10.31.4.242:4653], subgroups=[[10.31.2.11:485
  | 2|1] [10.31.2.11:4852, 10.31.2.85:13407], [10.31.4.242:4653|16] [10.31.4.242:465
  | 3]]
  | 15:33:57,847 WARN  [GMS] failed to collect all ACKs (3) for view MergeView::[10.
  | 31.2.11:4857|19] [10.31.2.11:4857, 10.31.2.85:13405, 10.31.4.242:4654], subgroup
  | s=[[10.31.2.85:13405|0] [10.31.2.85:13405], [10.31.4.242:4654|18] [10.31.4.242:4
  | 654, 10.31.2.11:4857, 10.31.2.85:13405]] after 5000ms, missing ACKs from [10.31.
  | 2.85:13405] (received=[10.31.4.242:4654, 10.31.2.11:4857]), local_addr=10.31.4.2
  | 42:4654
  | 15:34:00,675 WARN  [GMS] failed to collect all ACKs (3) for view MergeView::[10.
  | 31.2.11:4851|19] [10.31.2.11:4851, 10.31.2.85:13406, 10.31.4.242:4655], subgroup
  | s=[[10.31.2.85:13406|0] [10.31.2.85:13406], [10.31.4.242:4655|18] [10.31.4.242:4
  | 655, 10.31.2.11:4851, 10.31.2.85:13406]] after 5000ms, missing ACKs from [10.31.
  | 2.85:13406] (received=[10.31.4.242:4655, 10.31.2.11:4851]), local_addr=10.31.4.2
  | 42:4655
  |  
  | 
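
In case it helps anyone reading a log like the one above, here is a minimal Python sketch that pulls out only the "[EXA] All Members" lines, so the sequence of membership changes is easier to follow. It assumes the format shown above: each line quoted with a "  | " prefix and long records hard-wrapped across lines. The function names (unwrap, membership_timeline) are hypothetical helpers for illustration, not part of JBoss or JGroups.

```python
import re

# Quote prefix used in the posted log, e.g. "  | ".
PREFIX = re.compile(r"^\s*\| ?")
# A new log record starts with a timestamp such as "15:28:21,525 ".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} ")
# Matches e.g. "[EXA] All Members : 1 ([10.31.4.242:1099])".
ALL_MEMBERS = re.compile(r"\[EXA\] All Members : (\d+) \(\[(.*?)\]\)")


def unwrap(lines):
    """Strip the quote prefix and rejoin records that were hard-wrapped."""
    records = []
    for raw in lines:
        text = PREFIX.sub("", raw.rstrip("\n"), count=1)
        if TIMESTAMP.match(text) or not records:
            records.append(text)   # a timestamp starts a new record
        else:
            records[-1] += text    # continuation of the previous record
    return records


def membership_timeline(lines):
    """Return (timestamp, [member addresses]) for each 'All Members' record."""
    timeline = []
    for record in unwrap(lines):
        match = ALL_MEMBERS.search(record)
        if match:
            members = [m.strip() for m in match.group(2).split(",") if m.strip()]
            timeline.append((record[:12], members))
    return timeline
```

For example, calling membership_timeline(open("console.log")) on a saved copy of the output (the file name is only an assumption) returns one entry per membership report, which makes it easier to see at which point this node was left alone in the partition.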

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4199737#4199737
