[
https://jira.jboss.org/jira/browse/JGRP-957?page=com.atlassian.jira.plugi...
]
a C updated JGRP-957:
---------------------
Description:
We are using JGroups as a notification system between web applications running inside Tomcat or
WebLogic Server. On our current test platform all cluster nodes are on the same host, and most
of them are in the same container (Tomcat). Some web applications may have several connections
to the cluster.
We use UDP multicast on a LAN, the configuration is nearly the default one.
The system seems to work fine, but we regularly run into cluster stability issues. Typically,
a lot of SUSPECT messages are exchanged, many "GMS: address ..." entries are logged to
standard output, and the number of view-accepted events increases dramatically.
As an example, here is the number of viewAccepted events per day (grep -c viewAccepted
*/logout.log):
logout.log.2009-03-25:6
logout.log.2009-03-26:51
logout.log.2009-03-27:49
logout.log.2009-03-28:0
logout.log.2009-03-29:2290
logout.log.2009-03-30:64
logout.log.2009-03-31:55
logout.log.2009-04-01:15
logout.log.2009-04-02:433
logout.log.2009-04-03:32
logout.log.2009-04-04:4
logout.log.2009-04-05:5
logout.log.2009-04-06:38
logout.log.2009-04-07:26
logout.log.2009-04-08:30
logout.log.2009-04-09:19
logout.log.2009-04-10:32
logout.log.2009-04-11:5
logout.log.2009-04-12:7
logout.log.2009-04-13:2236
logout.log.2009-04-14:56
We ran several test campaigns, sending and receiving messages over a period of 2 or 3 days
and checking for message loss, and everything went fine until the problems appeared again.
Our system administrator detected no network issues.
Another typical problem is that members send NOT_MEMBER messages, causing stacks to
shut down (or rather, channels to close?): [ Received NOT_MEMBER event from null I'm
being shunned; exiting]. The shun option is not set (nor is the Channel auto-reconnect
option), yet in some cases the stack starts up again (CloserThread -
reconnecting to group ...) and in other cases it does not. Note that when the stack does not
restart automatically, it is impossible to reconnect to the channel manually (we always
receive a ChannelClosedException).
Typically:
[sip@bipro tmusadmin]$ grep -c NOT_MEMBER jgroup.log*
jgroup.log:0
jgroup.log.2009-03-30:3
jgroup.log.2009-03-31:0
jgroup.log.2009-04-01:0
jgroup.log.2009-04-02:1370
jgroup.log.2009-04-07:0
jgroup.log.2009-04-10:0
jgroup.log.2009-04-11:11
jgroup.log.2009-04-12:9
jgroup.log.2009-04-13:587
jgroup.log.2009-04-14:0
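The two listings can be cross-checked by date to see whether the view-change spikes and the NOT_MEMBER spikes fall on the same days. The following is only a minimal sketch: the helper names and the spike threshold (10 times the per-file median) are arbitrary assumptions, not anything provided by JGroups, and the counts are copied verbatim from the grep output above.

```python
import re
import statistics

# Matches `grep -c` output lines such as "logout.log.2009-03-29:2290".
# Lines without a date suffix (e.g. the current "jgroup.log:0") do not match.
LINE_RE = re.compile(r".*\.(\d{4}-\d{2}-\d{2}):(\d+)")


def parse_counts(listing):
    """Return {date: count} for every dated line in a `grep -c` listing."""
    counts = {}
    for line in listing.splitlines():
        match = LINE_RE.fullmatch(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def spike_days(counts, factor=10):
    """Return the dates whose count exceeds `factor` times the median count.

    The threshold is an arbitrary heuristic (an assumption); max(..., 1)
    keeps a zero median from flagging every non-zero day.
    """
    median = statistics.median(counts.values())
    return {day for day, n in counts.items() if n > factor * max(median, 1)}


# Counts copied from `grep -c viewAccepted */logout.log` above.
VIEW_ACCEPTED = """
logout.log.2009-03-25:6
logout.log.2009-03-26:51
logout.log.2009-03-27:49
logout.log.2009-03-28:0
logout.log.2009-03-29:2290
logout.log.2009-03-30:64
logout.log.2009-03-31:55
logout.log.2009-04-01:15
logout.log.2009-04-02:433
logout.log.2009-04-03:32
logout.log.2009-04-04:4
logout.log.2009-04-05:5
logout.log.2009-04-06:38
logout.log.2009-04-07:26
logout.log.2009-04-08:30
logout.log.2009-04-09:19
logout.log.2009-04-10:32
logout.log.2009-04-11:5
logout.log.2009-04-12:7
logout.log.2009-04-13:2236
logout.log.2009-04-14:56
"""

# Counts copied from `grep -c NOT_MEMBER jgroup.log*` above.
NOT_MEMBER = """
jgroup.log:0
jgroup.log.2009-03-30:3
jgroup.log.2009-03-31:0
jgroup.log.2009-04-01:0
jgroup.log.2009-04-02:1370
jgroup.log.2009-04-07:0
jgroup.log.2009-04-10:0
jgroup.log.2009-04-11:11
jgroup.log.2009-04-12:9
jgroup.log.2009-04-13:587
jgroup.log.2009-04-14:0
"""

both = spike_days(parse_counts(VIEW_ACCEPTED)) & spike_days(parse_counts(NOT_MEMBER))
print(sorted(both))  # ['2009-04-02', '2009-04-13']
```

On these counts, both NOT_MEMBER spike days (2009-04-02 and 2009-04-13) also show view-change spikes; the third view spike (2009-03-29) has no matching jgroup.log file in the listing.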
Any suggestion would be greatly appreciated.
Sorry for the size of the logs!
was:
We are using JGroups as a notification system between web applications running inside Tomcat or
WebLogic Server. On our current test platform all cluster nodes are on the same host, and most
of them are in the same container (Tomcat). Some web applications may have several connections
to the cluster.
We use UDP multicast on a LAN, the configuration is nearly the default one.
The system seems to work fine, but we regularly run into cluster stability issues. Typically,
a lot of SUSPECT messages are exchanged, many "GMS: address ..." entries are logged to
standard output, and the number of view-accepted events increases dramatically.
As an example:
logout.log.2009-03-25:6
logout.log.2009-03-26:51
logout.log.2009-03-27:49
logout.log.2009-03-28:0
logout.log.2009-03-29:2290
logout.log.2009-03-30:64
logout.log.2009-03-31:55
logout.log.2009-04-01:15
logout.log.2009-04-02:433
logout.log.2009-04-03:32
logout.log.2009-04-04:4
logout.log.2009-04-05:5
logout.log.2009-04-06:38
logout.log.2009-04-07:26
logout.log.2009-04-08:30
logout.log.2009-04-09:19
logout.log.2009-04-10:32
logout.log.2009-04-11:5
logout.log.2009-04-12:7
logout.log.2009-04-13:2236
logout.log.2009-04-14:56
We ran several test campaigns, sending and receiving messages over a period of 2 or 3 days
and checking for message loss, and everything went fine until the problems appeared again.
Our system administrator detected no network issues.
Another typical problem is that members send NOT_MEMBER messages, causing stacks to
shut down (or rather, channels to close?): [ Received NOT_MEMBER event from null I'm
being shunned; exiting]. The shun option is not set (nor is the Channel auto-reconnect
option), yet in some cases the stack starts up again (CloserThread -
reconnecting to group ...) and in other cases it does not. Note that when the stack does not
restart automatically, it is impossible to reconnect to the channel manually (we always
receive a ChannelClosedException).
Typically:
[sip@bipro tmusadmin]$ grep -c NOT_MEMBER jgroup.log*
jgroup.log:0
jgroup.log.2009-03-30:3
jgroup.log.2009-03-31:0
jgroup.log.2009-04-01:0
jgroup.log.2009-04-02:1370
jgroup.log.2009-04-07:0
jgroup.log.2009-04-10:0
jgroup.log.2009-04-11:11
jgroup.log.2009-04-12:9
jgroup.log.2009-04-13:587
jgroup.log.2009-04-14:0
Any suggestion would be greatly appreciated.
Sorry for the size of the logs!
Intermittent cluster stability issues
-------------------------------------
Key: JGRP-957
URL: https://jira.jboss.org/jira/browse/JGRP-957
Project: JGroups
Issue Type: Bug
Affects Versions: 2.7
Environment: jdk 1.5
Reporter: a C
Assignee: Bela Ban
Fix For: 2.8
Attachments: jgroups-logs.zip
--
This message is automatically generated by JIRA.