New subject: [JBoss JIRA] Updated: (JGRP-957) Intermittent cluster stability issues

Wednesday, 15 April 2009

Intermittent cluster stability issues
-------------------------------------

                 Key: JGRP-957
                 URL: https://jira.jboss.org/jira/browse/JGRP-957
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.7
         Environment: jdk 1.5
            Reporter: a C
            Assignee: Bela Ban

We are using JGroups as a notification system between webapps running inside Tomcat or
WebLogic Server. On our current test platform all cluster nodes are on the same host, most
of them in the same container (Tomcat). Some web applications may have several connections
to the cluster.
We use UDP multicast on a LAN; the configuration is nearly the default one.

The system seems to work fine, but we regularly have cluster stability issues. Typically
a lot of SUSPECT messages are exchanged, a lot of "GMS: address ..." items are
logged on standard output, and the number of view-accepted events increases dramatically.

As an example:
logout.log.2009-03-25:6
logout.log.2009-03-26:51
logout.log.2009-03-27:49
logout.log.2009-03-28:0
logout.log.2009-03-29:2290
logout.log.2009-03-30:64
logout.log.2009-03-31:55
logout.log.2009-04-01:15
logout.log.2009-04-02:433
logout.log.2009-04-03:32
logout.log.2009-04-04:4
logout.log.2009-04-05:5
logout.log.2009-04-06:38
logout.log.2009-04-07:26
logout.log.2009-04-08:30
logout.log.2009-04-09:19
logout.log.2009-04-10:32
logout.log.2009-04-11:5
logout.log.2009-04-12:7
logout.log.2009-04-13:2236
logout.log.2009-04-14:56

We ran several test campaigns, sending and receiving messages over a 2- or 3-day
period and checking for message loss, and everything went fine, until the problems appeared
again. Our system administrator detected no network issue.

Another typical problem is that members send NOT_MEMBER messages, causing stacks to
shut down (should I say channels to close?). [ Received NOT_MEMBER event from null I'm
being shunned; exiting]. The shun option is not set (nor is the Channel auto-reconnect
option), and nevertheless in some cases the stack starts up again (CloserThread -
reconnecting to group ...) and in other cases it does not. Please note that when the stack
does not start up automatically, it is impossible to connect to the channel manually (we
always receive ChannelClosedException).

Typically:
[sip@bipro tmusadmin]$ grep -c NOT_MEMBER jgroup.log*
jgroup.log:0
jgroup.log.2009-03-30:3
jgroup.log.2009-03-31:0
jgroup.log.2009-04-01:0
jgroup.log.2009-04-02:1370
jgroup.log.2009-04-07:0
jgroup.log.2009-04-10:0
jgroup.log.2009-04-11:11
jgroup.log.2009-04-12:9
jgroup.log.2009-04-13:587
jgroup.log.2009-04-14:0
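
To pick out the bad days from counts like these, something along the following lines could be used. It is only a rough sketch: it parses the "file:count" lines that grep -c prints and keeps the files whose count exceeds a threshold. The function names and the threshold of 100 are assumptions made for illustration, not anything from our setup or from JGroups.

```python
# Sketch: flag the log files whose `grep -c` count spikes.
# THRESHOLD is an arbitrary, assumed cut-off chosen for illustration only.
THRESHOLD = 100

# Counts copied from the `grep -c NOT_MEMBER jgroup.log*` output above.
SAMPLE = """\
jgroup.log:0
jgroup.log.2009-03-30:3
jgroup.log.2009-03-31:0
jgroup.log.2009-04-01:0
jgroup.log.2009-04-02:1370
jgroup.log.2009-04-07:0
jgroup.log.2009-04-10:0
jgroup.log.2009-04-11:11
jgroup.log.2009-04-12:9
jgroup.log.2009-04-13:587
jgroup.log.2009-04-14:0
"""


def parse_counts(grep_output):
    """Turn 'file:count' lines into a list of (file, count) pairs."""
    pairs = []
    for line in grep_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so the file name is kept whole.
        name, _, count = line.rpartition(":")
        pairs.append((name, int(count)))
    return pairs


def spikes(pairs, threshold=THRESHOLD):
    """Keep only the files whose count is strictly above the threshold."""
    return [(name, count) for name, count in pairs if count > threshold]


if __name__ == "__main__":
    for name, count in spikes(parse_counts(SAMPLE)):
        print(name, count)
```

With the counts above and a threshold of 100, this singles out jgroup.log.2009-04-02 and jgroup.log.2009-04-13, which are the two days on which the shunning problem was at its worst.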

A suggestion would be greatly appreciated.

Sorry for the size of the logs!

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
