FD does not restart monitor task after unsuspect event
------------------------------------------------------
Key: JGRP-551
URL: http://jira.jboss.com/jira/browse/JGRP-551
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4.1 SP3, 2.5
Environment: Found in JGroups 2.4.1 SP3 and 2.5 RC1. Tested on Windows XP, but the problem
seems to be platform independent.
Reporter: Rodrigo Faria
Assigned To: Bela Ban
I reproduced this with 3 members, but it may be possible with only 2. FD and VERIFY_SUSPECT
must both be present in the configuration. The steps to reproduce are as follows:
- jgroups with 3 members online
- disconnect a member (not the coord)
- wait until the disconnected member suspects the other two (FD will generate a suspect
event for each of them), then reconnect it before it changes its view (that is, before
VERIFY_SUSPECT confirms the suspicion).
Once both suspicions have occurred, FD will have stopped its monitor task (since it has no
pingable members left). When the unsuspect event is generated, FD does not restart its
monitor task. As a consequence, if the other members have removed this member from
their view, this member will not be shunned (assuming shun=true in FD), since its FD is not
sending heartbeat requests. This member's FD also cannot detect any
failure, since its monitor task is stopped (I think it will be restarted only if something
triggers a VIEW_CHANGE).
I tried changing the unsuspect method in FD to update pingable_members and ping_dest
and to restart the monitor task (similar to the handling of a VIEW_CHANGE event),
and this seemed to correct the problem.
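
A minimal sketch of that change, for illustration only. It is not the actual FD source:
FdSketch and all of its fields and methods are hypothetical names, members are plain
strings, and the "monitor task" is reduced to a boolean flag. It only shows unsuspect
recomputing the pingable members and ping_dest in the same way a view change would.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for FD's bookkeeping. This is NOT org.jgroups.protocols.FD;
// every name here is hypothetical and members are plain strings.
final class FdSketch {
    private final String localAddr;
    private final List<String> members;                  // current view, in view order
    private final Set<String> suspected = new HashSet<>();
    private String pingDest;                             // neighbor we send heartbeats to
    private boolean monitorRunning;                      // stands in for the monitor task

    FdSketch(String localAddr, List<String> view) {
        this.localAddr = localAddr;
        this.members = new ArrayList<>(view);
        refresh();
    }

    synchronized void suspect(String mbr) {
        suspected.add(mbr);
        refresh();   // may leave nobody to ping, which stops the monitor
    }

    // The proposed change: handle unsuspect like a view change, i.e. recompute
    // the pingable members and ping_dest and restart the monitor if needed.
    synchronized void unsuspect(String mbr) {
        suspected.remove(mbr);
        refresh();
    }

    // Pingable members are the view minus the suspected members; ping_dest is
    // the next pingable member after the local address, wrapping around.
    private void refresh() {
        List<String> pingable = new ArrayList<>(members);
        pingable.removeAll(suspected);
        int idx = pingable.indexOf(localAddr);
        if (idx < 0 || pingable.size() < 2) {
            pingDest = null;
            monitorRunning = false;
        } else {
            pingDest = pingable.get((idx + 1) % pingable.size());
            monitorRunning = true;
        }
    }

    synchronized String pingDest() { return pingDest; }
    synchronized boolean isMonitorRunning() { return monitorRunning; }
}
```

With a view of A, B, C and local member C, suspecting A and then B stops the monitor in
this sketch, and a later unsuspect of A starts it again with A as the ping destination.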
I also noticed that access to ping_dest is not synchronized in the monitor task. Instead of
using a synchronized block (which could become a bottleneck), I think it should be copied to
a local variable so that it is thread-safe (this would prevent checking one member and
suspecting another because ping_dest changed in between). I did not reproduce this, I just
noticed it while looking at the source code.
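
A rough sketch of the local-copy idea, again with hypothetical names (MonitorSketch,
runOnce) rather than the real FD monitor code. It assumes the field is declared volatile
so that the single read sees the latest value; the Runnable parameter only simulates
another thread changing ping_dest while the heartbeat is outstanding.

```java
// Hypothetical illustration of reading ping_dest once per monitor iteration.
// This is NOT the real FD monitor; names and signatures are assumptions.
final class MonitorSketch {
    private volatile String pingDest;

    void setPingDest(String dest) {
        pingDest = dest;
    }

    /**
     * Runs one monitor iteration and returns the member to suspect,
     * or null if there is nothing to suspect.
     *
     * @param whilePinging simulates concurrent activity during the heartbeat
     * @param gotAck       whether the heartbeat was acknowledged
     */
    String runOnce(Runnable whilePinging, boolean gotAck) {
        final String dest = pingDest;   // single read into a local variable
        if (dest == null) {
            return null;                // nobody to ping
        }
        whilePinging.run();             // pingDest may change here
        // Decide based on the local copy, so the member that was actually
        // pinged is the one that gets suspected.
        return gotAck ? null : dest;
    }
}
```

If the field were read a second time when raising the suspicion, a concurrent change
could cause a member that was never pinged to be suspected; using the local copy avoids
that without holding a lock for the whole iteration.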