[jboss-jira] [JBoss JIRA] Created: (JGRP-551) FD does not restart monitor task after unsuspect event
Rodrigo Faria (JIRA)
jira-events at lists.jboss.org
Tue Jul 10 06:43:31 EDT 2007
FD does not restart monitor task after unsuspect event
------------------------------------------------------
Key: JGRP-551
URL: http://jira.jboss.com/jira/browse/JGRP-551
Project: JGroups
Issue Type: Bug
Affects Versions: 2.4.1 SP3, 2.5
Environment: Found in JGroups 2.4.1SP3 and 2.5RC1. Tested on Windows XP, but appears to be platform independent.
Reporter: Rodrigo Faria
Assigned To: Bela Ban
I reproduced this with 3 members, but it may also be possible with only 2. FD and VERIFY_SUSPECT must both be present in the configuration. Steps to reproduce:
- jgroups with 3 members online
- disconnect a member (not the coord)
- wait until the disconnected member suspects the other two (FD generates a suspect event for each), then reconnect it before it changes its view (i.e. before VERIFY_SUSPECT confirms the suspicions).
Once both suspicions have occurred, FD has stopped its monitor task (since it has no pingable members left). When the unsuspect event is generated, FD does not restart the monitor task. As a consequence, if the other members have removed this member from their view, this member will not be shunned (assuming shun=true in FD), because FD is not sending heartbeat requests. This member's FD also cannot detect any failure, since its monitor task is stopped (I think it is only restarted when something triggers a VIEW_CHANGE).
I tried changing the unsuspect method in FD to update pingable_members and ping_dest and restart the monitor task (similar to how a VIEW_CHANGE event is processed), and that seemed to fix the problem.
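The failure and the proposed fix can be illustrated with a small model. This is a hypothetical sketch, not the JGroups FD source: the class `SimplifiedFD`, its `suspect`/`unsuspect` methods and the `restartOnUnsuspect` flag are invented for illustration, and the monitor task is reduced to a boolean. Members "A" and "B" stand for the two other members as seen by the disconnected one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of FD's bookkeeping (not the JGroups source).
 * The monitor task is reduced to a boolean flag.
 */
public class SimplifiedFD {
    private final List<String> members;               // other members in the current view
    private final Set<String> suspected = new HashSet<>();
    private final List<String> pingableMembers = new ArrayList<>();
    private String pingDest;                          // member currently being heartbeated
    private boolean monitorRunning;
    private final boolean restartOnUnsuspect;         // false = reported behavior, true = proposed fix

    public SimplifiedFD(List<String> members, boolean restartOnUnsuspect) {
        this.members = new ArrayList<>(members);
        this.restartOnUnsuspect = restartOnUnsuspect;
        recompute();                                  // analogous to handling a VIEW_CHANGE
    }

    /** Rebuilds pingable members and ping destination, then starts or stops the monitor. */
    private void recompute() {
        pingableMembers.clear();
        for (String m : members) {
            if (!suspected.contains(m)) pingableMembers.add(m);
        }
        pingDest = pingableMembers.isEmpty() ? null : pingableMembers.get(0);
        monitorRunning = pingDest != null;            // nobody to ping => monitor stopped
    }

    public void suspect(String member) {
        suspected.add(member);
        recompute();
    }

    public void unsuspect(String member) {
        suspected.remove(member);
        if (restartOnUnsuspect) {
            recompute();                              // proposed fix: refresh state and restart monitor
        }
        // Reported behavior: nothing else happens, so a stopped monitor stays stopped.
    }

    public boolean isMonitorRunning() { return monitorRunning; }

    public String getPingDest() { return pingDest; }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            SimplifiedFD fd = new SimplifiedFD(List.of("A", "B"), fix);
            fd.suspect("A");
            fd.suspect("B");                          // both suspected: monitor stops
            fd.unsuspect("A");                        // member reconnects before VERIFY_SUSPECT confirms
            fd.unsuspect("B");
            System.out.println((fix ? "with fix" : "reported") + ": monitor running = " + fd.isMonitorRunning());
        }
    }
}
```

Running `main` prints `monitor running = false` for the reported behavior and `true` with the fix. Reusing the same recompute step as the view-change path keeps a single place that decides whether the monitor should run.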
I also noticed that ping_dest is not synchronized in the monitor task. Instead of using a synchronized block (to avoid a bottleneck), I think it should be copied to a local variable so the task is thread-safe (this would prevent heartbeating one member and then suspecting another because ping_dest changed in between). I did not reproduce this; I only noticed it while reading the source code.
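The local-copy idea can be sketched as follows. This is a hypothetical model, not the FD code: `MonitorTaskSketch`, `runOnceRacy`, `runOnceSafe` and the `concurrentUpdate` callback are invented for illustration, and declaring the shared field `volatile` (so the timer thread sees updates) is an assumption of the sketch. The callback deterministically stands in for another thread changing the destination between the heartbeat and the suspect check.

```java
import java.util.ArrayList;
import java.util.List;

public class MonitorTaskSketch {
    // Shared field: assumed to be written by the event-handling thread and read by the timer thread.
    private volatile String pingDest;
    final List<String> log = new ArrayList<>();

    MonitorTaskSketch(String initialDest) { this.pingDest = initialDest; }

    void setPingDest(String dest) { pingDest = dest; }

    // Reads the field more than once: an update between the reads makes the
    // task heartbeat one member and then suspect a different one.
    void runOnceRacy(boolean ackMissing, Runnable concurrentUpdate) {
        if (pingDest == null) return;
        log.add("heartbeat " + pingDest);
        concurrentUpdate.run();                  // simulates another thread changing pingDest
        if (ackMissing) log.add("suspect " + pingDest);
    }

    // Reads the field once into a local, so the whole run refers to the same member.
    void runOnceSafe(boolean ackMissing, Runnable concurrentUpdate) {
        final String dest = pingDest;
        if (dest == null) return;
        log.add("heartbeat " + dest);
        concurrentUpdate.run();
        if (ackMissing) log.add("suspect " + dest);
    }

    public static void main(String[] args) {
        MonitorTaskSketch racy = new MonitorTaskSketch("B");
        racy.runOnceRacy(true, () -> racy.setPingDest("C"));
        System.out.println("racy: " + racy.log);

        MonitorTaskSketch safe = new MonitorTaskSketch("B");
        safe.runOnceSafe(true, () -> safe.setPingDest("C"));
        System.out.println("safe: " + safe.log);
    }
}
```

The racy variant logs `[heartbeat B, suspect C]`, while the local-copy variant logs `[heartbeat B, suspect B]`. A single read per run gives a consistent destination without taking a lock on every timer tick; it does not stop the next run from picking up the new value, which is the intended behavior.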