FD: messages from members other than ping_dest cause missing-heartbeat count to be reset
----------------------------------------------------------------------------------------
Key: JGRP-746
URL: http://jira.jboss.com/jira/browse/JGRP-746
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assigned To: Bela Ban
Priority: Critical
Fix For: 2.6.3, 2.7
[email from John Smith]
I'm not sure FD is behaving as it should.
I started a group with two members. I then suspended one instance with kill -SIGSTOP.
After a while, I expected the FD protocol to suspect the suspended JVM, but it did not.
I looked at the FD code, and it seems that messages that do not come from ping_dest reset
num_tries and thus prevent the member from being suspected. Is this intended? Why would a
message from self reset num_tries?
I'm using JGroups 2.6.2.
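The behaviour the report asks for can be sketched as a small counter in which only traffic from ping_dest counts as a heartbeat. This is a minimal sketch, not JGroups' FD code: the class HeartbeatMonitor, its methods onMessage/onTimeout and the maxTries parameter are hypothetical, and member addresses are modelled as plain strings.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of FD-style heartbeat bookkeeping (not JGroups code).
 * Only messages from pingDest reset the missed-heartbeat count.
 */
public class HeartbeatMonitor {
    private final String pingDest;           // the member being monitored
    private final int maxTries;              // misses before suspecting (assumed parameter)
    private int numTries = 0;                // consecutive missed heartbeats
    private boolean ackSinceLastTick = false;

    public HeartbeatMonitor(String pingDest, int maxTries) {
        this.pingDest = pingDest;
        this.maxTries = maxTries;
    }

    /** Called for every received message; only traffic from pingDest counts as a heartbeat. */
    public void onMessage(String sender) {
        if (Objects.equals(sender, pingDest)) {
            numTries = 0;
            ackSinceLastTick = true;
        }
        // Messages from any other member (including ourselves) are ignored here.
    }

    /** Called once per timeout period; returns true if pingDest should be suspected. */
    public boolean onTimeout() {
        if (ackSinceLastTick) {              // a heartbeat arrived during this period
            ackSinceLastTick = false;
            return false;
        }
        numTries++;
        return numTries >= maxTries;
    }

    public int numTries() {
        return numTries;
    }

    public static void main(String[] args) {
        HeartbeatMonitor m = new HeartbeatMonitor("192.168.128.105:47870", 3);
        m.onTimeout();                           // numTries = 1
        m.onTimeout();                           // numTries = 2
        m.onMessage("192.168.128.129:57685");    // own address: ignored
        boolean suspect = m.onTimeout();         // numTries = 3 -> suspect
        System.out.println("numTries=" + m.numTries() + " suspect=" + suspect);
    }
}
```

Running main prints `numTries=3 suspect=true`: the message from the local member's own address no longer keeps the suspended member alive.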
Here is the relevant part of the JGroups logs (192.168.128.129:57685 is the local member, 192.168.128.105:47870 the suspended one):
10:10:48,291 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:10:48,291 DEBUG [FD] heartbeat missing from 192.168.128.105:47870 (number=0)
10:11:18,293 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:11:18,293 DEBUG [FD] heartbeat missing from 192.168.128.105:47870 (number=1)
10:11:48,294 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:11:48,294 DEBUG [FD] heartbeat missing from 192.168.128.105:47870 (number=2)
10:12:18,296 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:12:18,296 DEBUG [FD] heartbeat missing from 192.168.128.105:47870 (number=3)
10:12:48,299 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:12:48,299 DEBUG [FD] heartbeat missing from 192.168.128.105:47870 (number=4)
10:12:51,265 DEBUG [FD] received msg from 192.168.128.129:57685 (counts as ack)
10:13:18,300 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:13:19,336 DEBUG [FD] received msg from 192.168.128.129:57685 (counts as ack)
10:13:45,988 DEBUG [FD] received msg from 192.168.128.129:57685 (counts as ack)
10:13:48,302 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:14:18,303 DEBUG [FD] sending are-you-alive msg to 192.168.128.105:47870 (own address=192.168.128.129:57685)
10:14:18,303 DEBUG [FD] heartbeat missing from 192.168.128.105:47870 (number=0)
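To make the counter trace in the log easier to follow, the sketch below transcribes the log events and replays them through a simplified, hypothetical model of the missed-heartbeat counter (not the JGroups source). The 30-second checks seen in the log are modelled as discrete ticks: a tick either consumes an ack received since the previous tick or records a miss and increments the count. The class FdLogReplay and its methods are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Replays the log events through a hypothetical model of FD's missed-heartbeat counter. */
public class FdLogReplay {
    static final String PING_DEST = "192.168.128.105:47870"; // suspended member
    static final String SELF = "192.168.128.129:57685";      // local member

    /** One log event: a timeout check ("tick") or a received message. */
    static final class Event {
        final String time;
        final boolean isTick;
        final String sender; // null for ticks

        Event(String time, boolean isTick, String sender) {
            this.time = time;
            this.isTick = isTick;
            this.sender = sender;
        }
    }

    static Event tick(String time) { return new Event(time, true, null); }
    static Event msg(String time, String sender) { return new Event(time, false, sender); }

    // Events transcribed from the log (timestamps truncated to seconds).
    static final List<Event> EVENTS = List.of(
        tick("10:10:48"), tick("10:11:18"), tick("10:11:48"), tick("10:12:18"), tick("10:12:48"),
        msg("10:12:51", SELF),
        tick("10:13:18"),
        msg("10:13:19", SELF), msg("10:13:45", SELF),
        tick("10:13:48"), tick("10:14:18"));

    /**
     * Returns the count recorded at each missed heartbeat, i.e. the model's "number=" values.
     * anySenderResets = true models the reported behaviour; false counts only PING_DEST.
     */
    public static List<Integer> replay(boolean anySenderResets) {
        List<Integer> missing = new ArrayList<>();
        int numTries = 0;
        boolean ackSinceLastTick = false;
        for (Event e : EVENTS) {
            if (!e.isTick) {
                if (anySenderResets || PING_DEST.equals(e.sender)) {
                    numTries = 0;
                    ackSinceLastTick = true;
                }
            } else if (ackSinceLastTick) {
                ackSinceLastTick = false;   // an ack arrived during this period
            } else {
                missing.add(numTries);      // recorded before incrementing, matching the log's numbering
                numTries++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("any sender resets:     " + replay(true));
        System.out.println("only ping_dest resets: " + replay(false));
    }
}
```

With any sender resetting the count, the model reproduces the number= values in the log: `[0, 1, 2, 3, 4, 0]`, where the local member's own messages at 10:12:51, 10:13:19 and 10:13:45 hide the misses at 10:13:18 and 10:13:48. Counting only ping_dest gives `[0, 1, 2, 3, 4, 5, 6, 7]`; when the member is actually suspected then depends on FD's max_tries setting, which the report does not give.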