David Hotham created JGRP-1493:
----------------------------------
Summary: Merge fails because failing to get physical address takes too long
Key: JGRP-1493
URL: https://issues.jboss.org/browse/JGRP-1493
Project: JGroups
Issue Type: Feature Request
Affects Versions: 3.1
Reporter: David Hotham
Assignee: Bela Ban
Start with the following views:
- A, B and C all have {A,B,C}
- D has {B', D, A, C'}, where B' and C' are dead.
A decides to lead a merge (it is the only 'actual' coordinator). By the time
we've been through view sanitization and so on and reached
getMergeDataFromSubgroupCoordinators(), coords is {D, C', A}.
Here A tries to send MERGE_REQ to each of those members. However, A does not have a
physical address for C', and neither does anyone else. So when trying to send the
MERGE_REQ to C', A always spends a little over 5 seconds in
TP.sendToSingleMember(), trying and failing to discover that physical address.
Of course A won't get a response from C' either, so it takes another 5 seconds
for merge_rsps.waitForAllResponses to time out.
Together that is a little over 10 seconds, so the MergeKiller is certain to kick in
first. Therefore the merge can never progress.
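
To make the timing argument concrete, here is a rough model of it. This is only a sketch, not JGroups code: the class and method names are made up, the sends are assumed to happen one after another, and the 10000 ms MergeKiller delay in main() is a placeholder I picked for illustration rather than a value taken from the JGroups source.

```java
// Hypothetical model of the timing described above; nothing here is JGroups API.
final class MergeTimingSketch {

    // Lower bound on how long the merge leader is busy: each member without a
    // physical address blocks the send for sendBlockMs, then the leader waits
    // rspTimeoutMs for the response that never arrives.
    static long estimatedMergeMillis(int unresolvableMembers, long sendBlockMs, long rspTimeoutMs) {
        return unresolvableMembers * sendBlockMs + rspTimeoutMs;
    }

    // True if the killer is scheduled to fire no later than the point at which
    // the leader would finally give up waiting for responses.
    static boolean killerFiresFirst(long mergeMs, long killerDelayMs) {
        return killerDelayMs <= mergeMs;
    }

    public static void main(String[] args) {
        // One member (C') with no physical address; 5000 ms is the lower bound
        // for both the blocked send and the response timeout.
        long mergeMs = estimatedMergeMillis(1, 5000, 5000);
        long assumedKillerDelayMs = 10000; // placeholder value, an assumption
        System.out.println("estimated merge time: " + mergeMs + " ms");
        System.out.println("killer fires first: " + killerFiresFirst(mergeMs, assumedKillerDelayMs));
    }
}
```

Since the blocked send actually takes a little over 5 seconds, the real elapsed time is somewhat above what this lower bound gives.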
(Presumably the situation would be even worse if there was more than one
I expect to work around this by tweaking the timings somewhere: probably in
startMergeKiller, so that the MergeKiller takes longer to be scheduled.
I'd think that the right fix would be to arrange that the MergeTask is not blocked by
TP having no physical address for a member.
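
As an illustration of what "not blocked" could look like, here is a minimal sketch that hands each send to its own thread and stops waiting after a deadline. It uses only java.util.concurrent. The Sender interface, the class name and the member names are hypothetical stand-ins for the transport, and a real fix would presumably live inside the JGroups MergeTask or TP code rather than look like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; Sender stands in for the transport and is not JGroups API.
final class NonBlockingMergeReqSketch {

    interface Sender {
        void send(String member) throws Exception;
    }

    // Sends to all members in parallel and returns those whose send finished
    // within deadlineMs. A send that is still blocked at the deadline is cancelled.
    static List<String> sendAll(List<String> coords, Sender sender, long deadlineMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "merge-req-sender");
            t.setDaemon(true); // a stuck send must not keep the JVM alive
            return t;
        });
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String member : coords) {
                tasks.add(() -> { sender.send(member); return member; });
            }
            List<String> sent = new ArrayList<>();
            // invokeAll with a timeout cancels every task not done by the deadline.
            for (Future<String> f : pool.invokeAll(tasks, deadlineMs, TimeUnit.MILLISECONDS)) {
                if (f.isCancelled()) {
                    continue; // send was still blocked, e.g. no physical address
                }
                try {
                    sent.add(f.get());
                } catch (ExecutionException e) {
                    // the send itself failed; treat the member as not reached
                }
            }
            return sent;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate C' having no physical address: its send blocks for a while.
        Sender sender = m -> { if (m.equals("C'")) Thread.sleep(2000); };
        System.out.println(sendAll(Arrays.asList("D", "C'", "A"), sender, 200));
    }
}
```

With something along these lines the time spent on sends is bounded by the deadline, however many members lack a physical address, and only the wait for responses remains.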