[jboss-jira] [JBoss JIRA] (JGRP-1493) Merge fails because failing to get physical address takes too long
David Hotham (JIRA)
jira-events at lists.jboss.org
Thu Aug 30 12:22:32 EDT 2012
[ https://issues.jboss.org/browse/JGRP-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714921#comment-12714921 ]
David Hotham commented on JGRP-1493:
------------------------------------
I'm not completely convinced that FD_ALL will necessarily help either. E.g., if D's view had been {B', B, D, A, C'}, then FD_ALL at D would never send SUSPECT events at all (because D would never be the first of the eligible members).
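To make the "first of the eligible members" point concrete, here is a rough sketch. It is not the FD_ALL source: the helper name isFirstEligible is made up for illustration, and it assumes that "eligible" means the view minus the suspected members, and that D suspects only the dead members B' and C'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SuspectEligibility {
    // Hypothetical helper (not a JGroups API): true if `local` is the first
    // member left in `view` after removing every suspected member.
    static boolean isFirstEligible(List<String> view, Set<String> suspected, String local) {
        List<String> eligible = new ArrayList<>(view);
        eligible.removeAll(suspected);
        return !eligible.isEmpty() && eligible.get(0).equals(local);
    }

    public static void main(String[] args) {
        // D's view from the comment above; B' and C' are the dead members.
        List<String> view = Arrays.asList("B'", "B", "D", "A", "C'");
        // Assumption: only the dead members are suspected.
        Set<String> suspected = new HashSet<>(Arrays.asList("B'", "C'"));

        System.out.println(isFirstEligible(view, suspected, "D")); // false: B precedes D
        System.out.println(isFirstEligible(view, suspected, "B")); // true
    }
}
```

Under those assumptions the eligible list is {B, D, A}, so the check passes only at B and never at D.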
> Merge fails because failing to get physical address takes too long
> ------------------------------------------------------------------
>
> Key: JGRP-1493
> URL: https://issues.jboss.org/browse/JGRP-1493
> Project: JGroups
> Issue Type: Feature Request
> Affects Versions: 3.1
> Reporter: David Hotham
> Assignee: Bela Ban
> Fix For: 3.2
>
>
> Start with the following views:
> - A, B and C all have {A,B,C}
> - D has {B', D, A, C'}, where B' and C' are dead.
> A decides to lead a merge (he's the only 'actual' coordinator). By the time we've been through view-sanitization and so on and reached getMergeDataFromSubgroupCoordinators(), coords are {D, C', A}.
> Here A tries to send MERGE_REQ to those members. However, A does not have a physical address for C', and in fact neither does anyone else. So when trying to send the MERGE_REQ to C', A will always spend a little over 5 seconds in TP.sendToSingleMember(), trying and failing to discover that physical address.
> Of course A won't get a response from C' either, so it will take another 5 seconds for merge_rsps.waitForAllResponses to time out.
> That means the MergeKiller is certain to kick in first.
> Therefore the merge can never progress.
> (Presumably the situation would be even worse if D's view had contained further dead members).
> I expect to work around this by tweaking the timings somewhere: probably in startMergeKiller, so that the MergeKiller takes longer to be scheduled.
> I'd think that the right fix would be to arrange that the MergeTask is not blocked by TP having no physical address for a member.
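
The timing argument in the description can be sketched as simple arithmetic. This is only an illustration: the 5100 ms figure stands in for "a little over 5 seconds", the 10000 ms MergeKiller delay is an assumed value (the report does not state it), and the method names are made up. It also assumes that sends to members without a physical address block one after another, which is what the "even worse with further dead members" remark suggests.

```java
class MergeTimingSketch {
    // Estimated time until the merge leader has finished waiting for responses.
    // Assumption: each member without a physical address blocks the sender in
    // turn, and the response wait only starts after all sends have returned.
    static long timeToCollectMs(int unresolvableCoords, long discoveryBlockMs, long responseWaitMs) {
        return unresolvableCoords * discoveryBlockMs + responseWaitMs;
    }

    // True if the (assumed) MergeKiller delay expires before collection finishes.
    static boolean mergeKilledFirst(int unresolvableCoords, long discoveryBlockMs,
                                    long responseWaitMs, long mergeKillerDelayMs) {
        return timeToCollectMs(unresolvableCoords, discoveryBlockMs, responseWaitMs) >= mergeKillerDelayMs;
    }

    public static void main(String[] args) {
        long discoveryBlockMs = 5100;    // illustrative: "a little over 5 seconds"
        long responseWaitMs = 5000;      // from the report: "another 5 seconds"
        long mergeKillerDelayMs = 10000; // assumed value, not taken from the report

        for (int dead = 0; dead <= 2; dead++) {
            System.out.println(dead + " unresolvable coord(s): "
                + timeToCollectMs(dead, discoveryBlockMs, responseWaitMs) + " ms, killed first: "
                + mergeKilledFirst(dead, discoveryBlockMs, responseWaitMs, mergeKillerDelayMs));
        }
    }
}
```

With these numbers a single unresolvable coordinator such as C' is already enough to push the total past the assumed MergeKiller delay, and each further dead member adds another blocked send on top.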