Bela Ban updated JGRP-1486:
---------------------------
Fix Version/s: 3.1
Merge failure when dead instances remain in view
------------------------------------------------
Key: JGRP-1486
URL: https://issues.jboss.org/browse/JGRP-1486
Project: JGroups
Issue Type: Bug
Affects Versions: 3.0.10
Reporter: David Hotham
Assignee: Bela Ban
Fix For: 3.1
I've hit this while testing my JGRP-1485 fix, but I think it's a logically independent issue.
So, I've reached a point where:
- A, B and C all have view {C,A,B}
- D has view {B', D', D, A', C}, in which B', D' and A' are all dead instances
As in JGRP-1485, the ideal fix would surely be to let D recover on its own, but it's not clear to me how to do that. However, I expected a merge to sort things out, and if it did, I think that ought to be good enough.
But what's actually happening is this:
- C becomes merge leader
- determines that merge participants are C, D', D, A'
- sends MERGE_REQ to those members
- the MERGE_REQ to D' reaches D (and that to A' reaches A)
- D sends a positive response to the MERGE_REQ that was meant for it but, 2.5 seconds later, also sends a negative response to the MERGE_REQ meant for D' (I think the response is negative because D can't fetch the digest from D')
- likewise A sends a negative response to the MERGE_REQ meant for A'
So what C sees is:
- good responses from C and D, followed by merge_rejected responses from A and D
- so it removes A' and D' from the merge (it didn't get responses from them)
- then it removes D from the merge (because the most recent response from D said merge_rejected)
- so it is left only with itself, and comes up with a consolidated view that is identical to its original view
In short: the merge doesn't do anything useful after all.
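To make the outcome concrete, here is a small simulation of how C filters the responses. It assumes (this is not confirmed by the report or taken from the JGroups source) that C keeps only the latest response per sender address and drops any participant whose latest response is missing or merge_rejected. The class, enum and method names are hypothetical and are not the JGroups API.

```java
import java.util.*;

public class MergeResponseSketch {
    // Hypothetical response kinds; not JGroups types.
    enum Response { OK, MERGE_REJECTED }

    // Keeps only participants whose latest response (keyed by *sender* address) is OK.
    // A later response from the same sender overwrites an earlier one.
    static List<String> survivors(List<String> participants,
                                  List<Map.Entry<String, Response>> arrivals) {
        Map<String, Response> latest = new HashMap<>();
        for (Map.Entry<String, Response> e : arrivals) {
            latest.put(e.getKey(), e.getValue()); // last write wins
        }
        List<String> result = new ArrayList<>();
        for (String p : participants) {
            // A missing response (null) or a rejection both drop the participant.
            if (latest.get(p) == Response.OK) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> participants = List.of("C", "D'", "D", "A'");
        // D' and A' are dead, so they never answer; D and A answer on their behalf.
        List<Map.Entry<String, Response>> arrivals = List.of(
                Map.entry("C", Response.OK),
                Map.entry("D", Response.OK),
                Map.entry("A", Response.MERGE_REJECTED),  // A answering the request meant for A'
                Map.entry("D", Response.MERGE_REJECTED)); // D answering the request meant for D'
        System.out.println(survivors(participants, arrivals)); // [C]
    }
}
```

Running main prints [C]: D's late rejection overwrites its earlier OK, A's rejection matches no participant, and D' and A' never answer, so C is left merging only with itself.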
I think the key here is the confusion between D and D'. Possibly the fix is as simple as ignoring MERGE_REQs whose destination address is not the local address.
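A minimal sketch of the proposed check follows. It assumes each member knows its own logical address and that the handler can see the destination address of an incoming MERGE_REQ. Msg, acceptMergeReq and the use of java.util.UUID as an address are illustrative stand-ins, not JGroups types or methods.

```java
import java.util.UUID;

public class MergeReqGuardSketch {
    // Hypothetical message: only the destination address matters for this check.
    record Msg(UUID dest, String type) {}

    private final UUID localAddr;

    MergeReqGuardSketch(UUID localAddr) {
        this.localAddr = localAddr;
    }

    // Accept a MERGE_REQ only if it was addressed to this member's own address.
    // A request meant for a different (e.g. dead) instance such as D' is dropped, not answered.
    boolean acceptMergeReq(Msg msg) {
        return localAddr.equals(msg.dest());
    }

    public static void main(String[] args) {
        UUID d = new UUID(0L, 1L);      // live instance D
        UUID dPrime = new UUID(0L, 2L); // dead instance D'
        MergeReqGuardSketch guard = new MergeReqGuardSketch(d);
        System.out.println(guard.acceptMergeReq(new Msg(d, "MERGE_REQ")));      // true
        System.out.println(guard.acceptMergeReq(new Msg(dPrime, "MERGE_REQ"))); // false
    }
}
```

With such a guard, D would drop the request addressed to D' instead of answering it under its own address, so D's earlier positive response would no longer be overwritten by a rejection.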
I'll try this out and, if it looks good, submit a pull request.