[
https://issues.jboss.org/browse/JGRP-2159?page=com.atlassian.jira.plugin....
]
Bela Ban commented on JGRP-2159:
--------------------------------
Here's how this can be reproduced (unit test: {{DeltaViewTest}}):
* J is the coordinator and has view J|0=J
* K joins and sends a JOIN-REQ to J
* J creates new view J|1=J,K (setting {{ltime}} to 1) and multicasts it, but the multicast
is delayed (e.g. dropped and retransmitted)
* Finally, J sends a JOIN-RSP with view J|1 to K
* Before receiving the new view, K times out and sends another JOIN-REQ to J
* K receives view J|1 and installs it
* J creates a new view J|2=J,K (setting {{ltime}} to 2) and multicasts it. The multicast
is again delayed.
* J sends a JOIN-RSP to K with view J|2, K installs it
* J finally gets the first view multicast and installs J|1=J,K
* New member L sends a JOIN-REQ to J
* J creates view J|3=J,K,L and multicasts it, and then sends a JOIN-RSP to L
* The multicast of J|3 is a *DeltaView* with ref-view-id=J|1 and joiners=L
* L installs the new view
* J installs the new view J|3
* However, K cannot install the new view: ref-view-id=J|1 does not match its current
view J|2!
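The failure in the last step can be sketched in a few lines of Java. This is only a simplified model, not the JGroups implementation: the {{View}} and {{Delta}} classes and the {{apply}} method are hypothetical names, view-ids are plain strings, and leaving members are ignored.

```java
import java.util.ArrayList;
import java.util.List;

class DeltaViewSketch {
    /** Hypothetical full view: an id such as "J|2" plus the member list. */
    static final class View {
        final String id;
        final List<String> members;
        View(String id, List<String> members) {
            this.id = id;
            this.members = List.copyOf(members);
        }
    }

    /** Hypothetical delta view: new id, reference view-id and joiners (leavers omitted). */
    static final class Delta {
        final String id;
        final String refId;
        final List<String> joiners;
        Delta(String id, String refId, List<String> joiners) {
            this.id = id;
            this.refId = refId;
            this.joiners = List.copyOf(joiners);
        }
    }

    /** Returns the new full view, or null if the delta does not refer to the current view. */
    static View apply(View current, Delta delta) {
        if (!current.id.equals(delta.refId))
            return null; // ref-view-id mismatch: the delta cannot be installed
        List<String> members = new ArrayList<>(current.members);
        members.addAll(delta.joiners);
        return new View(delta.id, members);
    }

    public static void main(String[] args) {
        Delta j3 = new Delta("J|3", "J|1", List.of("L"));
        View atJ = apply(new View("J|1", List.of("J", "K")), j3); // J still has J|1
        View atK = apply(new View("J|2", List.of("J", "K")), j3); // K already has J|2
        System.out.println("J installs: " + (atJ == null ? "nothing" : atJ.members));
        System.out.println("K installs: " + (atK == null ? "nothing" : atK.members));
    }
}
```

In this model J can expand the delta because its current view is J|1, while K gets nothing back because its current view is J|2.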
SOLUTION:
* The spurious view J|2 is sent to K because J hasn't yet installed view J|1
locally. Had it done so, it would have seen that K is already a member and simply
resent view J|1 instead of creating view J|2.
* We therefore need to make sure a new view is installed in the coordinator *before*
multicasting it, and this can be done by setting {{install_view_locally_first}} to true by
default (or even removing the attribute)
* As a second line of defense, make the recipient of a DeltaView that cannot be installed
send a request to the coordinator to resend the view as a full view instead of a delta view.
* J creates new view J|3=JL
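The effect of installing the view locally before sending it can be illustrated with a toy coordinator. This is a sketch under assumptions, not GMS code: {{CoordinatorSketch}}, {{handleJoin}} and the {{installLocallyFirst}} flag are hypothetical names that only mimic the behaviour described for {{install_view_locally_first}}, and the delayed multicast is simply never delivered.

```java
import java.util.ArrayList;
import java.util.List;

class CoordinatorSketch {
    private final boolean installLocallyFirst;
    private final List<String> members = new ArrayList<>(List.of("J")); // locally installed members
    private int installedId = 0; // id of the locally installed view (J|0)
    private int ltime = 0;       // counter used to number newly created views

    CoordinatorSketch(boolean installLocallyFirst) {
        this.installLocallyFirst = installLocallyFirst;
    }

    /** Handles a JOIN-REQ and returns the id of the view sent back in the JOIN-RSP. */
    int handleJoin(String joiner) {
        if (members.contains(joiner))
            return installedId; // already a member: resend the current view
        ltime++;                // otherwise create a new view
        if (installLocallyFirst) {
            members.add(joiner);
            installedId = ltime;
        }
        // If not installed locally first, the view is only installed once the
        // (delayed) multicast arrives, which this sketch never delivers.
        return ltime;
    }

    public static void main(String[] args) {
        CoordinatorSketch late = new CoordinatorSketch(false);
        System.out.println("late install:  J|" + late.handleJoin("K") + ", J|" + late.handleJoin("K"));
        CoordinatorSketch first = new CoordinatorSketch(true);
        System.out.println("local first:   J|" + first.handleJoin("K") + ", J|" + first.handleJoin("K"));
    }
}
```

With the flag off, K's repeated JOIN-REQ produces a second view id, which corresponds to the spurious J|2; with the flag on, the second request is answered with the view that is already installed.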
Delta view cannot be installed
------------------------------
Key: JGRP-2159
URL:
https://issues.jboss.org/browse/JGRP-2159
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 4.1, 4.0.1
Attachments: discarded_delta_view.log
A DeltaView cannot be installed because the ref view-id is not the current view-id.
Looking at the view sequence for members J, K and L:
{noformat}
19:22:54,278 DEBUG (testng-Test:[]) [GMS] J: installing view [J|0] (1) [J]
19:22:56,519 DEBUG (testng-Test:[]) [GMS] K: installing view [J|1] (2) [J, K]
19:22:56,572 DEBUG (jgroups-7,J:[]) [GMS] J: installing view [J|1] (2) [J, K]
19:22:56,590 DEBUG (jgroups-5,K:[]) [GMS] K: installing view [J|2] (2) [J, K]
19:22:58,585 DEBUG (jgroups-5,J:[]) [GMS] J: installing view [J|3] (3) [J, K, L]
19:23:00,603 DEBUG (testng-Test:[]) [GMS] L: installing view [J|3] (3) [J, K, L]
{noformat}
K cannot install DeltaView J|3 because it has view J|2 but the DeltaView has ref view-id
J|1.
The reason is that J|2 was apparently installed *only* at K (but not at coordinator J!),
despite it being the same view as J|1.
We need to look into why J|2 was installed at K only. Second line of defense: when a
DeltaView cannot be installed, send a message to the view sender (coord) and solicit the
full view instead.
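A rough sketch of that second line of defense, assuming a hypothetical {{requestFullView}} callback that stands in for the message sent to the coordinator (none of these names come from JGroups):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class FullViewFallbackSketch {
    /**
     * Returns the member list to install. If the delta refers to the current
     * view, it is expanded locally; otherwise the full view is solicited from
     * the coordinator through the (hypothetical) requestFullView callback.
     */
    static List<String> membersToInstall(String currentId, List<String> currentMembers,
                                         String refId, List<String> joiners,
                                         Supplier<List<String>> requestFullView) {
        if (currentId.equals(refId)) {
            List<String> members = new ArrayList<>(currentMembers);
            members.addAll(joiners);
            return members;
        }
        return requestFullView.get(); // delta not applicable: ask for the full view
    }

    public static void main(String[] args) {
        // K has J|2, but the delta for J|3 refers to J|1, so K asks the coordinator.
        List<String> installed = membersToInstall("J|2", List.of("J", "K"),
                                                  "J|1", List.of("L"),
                                                  () -> List.of("J", "K", "L"));
        System.out.println("K installs " + installed);
    }
}
```

The design choice here is that the receiver never guesses: a mismatching ref view-id always leads to a request for the full view rather than an attempt to patch the local one.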
See the attached log.