Vladimir Blagojevic resolved JGRP-598.
--------------------------------------
Resolution: Done
Resolved on 2.6 branch.
Commit set:
FLUSH.java 1.76
Needs to be backported to 2.5.
Simplify FLUSH
--------------
Key: JGRP-598
URL:
http://jira.jboss.com/jira/browse/JGRP-598
Project: JGroups
Issue Type: Task
Affects Versions: 2.4, 2.5
Reporter: Vladimir Blagojevic
Assigned To: Vladimir Blagojevic
Fix For: 2.6
#1 Superfluous FLUSH-COMPLETED phase ?
--------------------------------------
It seems we don't need the FLUSH-COMPLETED phase; it is sufficient for
the FLUSH leader to do a START-FLUSH and wait for all FLUSH-OK
messages. Why ? Because the reconciliation protocol will reconcile
messages before the view installation (or state transfer).
Example: {A,B,C}. E sends message M and then crashes. Only C receives
M.
- A does a START-FLUSH
- B and C send a FLUSH-OK back to A. This means that all multicasts
from B will be seen by A *before* B's FLUSH-OK is received, and all
multicasts from C will be seen by A before C's FLUSH-OK is received by
A. HOWEVER, C will not necessarily send M to A before sending the
FLUSH-OK message. This is only done through the reconciliation
protocol !
- FLUSH-COMPLETED can be removed; the digest that's part of
FLUSH-COMPLETED has to be moved to the FLUSH-OK phase.
- FLUSH-OK is unicast *not* multicast.
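
A minimal sketch of the leader-side bookkeeping this implies, assuming the digest travels in each unicast FLUSH-OK. The class FlushOkCollector, its methods, and the use of plain strings and seqno maps are hypothetical illustrations, not JGroups code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of JGroups): tracks FLUSH-OK replies at the flush leader. */
class FlushOkCollector {
    // Members that still owe the leader a FLUSH-OK.
    private final Set<String> pending;
    // Digest reported by each member: original sender -> highest seqno seen.
    private final Map<String, Map<String, Long>> digests = new HashMap<>();

    FlushOkCollector(Set<String> members) {
        this.pending = new HashSet<>(members);
    }

    /** Records a unicast FLUSH-OK and its digest; returns true once every member has replied. */
    synchronized boolean onFlushOk(String sender, Map<String, Long> digest) {
        if (pending.remove(sender)) {
            digests.put(sender, new HashMap<>(digest));
        }
        return pending.isEmpty();
    }

    /** Highest seqno per original sender across all reported digests (input for reconciliation). */
    synchronized Map<String, Long> mergedDigest() {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> d : digests.values()) {
            for (Map.Entry<String, Long> e : d.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::max);
            }
        }
        return merged;
    }
}
```

In the example above, if C's digest reports a higher seqno for the crashed sender than B's does, the merged digest shows which members still miss M, and reconciliation can close that gap before the view is installed.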
#2 Superfluous STOP-FLUSH-OK phase ?
------------------------------------
Why do we need the acks here ? JGroups guarantees the STOP-FLUSH
message is delivered to all non-faulty members, so why wait for the
STOP-FLUSH-OK ?
If we make STOP-FLUSH *OOB* (JGRP-337), then this should work ! We can
get rid of STOP-FLUSH-OK !! Just send STOP-FLUSH as OOB multicast.
Note: this requires the concurrent stack, so possibly check for
concurrent stack. Throw an exception if concurrent stack is not
present when FLUSH is used. So, these changes cannot be applied to
2.4, only to 2.5 and 2.6.
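
A rough sketch of the guard described in the note, assuming FLUSH can learn at startup whether the concurrent stack is enabled. The StopFlush class and the concurrentStackEnabled parameter are hypothetical and only model the idea; they are not JGroups APIs:

```java
/** Hypothetical model of the proposed STOP-FLUSH send path (not JGroups code). */
class StopFlush {
    final boolean oob;        // out-of-band: not ordered behind regular messages
    final boolean multicast;  // sent to the whole group, no STOP-FLUSH-OK expected back

    private StopFlush(boolean oob, boolean multicast) {
        this.oob = oob;
        this.multicast = multicast;
    }

    /** Builds the STOP-FLUSH message; refuses to proceed without the concurrent stack. */
    static StopFlush create(boolean concurrentStackEnabled) {
        if (!concurrentStackEnabled) {
            throw new IllegalStateException(
                "FLUSH requires the concurrent stack to send STOP-FLUSH as OOB");
        }
        return new StopFlush(true, true);
    }
}
```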
#3 Timeouts
-----------
- ClientGmsImpl: has a hard-coded timeout of 4 secs
- CoordGmsImpl also has a timeout for block()
- If we *need* timeouts, we should get them from the FLUSH protocol !
CoordGmsImpl.startFlush() should *not* have a timeout, this should be
in FLUSH itself. startFlush() should block forever, until it gets the ack.
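
One way to picture this, as a sketch only: the timeout is a property of the FLUSH layer, and callers simply block in startFlush(). The FlushLayer class, its constructor arguments, and the convention that a non-positive timeout means "wait forever" are assumptions made for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: the flush timeout lives in the FLUSH layer, not in its callers. */
class FlushLayer {
    private final long timeoutMs;       // assumed convention: <= 0 means wait forever
    private final CountDownLatch acks;  // one count per expected ack

    FlushLayer(int expectedAcks, long timeoutMs) {
        this.acks = new CountDownLatch(expectedAcks);
        this.timeoutMs = timeoutMs;
    }

    /** Called when an ack arrives from a member. */
    void onAck() {
        acks.countDown();
    }

    /** Blocks until all acks arrive; returns false if FLUSH's own timeout expired or the thread was interrupted. */
    boolean startFlush() {
        try {
            if (timeoutMs <= 0) {
                acks.await();
                return true;
            }
            return acks.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

With this shape, ClientGmsImpl and CoordGmsImpl would not carry their own timeout values; any limit would be configured once on FLUSH.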
#4 Spurious FLUSH-OK in the diagram
-----------------------------------
In the JOIN diagram, probably also in the state transfer diagram