[
https://issues.jboss.org/browse/JGRP-1742?page=com.atlassian.jira.plugin....
]
Bela Ban commented on JGRP-1742:
--------------------------------
There are all sorts of problems when a BARRIER is closed and discards messages:
* JGRP-1751: a view received during state transfer is not installed
* The coordinator doesn't deliver its own view if BARRIER is closed. The coordinator
receives its view only when stability kicks in. This is bad as new views won't
get created correctly
* When the BARRIER is closed, flushing of pending threads might take a long time
So let's think about what the purpose of BARRIER is:
# Provide a digest which corresponds to messages *applied to the state*
# It does *not* need to be active during state transfer (BARRIER can be opened after the
digest has been fetched)
Re 1: if a message P:25 was added to the digest, but not yet applied to the application,
it should be excluded from the digest so we can retransmit it and apply it to the state
later.
So perhaps a better idea is to *remove BARRIER altogether* and instead use an _application
digest_. This would be the digest of the highest message seqnos *applied* to the state.
There would be an additional GET_APP_DIGEST event (implemented by NAKACK and NAKACK2) which
returns a digest of seqnos applied to the state. Contrary to the normal digest, this
digest is updated when a message has been *delivered* to the application ({{up(Event)}} or
{{up(MessageBatch)}} returned) rather than when the message is *received*.
Example:
* We have digest {{A:10,B:2,C:16}}
* Now a message batch {{B:3-10}} is received and a single message {{C:17}} is also
received
* The normal digest in NAKACK(2) is now {{A:10,B:10,C:17}}
* However, the delivery of the message batch hasn't completed yet. The {{up(message)}}
method for {{C:17}} has completed though
* The _application digest_ returned to the state requester is now {{A:10,B:2,C:17}}:
** B's highest seqno is 2 because the message batch {{B:3-10}} hasn't returned
yet, so we don't know which messages in it have been delivered to the application (and
potentially caused state changes).
** {{C:17}} is part of the application digest as we know that it returned and therefore
any potential state change was performed
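The example above can be sketched in a few lines of Java. This is only an illustration of the bookkeeping, not the NAKACK(2) implementation: the class {{AppDigestSketch}} and its methods {{init}}, {{onReceived}} and {{onDelivered}} are hypothetical names, and the sketch assumes the caller reports a delivery only after the corresponding {{up()}} call has returned.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not JGroups API: tracks a normal and an application digest. */
class AppDigestSketch {
    // Highest seqno received per sender (what the normal digest reflects)
    private final Map<String, Long> received = new TreeMap<>();
    // Highest seqno per sender whose up() call has returned
    private final Map<String, Long> delivered = new TreeMap<>();

    /** Seeds both digests with the same seqno for a sender. */
    synchronized void init(String sender, long seqno) {
        received.put(sender, seqno);
        delivered.put(sender, seqno);
    }

    /** Messages from sender up to highSeqno have been received. */
    synchronized void onReceived(String sender, long highSeqno) {
        received.merge(sender, highSeqno, Long::max);
    }

    /** up(Event) or up(MessageBatch) has returned for messages up to highSeqno. */
    synchronized void onDelivered(String sender, long highSeqno) {
        delivered.merge(sender, highSeqno, Long::max);
    }

    synchronized Map<String, Long> digest()    { return new TreeMap<>(received); }
    synchronized Map<String, Long> appDigest() { return new TreeMap<>(delivered); }

    public static void main(String[] args) {
        AppDigestSketch d = new AppDigestSketch();
        d.init("A", 10);
        d.init("B", 2);
        d.init("C", 16);
        d.onReceived("B", 10);   // batch B:3-10 received, delivery still in progress
        d.onReceived("C", 17);   // C:17 received
        d.onDelivered("C", 17);  // up(message) for C:17 has returned
        System.out.println("digest     = " + d.digest());
        System.out.println("app digest = " + d.appDigest());
    }
}
```

Running {{main}} prints {{A=10, B=10, C=17}} for the normal digest and {{A=10, B=2, C=17}} for the application digest. B stays at 2 until {{onDelivered("B", 10)}} is called, i.e. until the batch has returned.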
h5. Observations
* It is not important to know whether a message really changed the state or not. However,
it is important to know whether the message was delivered to the application or not. If
not sure (e.g. messages {{B:3-7}} of the above message batch might have gotten delivered to
the application, but not messages {{B:8-10}}), we return the highest seqno that we know was
delivered to the application. In the above case, messages {{B:3-10}} would get
retransmitted and applied to the state (even if 3-7 were already applied)
* As a consequence of the above, *messages need to be idempotent*, or else the state would
not be correct. In other words, the state should not be different regardless of whether
messages {{B:3-7}} get delivered once or multiple times.
* If the above is not possible, then {{FLUSH}} should be used to prevent any sending
during a state transfer
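To make the idempotency requirement concrete, here is a small Java sketch. It assumes a toy application state (a map) and two hypothetical kinds of update: an absolute write, which is idempotent, and a counter increment, which is not. None of the names are JGroups API. It compares delivering {{B:3-10}} once with delivering {{B:3-7}} followed by a retransmitted {{B:3-10}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: effect of redelivering B:3-7 on two kinds of updates. */
class IdempotencySketch {
    /** An application-level update triggered by the message with the given seqno. */
    interface Update {
        void apply(Map<String, Integer> state, int seqno);
    }

    /** Applies every seqno of each inclusive range {from, to} to a fresh state, in order. */
    static Map<String, Integer> deliver(Update update, int[]... ranges) {
        Map<String, Integer> state = new HashMap<>();
        for (int[] range : ranges)
            for (int seqno = range[0]; seqno <= range[1]; seqno++)
                update.apply(state, seqno);
        return state;
    }

    public static void main(String[] args) {
        // Absolute write: applying the same message again yields the same state
        Update put = (state, seqno) -> state.put("key" + (seqno % 3), seqno);
        // Relative change: every redelivery alters the state once more
        Update increment = (state, seqno) -> state.merge("counter", 1, Integer::sum);

        int[] partial = {3, 7};   // B:3-7 were delivered before the digest was taken
        int[] full = {3, 10};     // B:3-10 are retransmitted afterwards

        System.out.println(deliver(put, partial, full).equals(deliver(put, full)));             // true
        System.out.println(deliver(increment, partial, full).equals(deliver(increment, full))); // false
    }
}
```

With the absolute writes, the state after redelivery matches the state after a single delivery. With the increments, the counter ends at 13 instead of 8, which is the kind of application that would need {{FLUSH}} instead.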
BARRIER: minimize closing time
------------------------------
Key: JGRP-1742
URL: https://issues.jboss.org/browse/JGRP-1742
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.5
During a state transfer, BARRIER.up() waits until all incoming threads (delivering
messages to the application) are done, and blocks further incoming messages. This is done
to get the digest and the state.
However, during the block, the following messages are not sent up:
* Views !
* STABLE messages, triggering retransmissions
This is bad, so we should try to minimize the time BARRIER is closed. This can be done
with JGRP-1352.
However, we could also do the following:
* A state request is received
* Close BARRIER and flush all pending threads. This ensures that any message which
updated the *digest* also updated the *application state*
* Get the digest D
* *Open* BARRIER. Messages will now be delivered and thus applied to the state
* Get the application state S
* When done, return D and S to the state requester
The difference to JGRP-1352 is that we don't queue messages during state transfer.
How does this work? It is critical to ensure that all messages which updated the digest D
also updated the state S, or else messages present in D but not in S would not be
retransmitted. However, if there are more messages in S than in D, this is not an issue as
they will simply be retransmitted.
Example:
* BARRIER is closed and pending threads are flushed
* Digest D is (only for a given member P) 5, state S is 5 as well
* Now we open BARRIER
* P sends a few more messages (6, 7 and 8)
* The digest is now 8, but the copy we have is still 5
* State S is 8
* We return D=5 and S=8
* The state requester closes BARRIER and sets its digest to 5 and its state to 8
* Since the digest is only 5 for P, the state requester asks P for retransmission of
messages 6, 7 and 8
* Messages 6, 7 and 8 from P are received and applied to the state
* The assumption here is that if messages 6, 7 and 8 are applied twice, the state
doesn't change (idempotency). This should be the case with Infinispan.
The advantage of this issue over JGRP-1352 is that we don't need to queue messages
for a long time if the state is large.
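The proposed sequence and the example above can be simulated with a short Java sketch. It is a single-threaded model for one member P only, and it makes several simplifying assumptions: digest and state are both represented by a highest seqno, applying a message is modelled as an idempotent {{max}}, and messages arriving after BARRIER is reopened are passed in explicitly. {{Provider}}, {{Requester}}, {{handleStateRequest}} and {{catchUp}} are hypothetical names, not JGroups API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-member simulation of the proposed state transfer. */
class StateTransferSketch {
    /** State provider: tracks digest and state for one sender P. */
    static class Provider {
        long digest;   // highest seqno received from P
        long state;    // highest seqno applied to the application state
        boolean barrierOpen = true;

        /** Receives and applies one message from P (only possible while BARRIER is open). */
        void receive(long seqno) {
            if (!barrierOpen)
                throw new IllegalStateException("BARRIER is closed");
            digest = Math.max(digest, seqno);
            state = Math.max(state, seqno);
        }

        /** Returns {D, S}; arriving simulates messages delivered after BARRIER reopens. */
        long[] handleStateRequest(long... arriving) {
            barrierOpen = false;      // close BARRIER; pending threads are assumed flushed
            long d = digest;          // get the digest D
            barrierOpen = true;       // open BARRIER again
            for (long seqno : arriving)
                receive(seqno);       // messages are applied while the state is being read
            long s = state;           // get the application state S
            return new long[] {d, s};
        }
    }

    /** State requester: installs D and S, then asks P for what D does not cover. */
    static class Requester {
        long digest;
        long state;
        final List<Long> retransmitted = new ArrayList<>();

        void install(long d, long s) {
            digest = d;
            state = s;
        }

        /** Requests and applies all seqnos above the installed digest. */
        void catchUp(long highestSentByP) {
            for (long seqno = digest + 1; seqno <= highestSentByP; seqno++) {
                retransmitted.add(seqno);
                state = Math.max(state, seqno);   // idempotent apply
                digest = seqno;
            }
        }
    }

    public static void main(String[] args) {
        Provider provider = new Provider();
        for (long seqno = 1; seqno <= 5; seqno++)
            provider.receive(seqno);
        long[] ds = provider.handleStateRequest(6, 7, 8);

        Requester requester = new Requester();
        requester.install(ds[0], ds[1]);
        requester.catchUp(8);
        System.out.println("D=" + ds[0] + " S=" + ds[1]
            + " retransmitted=" + requester.retransmitted + " state=" + requester.state);
    }
}
```

Running {{main}} prints {{D=5 S=8 retransmitted=[6, 7, 8] state=8}}: the requester re-applies 6, 7 and 8 and the state is unchanged, which only holds because the apply step in this model is idempotent.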