[jboss-jira] [JBoss JIRA] (JGRP-1742) BARRIER: minimize closing time

Thu Nov 21 04:59:06 EST 2013

    [ https://issues.jboss.org/browse/JGRP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925646#comment-12925646 ] 

Bela Ban edited comment on JGRP-1742 at 11/21/13 4:57 AM:
----------------------------------------------------------

h5. Update

h6. Serialization of state requests
* State transfer requests should be handled *sequentially* at the state provider
** This ensures that a BARRIER closed by a new state transfer is not accidentally opened by a completed state transfer
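Here is a minimal sketch of sequential handling. It assumes the provider can route state requests through a single-threaded executor. {{SequentialStateProvider}}, {{handleStateRequest()}} and {{drain()}} are hypothetical names, not JGroups API. The log entries only stand in for closing and opening BARRIER:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not JGroups code: all state requests at the provider run on one
// worker thread, so the close/open of BARRIER for one request can never interleave with
// (and be undone by) the close/open of another request.
class SequentialStateProvider {
    private final ExecutorService requests = Executors.newSingleThreadExecutor();
    // Records barrier transitions so the ordering can be checked
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    /** Queues a state request; requests are served strictly in arrival order. */
    void handleStateRequest(String requester) {
        requests.execute(() -> {
            log.add("close:" + requester);  // stands in for closing BARRIER
            // ... fetch digest, send state (omitted) ...
            log.add("open:" + requester);   // stands in for opening BARRIER
        });
    }

    /** Waits (bounded) for all queued requests to finish, then stops the worker. */
    void drain() {
        requests.shutdown();
        try {
            requests.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

{{execute()}} does not block. The thread that delivered a state request is therefore not held up while an earlier request is still being served. A lock around the handler would also serialize requests, but it would block that thread.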

h6. State provider
* Close BARRIER (see below) only while fetching the digest
* While it is closed, punch a hole (PUNCH_HOLE) for *unicasts from the state requester* and let messages tagged with INTERNAL | OOB pass
** PUNCH_HOLE is needed, e.g., for UNICAST3 ACKs or retransmissions
* When the digest has been fetched, open BARRIER again
** (Leave the hole for the state requester open, as a different state request might close BARRIER again before this state request has completed)
* When the state transfer to the state requester has completed, remove the hole for the state requester (CLOSE_HOLE)
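The provider-side steps can be modelled roughly as follows. This sketch rests on several assumptions. {{HoleyBarrier}}, {{StateProviderSketch}}, {{passes()}} and the single {{long}} digest entry are hypothetical simplifications. Flushing of pending threads is omitted. The real BARRIER and PUNCH_HOLE handling in JGroups may differ:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical model of BARRIER with holes; names are illustrative, not JGroups API
class HoleyBarrier {
    private boolean closed;
    private final Set<String> holes = new HashSet<>();

    synchronized void close() { closed = true; }   // flushing of pending threads omitted
    synchronized void open() { closed = false; }
    synchronized void punchHole(String member) { holes.add(member); }    // PUNCH_HOLE
    synchronized void closeHole(String member) { holes.remove(member); } // CLOSE_HOLE

    /** Would a message with these properties be passed up right now? */
    synchronized boolean passes(String sender, boolean unicast, boolean taggedInternalOob) {
        if (!closed) return true;
        if (taggedInternalOob) return true;        // INTERNAL | OOB messages pass even when closed
        return unicast && holes.contains(sender);  // unicasts (e.g. UNICAST3 ACKs) from a hole pass
    }
}

class StateProviderSketch {
    final HoleyBarrier barrier = new HoleyBarrier();

    /** Returns the digest entry D; the state is sent afterwards with BARRIER open. */
    long handle(String requester, LongSupplier fetchDigest, Runnable sendState) {
        barrier.close();                   // close only while fetching the digest
        barrier.punchHole(requester);      // unicasts from the requester still get through
        long digest;
        try {
            digest = fetchDigest.getAsLong();
        } finally {
            barrier.open();                // reopen at once; the hole stays punched
        }
        try {
            sendState.run();               // state transfer runs with BARRIER open
        } finally {
            barrier.closeHole(requester);  // transfer done: CLOSE_HOLE
        }
        return digest;
    }
}
```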

h6. State requester
* Close BARRIER to set the digest *and apply the state*
* Punch a hole for the *state provider*
** This is not critical, as the state requester can block for some time before becoming operational (i.e. before {{JChannel.connect()}} returns)
* When the state transfer has completed, remove the hole for the state provider again and open BARRIER
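Below is a sketch of the requester side. It uses a hypothetical {{BarrierOps}} interface whose methods mirror CLOSE_BARRIER, OPEN_BARRIER, PUNCH_HOLE and CLOSE_HOLE. This is not a real JGroups interface. The try/finally is a design choice of the sketch: the hole is closed and BARRIER is reopened even if setting the digest or applying the state throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operations; names mirror the events discussed above, not a JGroups interface
interface BarrierOps {
    void closeBarrier();
    void openBarrier();
    void punchHole(String member);
    void closeHole(String member);
}

class StateRequesterSketch {
    static void receiveState(BarrierOps barrier, String provider, Runnable setDigest, Runnable applyState) {
        barrier.closeBarrier();           // nothing is delivered while digest and state are installed...
        barrier.punchHole(provider);      // ...except unicasts from the state provider
        try {
            setDigest.run();
            applyState.run();
        } finally {
            barrier.closeHole(provider);  // undo in reverse order, even on failure
            barrier.openBarrier();
        }
    }
}

// Records the calls so the ordering can be checked
class RecordingBarrier implements BarrierOps {
    final List<String> calls = new ArrayList<>();
    public void closeBarrier() { calls.add("close"); }
    public void openBarrier() { calls.add("open"); }
    public void punchHole(String m) { calls.add("punch:" + m); }
    public void closeHole(String m) { calls.add("closeHole:" + m); }
}
```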

h6. Closing of BARRIER / flushing of pending requests
* CLOSE_BARRIER could return false (or throw an exception?) if it wasn't able to flush all pending threads
** If it returns false, the state transfer fails and an exception is sent to the state requester
* Even a successful flush might not be sufficient:
** Requests might be executed by newly spawned threads. The delivering thread then returns immediately, so we're not guaranteed that the state has been modified
*** Possible solution: use {{Receiver.block()/unblock()}}: when {{block()}} is called, the application has to make sure that all pending requests that modify application state have completed
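The boolean variant of CLOSE_BARRIER could look like the sketch below. It makes a hypothetical assumption: every delivering thread brackets its call into the application with {{enter()}}/{{exit()}}. {{FlushingBarrier}} and the timeout parameter are illustrative. This is not the actual BARRIER implementation:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: counts threads currently delivering messages and lets
// close() report whether all of them finished within a timeout.
class FlushingBarrier {
    private boolean closed;
    private int inFlight;  // threads currently passing a message up

    /** Called by a delivering thread before passing a message up; blocks while closed. */
    synchronized void enter() throws InterruptedException {
        while (closed) wait();
        inFlight++;
    }

    /** Called by a delivering thread once the application call has returned. */
    synchronized void exit() {
        inFlight--;
        if (inFlight == 0) notifyAll();
    }

    /** CLOSE_BARRIER: false if pending threads were not flushed in time (BARRIER stays closed). */
    synchronized boolean close(long timeout, TimeUnit unit) throws InterruptedException {
        closed = true;
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight > 0) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) return false;  // caller opens BARRIER and fails the state transfer
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }

    /** OPEN_BARRIER: releases threads blocked in enter(). */
    synchronized void open() {
        closed = false;
        notifyAll();
    }
}
```

This sketch has the limitation described above. A thread that hands the message off to another thread calls {{exit()}} early. A successful {{close()}} therefore only says that the delivering threads returned. It does not say that the application state has been modified.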

h6. Misc
* Should we discard traffic when blocked, or queue it?
** Discarding means messages will be retransmitted, but we might have to kick off a stability round to handle the last-message-lost problem
** Queueing requires tracking how many bytes have been added to the queue, to keep it bounded. We might also overwhelm the thread pools by passing a lot of messages and message batches to them when BARRIER is opened
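For the queueing option, one way to keep the queue bounded is a byte budget. The sketch below assumes that messages exceeding the budget are simply dropped and later recovered by retransmission. Draining in batches of at most {{maxMessages}} when BARRIER opens is one possible (hypothetical) way to avoid flooding the thread pools. {{BoundedMessageQueue}} is an illustrative name:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: queue messages while BARRIER is closed, bounded by total bytes
class BoundedMessageQueue {
    private final long maxBytes;
    private long bytes;                              // bytes currently queued
    private final Deque<byte[]> queue = new ArrayDeque<>();

    BoundedMessageQueue(long maxBytes) { this.maxBytes = maxBytes; }

    /** Returns false (message dropped, to be retransmitted) if it would exceed the budget. */
    synchronized boolean offer(byte[] payload) {
        if (bytes + payload.length > maxBytes) return false;
        queue.addLast(payload);
        bytes += payload.length;
        return true;
    }

    /** On opening BARRIER: removes and returns at most maxMessages queued messages. */
    synchronized List<byte[]> drain(int maxMessages) {
        List<byte[]> batch = new ArrayList<>();
        while (batch.size() < maxMessages && !queue.isEmpty()) {
            byte[] msg = queue.pollFirst();
            bytes -= msg.length;
            batch.add(msg);
        }
        return batch;
    }

    synchronized long bytes() { return bytes; }
}
```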

> BARRIER: minimize closing time
> ------------------------------
>
>                 Key: JGRP-1742
>                 URL: https://issues.jboss.org/browse/JGRP-1742
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.5
>
>
> During a state transfer, BARRIER.up() waits until all incoming threads (delivering messages to the application) are done, and blocks further incoming messages. This is done to get the digest and the state.
> However, during the block, the following messages are not passed up:
> * Views!
> * STABLE messages, which trigger retransmissions
> Delaying these is bad, so we should try to minimize the time BARRIER is closed. This can be done with JGRP-1352. 
> However, we could also do the following:
> * A state request is received
> * Close BARRIER and flush all pending threads. This ensures that any message which updated the *digest* also updated the *application state*
> * Get the digest D
> * *Open* BARRIER. Messages will now be delivered and thus applied to the state
> * Get the application state S
> * When done, return D and S to the state requester
> The difference from JGRP-1352 is that we don't queue messages during the state transfer. How does this work? It is critical that every message which updated the digest D also updated the state S. Otherwise, messages present in D but not in S would never be retransmitted and would be missing from the state requester's state. However, if S contains more messages than D, this is not an issue: the state requester will ask for their retransmission and apply them again.
> Example:
> * BARRIER is closed and pending threads are flushed
> * Digest D (considering only a given member P) is 5; state S is 5 as well
> * Now we open BARRIER
> * P sends a few more messages (6, 7 and 8)
> * The digest is now 8, but the copy we have is still 5
> * State S is 8
> * We return D=5 and S=8
> * The state requester closes BARRIER and sets its digest to 5 and its state to 8
> * Since the digest is only 5 for P, the state requester asks P for retransmission of messages 6, 7 and 8
> * Messages 6, 7 and 8 from P are received and applied to the state
> * The assumption here is that if messages 6, 7 and 8 are applied twice, the state doesn't change (idempotency). This should be the case with Infinispan.
> The advantage of this approach over JGRP-1352 is that we don't need to queue messages for a long time if the state is large.
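The example above can be replayed with a small simulation for the single sender P. For illustration only, it assumes that message {{n}} writes key {{n}} with value {{"value-n"}}. Applying a message twice therefore leaves the state unchanged. {{DigestStateExample}} and {{Member}} are hypothetical names. The requester is simply assumed to know that P's highest seqno is 8:

```java
import java.util.TreeMap;

// Hypothetical simulation of the D=5 / S=8 example for one sender P
class DigestStateExample {
    static class Member {
        long digest;                                       // highest seqno from P delivered
        final TreeMap<Integer, String> state = new TreeMap<>();

        void deliver(int seqno) {
            state.put(seqno, "value-" + seqno);            // idempotent: same key, same value
            digest = Math.max(digest, seqno);
        }
    }

    /** Runs the scenario and returns {provider, requester}. */
    static Member[] simulate() {
        Member provider = new Member();
        for (int i = 1; i <= 5; i++) provider.deliver(i);

        long d = provider.digest;                          // BARRIER closed and flushed: D = 5
        for (int i = 6; i <= 8; i++) provider.deliver(i);  // BARRIER open: P sends 6, 7, 8
        TreeMap<Integer, String> s = new TreeMap<>(provider.state); // S reflects 1..8

        Member requester = new Member();
        requester.digest = d;                              // digest set to 5
        requester.state.putAll(s);                         // state set to 8

        int highestFromP = 8;                              // assumed known to the requester here
        for (int i = (int) d + 1; i <= highestFromP; i++) {
            requester.deliver(i);                          // retransmitted 6, 7, 8 applied a second time
        }
        return new Member[] {provider, requester};
    }
}
```

Suppose the update were not idempotent, e.g. a state that counts delivered messages. The requester would then start at 8 and end at 11 after the retransmission of 6, 7 and 8. This is why idempotency is required.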
