[jboss-jira] [JBoss JIRA] (JGRP-1742) BARRIER: minimize closing time

Thu Nov 14 07:03:05 EST 2013

    [ https://issues.jboss.org/browse/JGRP-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923552#comment-12923552 ] 

Bela Ban commented on JGRP-1742:
--------------------------------

Opening BARRIER after pending thread have been flushed and the digest has been retrieved is OK, but we *cannot resume stability*. If we resumed stability, senders might not be able to serve retransmission requests during a long running state transfer as stability might already have purged older messages !

> BARRIER: minimize closing time
> ------------------------------
>
>                 Key: JGRP-1742
>                 URL: https://issues.jboss.org/browse/JGRP-1742
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.5
>
>
> During a state transfer, BARRIER.up() waits until all incoming threads (delivering messages to the application) are done, and blocks further incoming messages. This is done to get the digest and the state.
> However, duing the block, the following messages are not sent up:
> * Views !
> * STABLE messages, triggering retransmissions
> This is bad, so we should try to minimize the time BARRIER is closed. This can be done with JGRP-1352. 
> However, we could also do the following:
> * A state request is received
> * Close BARRIER and flush all pending threads. This ensures that any message which updated the *digest* also updated the *application state*
> * Get the digest D
> * *Open* BARRIER. Messages will now be delivered and thus applied to the state
> * Get the application state S
> * When done, return D and S to the state requester
> The difference to JGRP-1352 is that we don't queue messages during state transfer. How does this work ? It is critical to ensure that all mesages which updated the digest D also updated the state S, or else messages present in D but not in S would not be retransmitted. However, if there are more messages in S than in D, this is not an issue as they will be retransmitted again.
> Example:
> * BARRIER is closed and pending threads are flushed
> * Digest D is (only for a given member P) 5, state S is 5 as well
> * Now we open BARRIER
> * P sends a few more messages (6, 7 and 8)
> * The digest is now 8, but the copy we have is still 5
> * State S is 8
> * We return D=5 and S=8
> * The state requester closes BARRIER and sets its digest to 5 and its state to 8
> * Since the digest is only 5 for P, the state requester asks P for retransmission of messages 6, 7 and 8
> * Messages 6, 7 and 8 from P are received and applied to the state
> * The assumption here is that if messages 6, 7 and 8 are applied twice, the state doesn't change (idempotency). This should be the case with Infinispan.
> The advantage of this issue over JGRP-1352 is that we don't need to queue messages for a long time if the state is large.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira