[jboss-jira] [JBoss JIRA] (JGRP-1389) Synchronous messages

Bela Ban (Commented) (JIRA) jira-events at lists.jboss.org
Fri Dec 2 08:21:41 EST 2011


    [ https://issues.jboss.org/browse/JGRP-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647550#comment-12647550 ] 

Bela Ban commented on JGRP-1389:
--------------------------------

Yes, we'll use a separate protocol (RSVP): this *is* separate functionality, not directly tied to NAKACK or UNICAST2. This way, we can use it over UNICAST, or even in a stack without NAKACK or UNICAST2!
Design:
- RSVP checks if the RSVP flag is set in a message
- If so, it creates a unique number (possibly a UUID), and adds it plus all recipients to a hashmap (maybe via AckCollector?)
- Then it adds a header with the UUID and passes the message down
- A timer task is then started which periodically sends a LAST_SEQNO event down, telling the NAKACK or UNICAST2 protocols to send their highest seqno (sent or received) to a specific recipient (unicast) or all recipients.
- The recipient(s) check the seqno sent by P against their own entry for P and ask P for retransmission if they are missing messages
- RSVP on the sender then blocks on AckCollector until a timeout occurs or all acks have been received
- When a timeout occurs, we can throw an exception or simply return (configurable)
- When RSVP receives a message with flag=RSVP, it sends an ack back to the sender. This means we'll only ack messages that have been *delivered*, not messages that have merely been *received*
- We could first pass the message up, and *then* ack it, or ack it and then pass the message up. This should be configurable
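
The sender-side bookkeeping in the design above (unique id, set of outstanding recipients, block until all acks or timeout) could be sketched roughly as follows. This is only an illustration, not the JGroups implementation: the class RsvpSketch and its methods are hypothetical, members are plain strings rather than JGroups addresses, and a synchronized set stands in for AckCollector.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the sender side of RSVP; not the JGroups API.
class RsvpSketch {
    // One entry per in-flight RSVP message: id -> members that have not acked yet.
    private final Map<UUID, Set<String>> pending = new ConcurrentHashMap<>();

    /** Registers the recipients and returns the id that would go into the header. */
    UUID register(Collection<String> recipients) {
        UUID id = UUID.randomUUID();
        pending.put(id, new HashSet<>(recipients));
        return id;
    }

    /** Called when an ack for message id arrives from member. */
    void onAck(UUID id, String member) {
        Set<String> missing = pending.get(id);
        if (missing == null) return;              // late ack for a finished message
        synchronized (missing) {
            missing.remove(member);
            if (missing.isEmpty()) missing.notifyAll();
        }
    }

    /** Blocks until all acks have arrived (true) or timeoutMs has elapsed (false). */
    boolean awaitAcks(UUID id, long timeoutMs) {
        Set<String> missing = pending.get(id);
        if (missing == null) return true;         // nothing outstanding for this id
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        try {
            synchronized (missing) {
                while (!missing.isEmpty()) {
                    long leftMs = (deadline - System.nanoTime()) / 1_000_000L;
                    if (leftMs <= 0) return false; // timed out with acks missing
                    missing.wait(leftMs);
                }
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pending.remove(id);                    // the entry is done either way
        }
    }
}
```

A caller would register the recipients, send the message with the id in its header, and then call awaitAcks(). Whether a false return is turned into an exception or silently ignored is the configurable part mentioned above.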

                
> Synchronous messages
> --------------------
>
>                 Key: JGRP-1389
>                 URL: https://issues.jboss.org/browse/JGRP-1389
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.1
>
>
> FLUSH ensures that every member has received all messages from *all* other members. This is quite costly, especially if we have a large cluster.
> However, sometimes, it is only necessary to ensure that a given message M sent by P has been received by everyone, for example, if P is the coordinator and installs view V, then it would be sufficient to ensure that everyone received V.
> So synchronous messages are messages that block the sender until all of the (non-faulty) recipients have ack'ed their reception, or a timeout occurs.
> Note that if P sends messages 5, 6, 7 and 8 and tags only 8 as synchronous, then when the send of 8 returns, JGroups guarantees that everyone has also received all messages from P lower than 8.
> A user should be able to configure whether the message send is complete when all recipients have *received* the message, or when they have *delivered* it.
> Reception of a message means that the message was added to the recipient's buffer; delivery means that the message was consumed by the application.
> A message M4 sent by P is sometimes delivered late because M4 is the last message sent by P: if members Q and R don't receive M4 (e.g. because it was dropped), and P doesn't send another message M5, then Q and R have no means of detecting that M4 was sent by P, and thus cannot ask P for retransmission of M4.
> This is of course solved by STABLE, which periodically broadcasts the highest seqnos sent, so that Q and R can then ask P for retransmission of M4.
> However, if we don't want to wait that long, and don't want to risk Q and R leaving before they've received M4, we can implement a *sender-flush* of M4 sent by P.
> This works as follows:
> - P can add an RSVP flag to M4
> - P adds M4 to a retransmit table and keeps retransmitting M4 until it has received ACKs from every non-faulty member in the target set, or P leaves
> - When a message is received that is tagged with RSVP, an ACK is sent back to the sender
> - When P has received ACKs from everyone, the message send returns. The caller is blocked until this is the case (maybe bounded by a timeout?)
> The reason why this works is that every receiver gets the latest message M4 from P, and adds it to its retransmit table. If there is a gap, it will ask P for retransmission of missing messages. This way, a receiver won't have to wait until STABLE sends a digest to find out it's missing messages from P.
> There could be an option for a receiver R to delay sending an ACK back to P until it has actually *received* P's missing messages. Without this delay, P could leave or crash *before* R got all of P's missing messages.
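
The receiver-side gap check described in the issue (compare the highest seqno announced by P with what has actually arrived, and ask for whatever is missing) might look roughly like this. It is a minimal sketch under stated assumptions: GapDetectorSketch is a hypothetical class, seqnos are plain longs, and a sorted set stands in for the receiver's retransmit table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a receiver's view of one sender P; not the JGroups API.
class GapDetectorSketch {
    private final long highestDelivered;              // every seqno <= this has arrived
    private final TreeSet<Long> received = new TreeSet<>();

    GapDetectorSketch(long highestDelivered) {
        this.highestDelivered = highestDelivered;
    }

    /** Records a message from P; seqnos at or below highestDelivered are duplicates. */
    void onMessage(long seqno) {
        if (seqno > highestDelivered) received.add(seqno);
    }

    /** Returns the seqnos to request from P, given P's highest sent seqno. */
    List<Long> missingUpTo(long highestSent) {
        List<Long> missing = new ArrayList<>();
        for (long s = highestDelivered + 1; s <= highestSent; s++) {
            if (!received.contains(s)) missing.add(s);
        }
        return missing;
    }
}
```

With the example from the issue, a receiver that already has everything up to 4 and then gets 5 and the RSVP-tagged 8 would compute 6 and 7 as missing and request them from P right away, instead of waiting for the next STABLE digest.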
