[jboss-jira] [JBoss JIRA] (JGRP-1389) Synchronous messages

Bela Ban (Commented) (JIRA) jira-events at lists.jboss.org
Mon Dec 5 12:03:40 EST 2011


    [ https://issues.jboss.org/browse/JGRP-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648096#comment-12648096 ] 

Bela Ban commented on JGRP-1389:
--------------------------------

I don't think we need to change NAKACK or UNICAST2!

By simply resending the RSVP request (either unicast or multicast) while waiting on all RSVP responses, we make NAKACK or UNICAST2 trigger retransmission on the receiver's side. Example: if we send 1 2 3 4 5, and the last message (5) is dropped, then resending the RSVP request will make it #6. If some receiver received 1 2 3 4, and then (finally) 6, it'll ask the sender to retransmit #5.
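
To make the 1 2 3 4 / 6 example concrete, here is a minimal sketch of receiver-side gap detection. It is only an illustration: GapDetector and its methods are hypothetical names, not the actual NAKACK or UNICAST2 code, and it assumes seqnos start at 1.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of nak-based gap detection on the receiver's side.
class GapDetector {
    private final Set<Long> received = new HashSet<>();
    private long highestSeen = 0; // highest seqno seen so far from the sender

    /** Records a seqno and returns the seqnos the receiver would ask the sender to retransmit. */
    List<Long> receive(long seqno) {
        received.add(seqno);
        highestSeen = Math.max(highestSeen, seqno);
        List<Long> missing = new ArrayList<>();
        for (long s = 1; s <= highestSeen; s++) {
            if (!received.contains(s))
                missing.add(s);
        }
        return missing;
    }

    public static void main(String[] args) {
        GapDetector receiver = new GapDetector();
        for (long s = 1; s <= 4; s++)
            receiver.receive(s);               // 1 2 3 4 arrive, 5 is dropped
        // The resent RSVP request arrives as #6 and exposes the gap.
        System.out.println("after #6, missing: " + receiver.receive(6)); // prints: after #6, missing: [5]
    }
}
```

Without #6, the receiver has no way to know that #5 exists, which is why resending the RSVP request is enough to trigger the retransmission.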

This is a simple and beautiful solution, as it does *not* require any changes to NAKACK or UNICAST2!

Note that if we use UNICAST (which does sender-side retransmission), those RSVP retransmissions are spurious, but in that case we could *disable* RSVP-triggered retransmission! E.g. we could check which unicast or multicast retransmission protocol is in the stack and disable retransmission in RSVP entirely.

The check could be done via a simple event sent down the stack, which queries whether the unicast or multicast protocol is ack-based (UNICAST) or nak-based (UNICAST2, NAKACK).
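
A rough sketch of the decision such a check would feed into. All names here (RetransmissionCheck, Style, rsvpShouldResend) are made up for illustration and are not JGroups classes or events; only the ack/nak classification of the three protocols is taken from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: decides whether RSVP should resend its request.
class RetransmissionCheck {
    enum Style { ACK_BASED, NAK_BASED }

    // Classification as stated in the comment: UNICAST is ack-based,
    // UNICAST2 and NAKACK are nak-based.
    static final Map<String, Style> STYLES = new LinkedHashMap<>();
    static {
        STYLES.put("UNICAST", Style.ACK_BASED);
        STYLES.put("UNICAST2", Style.NAK_BASED);
        STYLES.put("NAKACK", Style.NAK_BASED);
    }

    /** True if RSVP should keep resending its request over the given protocol. */
    static boolean rsvpShouldResend(String protocolName) {
        Style style = STYLES.get(protocolName);
        if (style == null)
            throw new IllegalArgumentException("unknown protocol: " + protocolName);
        // Ack-based protocols already retransmit on the sender's side,
        // so an RSVP resend would be spurious there.
        return style == Style.NAK_BASED;
    }

    public static void main(String[] args) {
        for (String name : STYLES.keySet())
            System.out.println(name + " -> resend RSVP request: " + rsvpShouldResend(name));
    }
}
```

In a real stack the answer would come from the protocol itself (in reply to the event) rather than from a name table; the table just keeps the sketch self-contained.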

                
> Synchronous messages
> --------------------
>
>                 Key: JGRP-1389
>                 URL: https://issues.jboss.org/browse/JGRP-1389
>             Project: JGroups
>          Issue Type: Feature Request
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.1
>
>
> FLUSH ensures that every member has received all messages from *all* other members. This is quite costly, especially if we have a large cluster.
> However, sometimes, it is only necessary to ensure that a given message M sent by P has been received by everyone, for example, if P is the coordinator and installs view V, then it would be sufficient to ensure that everyone received V.
> So synchronous messages are messages that block the sender until all of the (non-faulty) recipients have ack'ed their reception, or a timeout occurs.
> Note that if P sends messages 5, 6, 7 and 8 and tags only 8 as synchronous, then when the send of 8 returns, JGroups guarantees that everyone has also received all messages from P lower than 8.
> A user should be able to configure whether the message send is complete when all recipients have *received* the message, or when they have *delivered* it.
> Reception of a message means that the message was added to the recipient's buffer, delivery means that the message was consumed by the application.
> A message M4 sent by P is sometimes delivered late: if M4 is the last message sent by P, members Q and R don't receive M4 (e.g. because it was dropped), and P doesn't send another message M5, then Q and R have no means of detecting that M4 was sent by P and thus cannot ask P for retransmission of M4.
> This is of course solved by STABLE which periodically broadcasts the highest seqnos sent, and then Q and R can ask P for retransmission of M4.
> However, if we don't want to wait that long, and don't want to risk Q and R leaving before they've received M4, we can implement a *sender-flush* of M4 sent by P.
> This works as follows:
> - P can add an RSVP flag to M4
> - P adds M4 to a retransmit table and keeps retransmitting M4 until it has received ACKs from every non-faulty member in the target set, or P leaves
> - When a message is received that is tagged with RSVP, an ACK is sent back to the sender
> - When P has received ACKs from everyone, the message send returns. The caller is blocked until this is the case (maybe bounded by a timeout?)
> The reason why this works is that every receiver gets the latest message M4 from P, and adds it to its retransmit table. If there is a gap, it will ask P for retransmission of missing messages. This way, a receiver won't have to wait until STABLE sends a digest to find out it's missing messages from P.
> There could be an option for a receiver R to delay sending an ACK back to P until it has actually *received* P's missing messages. Otherwise, P could leave or crash *before* R got all of P's missing messages.
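
The sender-side steps in the description (block the caller, keep resending, return once every non-faulty member has ACKed or a timeout expires) could be sketched roughly as follows. This is only an illustration under assumptions: RsvpEntry, awaitAcks and memberLeft are hypothetical names, members are plain strings, and the actual sending is left to a caller-supplied Runnable.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the sender's bookkeeping for one RSVP-tagged message.
class RsvpEntry {
    private final Set<String> pending = new HashSet<>(); // members that have not ACKed yet

    RsvpEntry(Collection<String> targets) {
        pending.addAll(targets);
    }

    /** Called when an ACK from a member arrives. */
    synchronized void ack(String member) {
        if (pending.remove(member) && pending.isEmpty())
            notifyAll();
    }

    /** A member that left or crashed is no longer waited for. */
    synchronized void memberLeft(String member) {
        ack(member);
    }

    /**
     * Blocks until all ACKs are in (returns true) or the timeout expires
     * (returns false). While waiting, runs 'resend' roughly every
     * resendIntervalMs; a spurious wake-up may cause an extra resend.
     */
    synchronized boolean awaitAcks(long timeoutMs, long resendIntervalMs, Runnable resend) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!pending.isEmpty()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0)
                return false;
            try {
                wait(Math.max(1, Math.min(remainingMs, resendIntervalMs)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
            if (!pending.isEmpty())
                resend.run();
        }
        return true;
    }

    public static void main(String[] args) {
        RsvpEntry entry = new RsvpEntry(Arrays.asList("Q", "R"));
        entry.ack("Q"); // Q's ACK arrives right away
        // R's ACK is simulated as arriving in response to the first resend.
        boolean done = entry.awaitAcks(1000, 20, () -> entry.ack("R"));
        System.out.println("all ACKs received: " + done); // prints: all ACKs received: true
    }
}
```

Whether a receiver ACKs on reception, on delivery, or only after it has filled its gaps is a receiver-side choice and is not modelled here; the sender simply waits for whichever ACK arrives.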
