[jboss-jira] [JBoss JIRA] (JGRP-1875) UNICAST3/UNICAST2: sync receiver table with sender table

Bela Ban (JIRA) issues at jboss.org
Tue Sep 9 07:57:00 EDT 2014


    [ https://issues.jboss.org/browse/JGRP-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999397#comment-12999397 ] 

Bela Ban edited comment on JGRP-1875 at 9/9/14 7:56 AM:
--------------------------------------------------------

Hmm, there's a problem when a new connection is established:
* A (conn-id=1) sends A1(first)-A10 to B
* B receives A:3 first. Since this isn't tagged as {{first}}, B sends a SYNC to A
* A hasn't seen that SYNC yet, so it changes its {{conn-id}} to 3
* B now receives A:1(first) and creates the connection with {{conn-id}} == 2
* B also receives the other messages and adds them to the connection's recv-window
* Now B receives the {{SYNC-OK}} message with {{conn-id}} == 3
** B now removes the recv-window for A and creates a new one with {{conn-id}} == 3

This race between the first message and SYNC happens because we cannot guarantee that the first message from A is always received _first_ by B.
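
The sequence above can be replayed with a toy model of receiver B. This is only a sketch and not JGroups code: the class {{FirstVsSyncRace}}, its fields and its methods are hypothetical names, the conn-id values passed in are illustrative, and it assumes that a non-{{first}} message arriving without a recv-window is dropped after triggering the SYNC.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of receiver B (hypothetical, not JGroups code). */
class FirstVsSyncRace {
    /** Hypothetical recv-window: a conn-id plus the seqnos buffered under it. */
    static final class RecvWindow {
        final int connId;
        final List<Integer> seqnos = new ArrayList<>();
        RecvWindow(int connId) { this.connId = connId; }
    }

    RecvWindow window; // B's recv-window for sender A, null until created
    int syncsSent;     // number of SYNC requests B has sent to A
    int discarded;     // messages lost because a recv-window was replaced

    /** A data message from A arrives at B. */
    void onData(int connId, int seqno, boolean first) {
        if (window == null) {
            if (!first) {   // no window and not tagged 'first': ask A to sync
                syncsSent++;
                return;     // assumption: the message itself is dropped
            }
            window = new RecvWindow(connId); // 'first' creates the window
        }
        if (window.connId == connId)
            window.seqnos.add(seqno);
    }

    /** A SYNC-OK from A arrives, carrying A's new conn-id. */
    void onSyncOk(int newConnId) {
        if (window != null && window.connId != newConnId)
            discarded += window.seqnos.size(); // buffered messages are thrown away
        window = new RecvWindow(newConnId);
    }
}
```

Feeding it A:3, then A:1(first), then the remaining messages, then a SYNC-OK with a different conn-id leaves B with an empty recv-window: everything buffered under the old conn-id is discarded, which is exactly the race.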

I'm starting to dislike this whole SYNC business: we're fixing an edge case that happens 0.001% of the time, but introducing complexity that's not warranted!

So what are the alternatives?

h5. Do nothing (status quo) / first message creates the recv-window
* With {{conn-close-timeout}} set to a reasonably large value, JGRP-1873 can almost never happen
* The first message creates the recv-window
** If we get some messages before the first, they're dropped on the ground, but as soon as the first message is received (or retransmitted), the recv-window is created. Not too bad, as the first message is usually a JOIN or JOIN-RSP, and only then does message traffic start
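
A minimal sketch of this status-quo behaviour, assuming a receiver that tracks a single sender. {{FirstCreatesWindow}} and its methods are hypothetical names for illustration, not the UNICAST3 implementation:

```java
import java.util.TreeSet;

/** Toy receiver: only a message tagged 'first' may create the recv-window. */
class FirstCreatesWindow {
    private TreeSet<Integer> window; // null until the 'first' message arrives
    int dropped;                     // messages dropped for lack of a window

    /** Returns true if the message was added to the recv-window. */
    boolean receive(int seqno, boolean first) {
        if (window == null) {
            if (!first) {            // arrived before 'first': drop it
                dropped++;
                return false;
            }
            window = new TreeSet<>(); // 'first' creates the window
        }
        return window.add(seqno);    // false for a duplicate seqno
    }

    boolean hasWindow() { return window != null; }
}
```

Messages that overtake the first one are dropped, and once the first message is received (or retransmitted) they are accepted when the sender retransmits them.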

h5. Use SYNC exclusively (replacing the first-message mechanism above)
* This means that the first N messages would always get dropped until a SYNC and subsequent SYNC-OK have been received
** This would be completely new logic, replacing a proven algorithm

h5. Provide connection establishment à la TCP (SYN / SYN-ACK / ACK)
* Complex state model (including timeouts and dreaded states such as TIME-WAIT)
* Redundant if run over TCP as transport
* Blocking semantics: threads need to block until connection has been established




> UNICAST3/UNICAST2: sync receiver table with sender table
> --------------------------------------------------------
>
>                 Key: JGRP-1875
>                 URL: https://issues.jboss.org/browse/JGRP-1875
>             Project: JGroups
>          Issue Type: Enhancement
>            Reporter: Bela Ban
>            Assignee: Bela Ban
>             Fix For: 3.5.1, 3.6
>
>
> If a receiver B closes its recv-table and the sender A doesn't, then (when receiving msgs from the sender) the receiver engages in a protocol using {{GET-FIRST-SEQNO}} to sync itself with the sender. This has several problems, detailed in JGRP-1873 and JGRP-1874. Note that the other way round (the sender closing its send-table), there is no issue, as the sender will create a new connection with a new {{conn-id}}.
> To prevent {{STABLE}} messages interfering with {{GET-FIRST-SEQNO}} messages (JGRP-1873), we could run an additional {{SYNC}} protocol round, e.g.
> * B needs to get the lowest and highest seqno sent from A
> * B sends a {{SYNC}} message to A (instead of a {{GET-FIRST-SEQNO}} message)
> * On reception of {{SYNC}}, A sets a flag that discards all {{STABLE}} or {{ACK}} messages
> * A replies with a {{SYNC-OK}} containing the _lowest_ and _highest_ sent seqnos
> * B creates an entry for A with the lowest and highest seqnos
> * B sends a {{SYNC-ACK}} to A
> * A resets the flag and starts accepting {{STABLE}} / {{ACK}} messages from B again
> * A and B now use the usual protocols to retransmit missing messages
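
The proposed round can be sketched as three message handlers. This is an illustration only: {{SyncRound}}, {{Sender}}, {{Receiver}} and their members are hypothetical names, and it assumes that the discard flag lives on A and that A is the one sending {{SYNC-OK}} (A is the side that resets the flag and knows the sent seqnos).

```java
/** Toy model of the SYNC / SYNC-OK / SYNC-ACK round (hypothetical names). */
class SyncRound {
    /** Sender A: knows the lowest and highest seqnos it sent to B. */
    static final class Sender {
        final long lowest, highest;
        boolean discardStableAndAck; // set while a sync round is in progress

        Sender(long lowest, long highest) {
            this.lowest = lowest;
            this.highest = highest;
        }

        /** SYNC received from B: set the flag, return the SYNC-OK payload. */
        long[] onSync() {
            discardStableAndAck = true;
            return new long[] {lowest, highest};
        }

        /** SYNC-ACK received from B: accept STABLE / ACK messages again. */
        void onSyncAck() { discardStableAndAck = false; }

        boolean acceptsStableAndAck() { return !discardStableAndAck; }
    }

    /** Receiver B: creates its entry for A from the SYNC-OK payload. */
    static final class Receiver {
        long low = -1, high = -1;
        boolean hasEntry;

        void onSyncOk(long[] range) {
            low = range[0];
            high = range[1];
            hasEntry = true;
        }
    }

    /** Runs one full round with in-order, loss-free delivery. */
    static void run(Sender a, Receiver b) {
        long[] syncOk = a.onSync(); // B -> A: SYNC, A -> B: SYNC-OK
        b.onSyncOk(syncOk);         // B creates the entry for A
        a.onSyncAck();              // B -> A: SYNC-ACK
    }
}
```

The sketch deliberately ignores lost or reordered messages, which is where the real complexity of such a round would sit.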


