[jboss-jira] [JBoss JIRA] (JGRP-1875) UNICAST3/UNICAST2: sync receiver table with sender table
Bela Ban (JIRA)
issues at jboss.org
Fri Sep 5 09:52:00 EDT 2014
[ https://issues.jboss.org/browse/JGRP-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999397#comment-12999397 ]
Bela Ban commented on JGRP-1875:
--------------------------------
Hmm, there's a problem when a new connection is established:
* A (conn-id=1) sends A:1(first) to A:10 to B
* B receives A:3 first. Since this isn't tagged as {{first}}, B sends a SYNC to A
* A hasn't seen that SYNC yet, so it changes its {{conn-id}} to 3
* B now receives A:1(first) and creates the connection with {{conn-id}} == 2
* B also receives the other messages and adds them to the connection's recv-window
* Now B receives the {{SYNC-OK}} message with {{conn-id}} == 3
** B now removes the recv-window for A and creates a new one with {{conn-id}} == 3
This race between the first message and SYNC happens because we cannot guarantee that the first message from A is always received _first_ by B.
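To make the race concrete, here is a minimal sketch of B's receive side that replays the sequence of events listed above. It is a simplified model under stated assumptions, not the UNICAST3 code. The class names `RecvWindow` and `SyncRaceDemo` are hypothetical. The model assumes that A:3, which arrives before any recv-window exists, is not buffered. It uses the conn-ids from the scenario: B creates the window with 2, and SYNC-OK carries 3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of B's receive side for one sender A.
// Not the JGroups UNICAST3 code: B keeps a single recv-window for A, tagged with a conn-id.
class RecvWindow {
    final int connId;
    final List<Long> seqnos = new ArrayList<>(); // messages buffered in this window

    RecvWindow(int connId) { this.connId = connId; }
}

class SyncRaceDemo {
    RecvWindow window; // B's recv-window for A (null = none yet)
    int syncsSent;

    // B receives A:seqno; 'first' marks the first message of a connection.
    void onMessage(long seqno, boolean first, int connId) {
        if (window == null && !first) {
            syncsSent++;   // not tagged as first: B sends a SYNC to A
            return;        // assumption: the message is not buffered
        }
        if (first && (window == null || window.connId != connId))
            window = new RecvWindow(connId);
        window.seqnos.add(seqno);
    }

    // B receives SYNC-OK carrying A's conn-id.
    void onSyncOk(int connId) {
        if (window == null || window.connId != connId)
            window = new RecvWindow(connId); // discards everything buffered so far
    }

    // Replays the sequence of events from the comment.
    static SyncRaceDemo replay() {
        SyncRaceDemo b = new SyncRaceDemo();
        b.onMessage(3, false, 2);              // A:3 arrives first
        b.onMessage(1, true, 2);               // A:1(first) creates the window, conn-id 2
        for (long s = 2; s <= 10; s++)
            if (s != 3) b.onMessage(s, false, 2);
        b.onSyncOk(3);                         // SYNC-OK with conn-id 3
        return b;
    }
}
```

Just before the SYNC-OK, the window holds A:1, A:2 and A:4 to A:10. Replacing the window throws all of them away, so they would have to be retransmitted under the new conn-id.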
I'm starting to dislike this whole SYNC business: we're fixing an edge case that happens 0.001% of the time, but introducing complexity that isn't warranted!
So what are the alternatives?
h5. Do nothing (status quo) / first message creates the recv-window
* With {{conn-close-timeout}} set to a reasonably large value, JGRP-1873 can almost never happen
* The first message creates the recv-window
** If we get some messages before the first one, they're dropped on the floor, but as soon as the first message is received (or retransmitted), the recv-window is created. That's not too bad, as the first message is usually a JOIN or JOIN-RSP, and only then does message traffic start
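The status-quo behaviour can be sketched as a receiver that ignores everything from A until the message tagged as first arrives. Later messages are then delivered in seqno order. This is a hypothetical illustration: `FirstMessageReceiver` is an invented name, and the gap handling is reduced to a sorted map. It shows how the dropped messages are recovered once the sender retransmits them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the status quo: B drops messages from A until the message
// tagged 'first' arrives, which creates the recv-window. After that, messages are
// delivered in seqno order and gaps wait for retransmission.
class FirstMessageReceiver {
    private TreeMap<Long, String> window;       // null until 'first' has been seen
    private long nextToDeliver;
    final List<String> delivered = new ArrayList<>();
    int dropped;

    void receive(long seqno, boolean first, String payload) {
        if (window == null) {
            if (!first) { dropped++; return; }  // no recv-window yet: drop it
            window = new TreeMap<>();
            nextToDeliver = seqno;
        }
        if (seqno < nextToDeliver) return;      // duplicate of an already delivered message
        window.put(seqno, payload);
        // Deliver the contiguous run starting at nextToDeliver.
        while (window.containsKey(nextToDeliver))
            delivered.add(window.remove(nextToDeliver++));
    }
}
```

Suppose A:2 and A:3 arrive before A:1(first). Both are dropped. Once A retransmits them after the window exists, B delivers JOIN, A:2 and A:3 in order.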
h5. Use SYNC exclusively (replacing the first-message mechanism above)
* This means that the first N messages would always get dropped until a SYNC and subsequent SYNC-OK have been received
** This would be completely new logic, replacing a proven algorithm
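A rough sketch of the SYNC-only alternative shows where the drops come from. B has no recv-window until a SYNC / SYNC-OK round completes, so everything A sends in the meantime is discarded. The class name, its states and the idea that SYNC-OK carries A's lowest seqno are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "SYNC only" alternative: B has no recv-window for A
// until SYNC / SYNC-OK completes; every message arriving before that is dropped
// and has to be retransmitted by A later.
class SyncOnlyReceiver {
    enum State { NO_CONN, SYNC_SENT, READY }

    State state = State.NO_CONN;
    long nextExpected;                          // set from SYNC-OK (assumed field)
    final List<Long> accepted = new ArrayList<>();
    int syncsSent, dropped;

    void onMessage(long seqno) {
        if (state != State.READY) {
            if (state == State.NO_CONN) { state = State.SYNC_SENT; syncsSent++; } // send SYNC once
            dropped++;                          // dropped until SYNC-OK arrives
            return;
        }
        accepted.add(seqno);                    // real code would buffer and order these
    }

    // SYNC-OK: assumed to carry the lowest seqno A still holds.
    void onSyncOk(long lowestSeqno) {
        nextExpected = lowestSeqno;
        state = State.READY;
    }
}
```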
h5. Provide connection establishment à la TCP (SYN / SYN-ACK / ACK)
* Complex state model (including timeouts and dreaded states such as TIME-WAIT)
* Redundant if run over TCP as transport
* Blocking semantics: threads need to block until the connection has been established
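To illustrate the blocking-semantics point, here is the sender side of a TCP-style handshake reduced to a tiny state machine. A `CountDownLatch` gates `send()` until the SYN-ACK has arrived. The class and method names are hypothetical. SYN retransmission, timeouts on the receiver side and TIME-WAIT are deliberately left out, and those would account for most of the real complexity.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of TCP-style connection setup on the sender side.
// send() blocks until the SYN / SYN-ACK / ACK exchange has completed.
class HandshakeSender {
    enum State { CLOSED, SYN_SENT, ESTABLISHED }

    private volatile State state = State.CLOSED;
    private final CountDownLatch established = new CountDownLatch(1);

    synchronized String connect() {             // returns the control message to send
        if (state != State.CLOSED) throw new IllegalStateException(state.name());
        state = State.SYN_SENT;
        return "SYN";
    }

    synchronized String onSynAck() {             // the peer answered our SYN
        if (state != State.SYN_SENT) throw new IllegalStateException(state.name());
        state = State.ESTABLISHED;
        established.countDown();                 // wake up threads blocked in send()
        return "ACK";
    }

    // Blocks the caller until the connection is established or the timeout expires.
    boolean send(String msg, long timeoutMs) {
        try {
            if (!established.await(timeoutMs, TimeUnit.MILLISECONDS))
                return false;                    // handshake not finished in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        // a real implementation would transmit msg here
        return true;
    }

    State getState() { return state; }
}
```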
> UNICAST3/UNICAST2: sync receiver table with sender table
> --------------------------------------------------------
>
> Key: JGRP-1875
> URL: https://issues.jboss.org/browse/JGRP-1875
> Project: JGroups
> Issue Type: Enhancement
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 3.5.1, 3.6
>
>
> If a receiver B closes its recv-table and the sender A doesn't, then, when it receives messages from the sender, the receiver engages in a protocol using {{GET-FIRST-SEQNO}} to sync itself with the sender. This has several problems, detailed in JGRP-1873 and JGRP-1874. (Note that the other way round, when the sender closes its send-table, there is no issue, as the sender will create a new connection with a new {{conn-id}}.)
> To prevent {{STABLE}} messages interfering with {{GET-FIRST-SEQNO}} messages (JGRP-1873), we could run an additional {{SYNC}} protocol round, e.g.
> * B needs to get the lowest and highest seqnos sent by A
> * B sends a {{SYNC}} message to A (instead of a {{GET-FIRST-SEQNO}} message)
> * On reception of {{SYNC}}, A sets a flag that discards all {{STABLE}} or {{ACK}} messages from B
> * A replies with a {{SYNC-OK}} containing the _lowest_ and _highest_ sent seqnos
> * B creates an entry for A with the lowest and highest seqnos
> * B sends a {{SYNC-ACK}} to A
> * A resets the flag and starts accepting {{STABLE}} / {{ACK}} messages from B again
> * A and B now use the usual protocols to retransmit missing messages
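The SYNC round in the issue description can be sketched with message passing modelled as direct method calls. The class names and fields are hypothetical. The model also assumes that a {{STABLE}} for seqno N lets A purge everything up to N, which is what makes an interleaved {{STABLE}} harmful during the round. The sketch shows that A ignores {{STABLE}}/{{ACK}} between {{SYNC}} and {{SYNC-ACK}}.

```java
// Hypothetical sketch of the proposed SYNC round; not JGroups code.
class SyncSender {                              // A
    long lowestSent, highestSent;               // seqno range A still holds in its send-table
    boolean ignoreStableAndAck;                 // set while a SYNC round is in progress

    SyncSender(long lowest, long highest) { lowestSent = lowest; highestSent = highest; }

    long[] onSync() {                           // B -> A: SYNC
        ignoreStableAndAck = true;              // discard STABLE/ACK from B from now on
        return new long[] {lowestSent, highestSent}; // A -> B: SYNC-OK(lowest, highest)
    }

    void onStable(long seqno) {                 // B -> A: STABLE (or ACK)
        if (ignoreStableAndAck) return;         // would otherwise purge msgs B may still need
        lowestSent = Math.max(lowestSent, seqno + 1); // assumed purge semantics
    }

    void onSyncAck() { ignoreStableAndAck = false; }  // B -> A: SYNC-ACK
}

class SyncReceiver {                            // B
    long low = -1, high = -1;                   // B's entry for A; -1 = no entry yet

    void sync(SyncSender a) {
        long[] syncOk = a.onSync();             // SYNC, answered by SYNC-OK
        low = syncOk[0];                        // create the entry for A
        high = syncOk[1];
        a.onSyncAck();                          // SYNC-ACK lets A accept STABLE/ACK again
    }
}
```

After the round, A and B fall back to the usual retransmission protocols, starting from the range B learned from {{SYNC-OK}}.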
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)