[
https://issues.jboss.org/browse/JGRP-1875?page=com.atlassian.jira.plugin....
]
Bela Ban commented on JGRP-1875:
--------------------------------
Hmm, there's a problem when a new connection is established:
* A (conn-id=1) sends A1(first)-A10 to B
* B receives A:3 first. Since this isn't tagged as {{first}}, B sends a SYNC to A
* A hasn't seen that SYNC yet, so it changes its {{conn-id}} to 3
* B now receives A:1(first) and creates the connection with {{conn-id}} == 2
* B also receives the other messages and adds them to the connection's recv-window
* Now B receives the {{SYNC-OK}} message with {{conn-id}} == 3
** B now removes the recv-window for A and creates a new one with {{conn-id}} == 3
This race between the first message and SYNC happens because we cannot guarantee that the
first message from A is always received _first_ by B.
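The race can be replayed with a small Java sketch. All names here ({{RaceSketch}}, {{RecvWindow}}, {{onData}}, {{onSyncOk}}) are hypothetical and not the UNICAST3 API, and conn-ids are passed in as parameters so the sketch does not depend on particular values. It only illustrates that a recv-window created by the late first message is thrown away, together with whatever it buffered, once a SYNC-OK with a different conn-id arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver-side model (B) for a single sender (A); not JGroups code.
final class RaceSketch {
    static final class RecvWindow {
        final int connId;
        final List<Long> seqnos = new ArrayList<>();
        RecvWindow(int connId) { this.connId = connId; }
    }

    RecvWindow window;   // null until a connection exists
    boolean syncSent;    // true once B has asked A to sync
    int discarded;       // messages lost when a window is replaced

    // Data message from the sender.
    void onData(long seqno, int connId, boolean first) {
        if (window == null) {
            if (!first) {            // no window and not the first message:
                syncSent = true;     // ask the sender to sync, drop the message
                return;
            }
            window = new RecvWindow(connId);
        }
        window.seqnos.add(seqno);
    }

    // SYNC-OK from the sender, carrying the sender's (possibly new) conn-id.
    void onSyncOk(int connId) {
        if (window != null && window.connId == connId)
            return;                               // same connection, nothing to do
        if (window != null)
            discarded += window.seqnos.size();    // buffered messages are lost
        window = new RecvWindow(connId);
    }
}
```

Feeding it a non-first message, then the first message, then a SYNC-OK with a different conn-id shows the window being created twice.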
I'm starting to dislike this whole SYNC business: we're fixing an edge case that
happens 0.001% of the time, but introducing complexity that's not warranted!
So what are the alternatives?
h5. Do nothing (status quo) / first message creates the recv-window
* With {{conn-close-timeout}} set to a reasonably large value, JGRP-1873 can almost never
happen
* The first message creates the recv-window
** If we get some messages before the first, they're dropped on the ground, but as
soon as the first message is received (or retransmitted), the recv-window is created. Not
too bad, as the first message is usually a JOIN or JOIN-RSP, and only then does message
traffic start
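A minimal Java sketch of the status-quo rule, assuming a hypothetical {{FirstMessageSketch}} class that tracks one set of accepted seqnos per sender (this is not the actual UNICAST3 table). Messages arriving before the first one are dropped, and a later retransmission is accepted once the recv-window exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical model of "the first message creates the recv-window".
final class FirstMessageSketch {
    // sender name -> seqnos accepted into that sender's recv-window
    private final Map<String, TreeSet<Long>> recvWindows = new HashMap<>();
    private int dropped;

    /** Returns true if the message was accepted, false if it was dropped. */
    boolean receive(String sender, long seqno, boolean first) {
        TreeSet<Long> win = recvWindows.get(sender);
        if (win == null) {
            if (!first) {        // nothing to add it to yet: drop it and
                dropped++;       // rely on retransmission later
                return false;
            }
            win = new TreeSet<>();
            recvWindows.put(sender, win);
        }
        win.add(seqno);
        return true;
    }

    int dropped() { return dropped; }

    boolean hasWindow(String sender) { return recvWindows.containsKey(sender); }
}
```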
h5. Use SYNC exclusively (replacing the first-message logic above)
* This means that the first N messages would always get dropped until a SYNC and
subsequent SYNC-OK have been received
** This would be completely new logic, replacing a proven algorithm
h5. Provide connection establishment a la TCP (SYN / SYN-ACK / ACK)
* Complex state model (including timeouts and dreaded states such as TIME-WAIT)
* Redundant if run over TCP as transport
* Blocking semantics: threads need to block until connection has been established
UNICAST3/UNICAST2: sync receiver table with sender table
--------------------------------------------------------
Key: JGRP-1875
URL: https://issues.jboss.org/browse/JGRP-1875
Project: JGroups
Issue Type: Enhancement
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 3.5.1, 3.6
If a receiver B closes its recv-table and the sender A doesn't, then (when receiving
msgs from the sender) the receiver engages in a protocol using {{GET-FIRST-SEQNO}} to sync
itself with the sender. This has several problems, detailed in JGRP-1873 and JGRP-1874.
(Note that the other way round, with the sender closing its send-table, there is no issue, as the
sender will create a new connection with a new {{conn-id}}.)
To prevent {{STABLE}} messages interfering with {{GET-FIRST-SEQNO}} messages (JGRP-1873),
we could run an additional {{SYNC}} protocol round, e.g.
* B needs to get the lowest and highest seqno sent from A
* B sends a {{SYNC}} message to A (instead of a {{GET-FIRST-SEQNO}} message)
* A sets a flag that discards all {{STABLE}} or {{ACK}} messages on reception of
{{SYNC}}
* A replies with a {{SYNC-OK}} containing the _lowest_ and _highest_ sent seqnos
* B creates an entry for A with the lowest and highest seqnos
* B sends a {{SYNC-ACK}} to A
* A resets the flag and starts accepting {{STABLE}} / {{ACK}} messages from B again
* A and B now use the usual protocols to retransmit missing messages
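The sender side of this round could look roughly like the Java sketch below. It assumes the discard flag lives on A, as the reset step indicates, and that an accepted {{STABLE}} / {{ACK}} purges everything up to the acknowledged seqno. The class {{SyncSenderSketch}} and its methods are hypothetical and only illustrate why the flag keeps the reported range stable between {{SYNC}} and {{SYNC-ACK}}.

```java
// Hypothetical sender-side (A) state for the proposed SYNC round; not JGroups code.
final class SyncSenderSketch {
    private long lowestSent = 1;          // lowest seqno still held for retransmission
    private long highestSent = 0;         // highest seqno sent so far
    private boolean discardStableAndAck;  // set between SYNC and SYNC-ACK

    void send() { highestSent++; }

    /** SYNC received: freeze the range and return {lowest, highest} for the SYNC-OK. */
    long[] onSync() {
        discardStableAndAck = true;
        return new long[] { lowestSent, highestSent };
    }

    /** SYNC-ACK received: start accepting STABLE / ACK messages again. */
    void onSyncAck() { discardStableAndAck = false; }

    /** Returns false if the STABLE / ACK was discarded because a sync is in progress. */
    boolean onStableOrAck(long ackedSeqno) {
        if (discardStableAndAck)
            return false;
        // Assumption: acknowledged messages are purged, so the lowest seqno moves up.
        lowestSent = Math.max(lowestSent, ackedSeqno + 1);
        return true;
    }

    long lowest() { return lowestSent; }
}
```

While the flag is set, a stray {{ACK}} cannot move the lowest seqno past the value already reported in the {{SYNC-OK}}.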