Vladimir Blagojevic resolved JGRP-1164.
---------------------------------------
Fix Version/s: 2.8.1
Resolution: Done
Resolved on the 2.8, 2.9 and HEAD branches. The fix will be included in the first maintenance
releases of 2.8 and 2.9, and in 2.10 Alpha3.
TCPGOSSIP doesn't maintain the Gossip Router state correctly
------------------------------------------------------------
Key: JGRP-1164
URL: https://jira.jboss.org/jira/browse/JGRP-1164
Project: JGroups
Issue Type: Bug
Affects Versions: 2.8, 2.9
Reporter: vivek v
Assignee: Vladimir Blagojevic
Fix For: 2.8.1, 2.10
This came out of a discussion with Vladimir on JGRP-1162
(https://jira.jboss.org/jira/browse/JGRP-1162). The problem is that, currently,
TCPGossip.connect(..) calls RouterStubs.connect(..). The connect in the router stub simply
opens the socket connection to the Gossip Router (GR) and then sends the "Connect"
command without waiting for a response from GR. Creating the socket creates the
Connection Handler on the Gossip Router. On a very lossy network we might
lose the "Connect" command, in which case GR will not add the node (the RouterStub node)
to its list. Since the socket connection is established (and, because of keep-alive, won't
disconnect even on a lossy network), the router stub assumes it is connected (it sets its
state to CONNECTED if the socket is good), but it will not find itself in the list when it
asks GR for the nodes in the group.
Theoretically, a TCP connection shouldn't lose packets, but in our experiments
it did. I ran the test with WANem set to 50% packet loss and was able to reproduce the
situation where the socket was created but no CONNECT arrived. This caused the Gossip Router
to publish a wrong node list.
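The failure mode described above can be modelled in a few lines. This is only a toy sketch, not the Gossip Router code: the class RouterModelSketch and its methods acceptSocket, onConnect, members and openSockets are hypothetical names made up for illustration. It assumes the router tracks open sockets separately from group membership and only adds a member when a CONNECT command is processed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the router-side bookkeeping (hypothetical, not JGroups code). */
class RouterModelSketch {
    // One entry per accepted socket (stands in for a connection handler).
    private final Set<String> handlers = new TreeSet<>();
    // Group membership, updated only when a CONNECT command is processed.
    private final Map<String, Set<String>> groups = new TreeMap<>();

    /** The TCP connection was accepted; no membership change yet. */
    public void acceptSocket(String member) {
        handlers.add(member);
    }

    /** A CONNECT command arrived; only now does the member join the group. */
    public void onConnect(String group, String member) {
        groups.computeIfAbsent(group, g -> new TreeSet<>()).add(member);
    }

    /** The node list the router would publish for a group. */
    public List<String> members(String group) {
        return new ArrayList<>(groups.getOrDefault(group, Collections.emptySet()));
    }

    public int openSockets() {
        return handlers.size();
    }

    public static void main(String[] args) {
        RouterModelSketch router = new RouterModelSketch();

        // Node A: socket accepted and CONNECT processed.
        router.acceptSocket("A");
        router.onConnect("g", "A");

        // Node B: socket accepted, but the CONNECT command never arrives.
        router.acceptSocket("B");

        System.out.println("open sockets: " + router.openSockets()); // open sockets: 2
        System.out.println("members of g: " + router.members("g"));  // members of g: [A]
    }
}
```

In this model node B holds a healthy socket and would consider itself connected, yet it is missing from the published node list.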
Two possible fixes:
1) Assume CONNECTED as soon as the socket connection is made, so we don't have to send a
separate CONNECT command.
- This is not a good design, as the state should be kept by the application
layer.
2) Wait for the response from the Gossip Router to CONNECT (and likewise to DISCONNECT)
before setting TCPGOSSIP to the CONNECTED state.
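Option 2 could look roughly like the sketch below. It is not the actual RouterStub code or the committed fix: the class AckingStubSketch, the byte codes CONNECT and CONNECT_OK, and the message layout are all assumptions made for illustration. Plain streams are used instead of a socket so the example runs on its own; with a real socket one would presumably set a read timeout (Socket.setSoTimeout) so that a missing reply surfaces as an exception rather than blocking forever.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical stub that waits for an acknowledgement (not the JGroups RouterStub). */
class AckingStubSketch {
    // Hypothetical wire codes, invented for this sketch.
    public static final byte CONNECT = 1;
    public static final byte CONNECT_OK = 2;

    public enum State { DISCONNECTED, CONNECTED }

    private State state = State.DISCONNECTED;

    public State getState() {
        return state;
    }

    /**
     * Sends CONNECT and moves to CONNECTED only after the router acknowledges it.
     * Returns true if the acknowledgement was received.
     */
    public boolean connect(InputStream fromRouter, OutputStream toRouter,
                           String group, String member) throws IOException {
        DataOutputStream out = new DataOutputStream(toRouter);
        out.writeByte(CONNECT);
        out.writeUTF(group);
        out.writeUTF(member);
        out.flush();

        DataInputStream in = new DataInputStream(fromRouter);
        try {
            // Blocks until the router replies. On a real socket with SO_TIMEOUT
            // set, a lost reply would raise SocketTimeoutException (an IOException).
            if (in.readByte() == CONNECT_OK) {
                state = State.CONNECTED;
                return true;
            }
        } catch (IOException noReply) {
            // End of stream or timeout: treat as "not connected".
        }
        state = State.DISCONNECTED;
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Router acknowledged the CONNECT.
        AckingStubSketch acked = new AckingStubSketch();
        acked.connect(new ByteArrayInputStream(new byte[] { CONNECT_OK }),
                      new ByteArrayOutputStream(), "g", "A");
        System.out.println("acked: " + acked.getState());       // acked: CONNECTED

        // No reply at all (simulated by an empty stream).
        AckingStubSketch silent = new AckingStubSketch();
        silent.connect(new ByteArrayInputStream(new byte[0]),
                       new ByteArrayOutputStream(), "g", "B");
        System.out.println("no reply: " + silent.getState());   // no reply: DISCONNECTED
    }
}
```

A caller that gets false back can close the socket and retry, so the stub's state never runs ahead of what the router has actually recorded. The same pattern would apply to DISCONNECT.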