[ http://jira.jboss.com/jira/browse/JGRP-244?page=comments#action_12341055 ]
Bela Ban commented on JGRP-244:
-------------------------------
Okay, I applied the patch anyway, because it might affect UDP (a small chance, though: with
UDP we don't allocate the same Addresses, although this can be done using bind_port).
One change, though: once we drop a message because it is from a previous member and its
seqno is not 1, we *must not* ack it! Otherwise we run into
http://jira.jboss.com/jira/browse/JGRP-126, where a discarded message is removed from the
sender's table and never retransmitted, so the receiver blocks forever!
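A minimal Java sketch of the rule above. It assumes a hypothetical `PreviousMemberGuard` class, string addresses and a first seqno of 1; these names are illustrative and are not the JGroups UNICAST API. A message from a previous member with a seqno above the first one is dropped without an ack, while a first-seqno message is treated as a new incarnation reusing the address and is accepted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "drop, but don't ack" rule; not JGroups code.
class PreviousMemberGuard {
    // Assumed initial seqno, matching "its seqno is not 1" in the comment.
    static final long DEFAULT_FIRST_SEQNO = 1;

    enum Action { DISCARD_NO_ACK, ACCEPT }

    private final Set<String> previousMembers = new HashSet<>();

    // Called when a member leaves the view.
    void memberLeft(String addr) {
        previousMembers.add(addr);
    }

    Action onData(String sender, long seqno) {
        if (previousMembers.contains(sender)) {
            if (seqno > DEFAULT_FIRST_SEQNO) {
                // Possibly stale: drop it and send NO ack, so the sender keeps
                // the message in its retransmission table (avoids JGRP-126).
                return Action.DISCARD_NO_ACK;
            }
            // First seqno: assume a new channel reused the departed address.
            previousMembers.remove(sender);
        }
        // The caller would now buffer/deliver the message and ack it as usual.
        return Action.ACCEPT;
    }
}
```

Not acking is what keeps the discarded message recoverable: if it really came from a new incarnation whose #1 was merely delayed, the sender retransmits it later instead of the receiver waiting forever for a message the sender has already forgotten.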
long connect time in application that frequently closes and opens new channel
-----------------------------------------------------------------------------
Key: JGRP-244
URL: http://jira.jboss.com/jira/browse/JGRP-244
Project: JGroups
Issue Type: Bug
Affects Versions: 2.2.8, 2.2.9, 2.3, 2.2.9.1, 2.2.9.2
Environment: SLES 9
Reporter: Bruce Schuchardt
Assigned To: Bela Ban
Fix For: 2.4
I have an application that has one long-lived JGroups member and four other processes
that often close their channel and create a new one. Most of the time these new
connections are formed in a dozen milliseconds or so, but sometimes they take
over 20 seconds. The apps are using TCPGOSSIP with multicast turned off.
I turned on tracing and saw that the coordinator's UNICAST was having some trouble.
It received a message from a departed member, buffered it, and passed it up later, when the
departed member's address was reused by a new channel:
a) A member left the view and UNICAST removed its connection for the member and added it
to previous_members.
b) Another message then arrived from the member, and UNICAST created a new connection
for it. The message had seqno 4, so it was put in the AckReceiverWindow and not passed up
(the new connection was still waiting for #1).
c) A few seconds later, a process created a new channel and it got the same ID as the
one the coordinator's UNICAST just dealt with.
d) The new channel sent three UNICAST messages to the coordinator. On the third message,
the coordinator's UNICAST removed #3 and the old #4 and passed them both up.
e) The new channel sent message #4, a JOIN_REQ, and UNICAST discarded it as a duplicate,
because a message #4 (the old one) had already been passed up.
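Steps b) to e) can be reproduced with a simplified receiver window. The class below, `ReceiverWindowSketch`, is hypothetical and is not the real AckReceiverWindow: it buffers messages by seqno, delivers them only once they are contiguous with the next expected seqno (assumed to start at 1), and drops anything below that as a duplicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical per-sender receiver window; models only "buffer out of order,
// deliver in order, drop duplicates", not the JGroups AckReceiverWindow API.
class ReceiverWindowSketch {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private long nextToDeliver = 1; // assumed initial seqno of a new connection

    // Returns the messages that become deliverable after receiving seqno.
    List<String> receive(long seqno, String payload) {
        List<String> delivered = new ArrayList<>();
        if (seqno < nextToDeliver || pending.containsKey(seqno)) {
            return delivered; // duplicate: silently dropped
        }
        pending.put(seqno, payload);
        // Drain every message that is now contiguous with what was delivered.
        while (pending.containsKey(nextToDeliver)) {
            delivered.add(pending.remove(nextToDeliver));
            nextToDeliver++;
        }
        return delivered;
    }

    public static void main(String[] args) {
        ReceiverWindowSketch w = new ReceiverWindowSketch();
        System.out.println(w.receive(4, "stale #4"));        // b) buffered: []
        System.out.println(w.receive(1, "new #1"));          // [new #1]
        System.out.println(w.receive(2, "new #2"));          // [new #2]
        System.out.println(w.receive(3, "new #3"));          // d) [new #3, stale #4]
        System.out.println(w.receive(4, "new #4 JOIN_REQ")); // e) dropped: []
    }
}
```

The same mechanism explains the concern about higher seqnos: a stale message with seqno N stays buffered until the new channel has sent N-1 messages, and is then delivered in place of the new channel's own message N.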
The new channel eventually goes through discovery again and gets into the group, but it
adds quite a bit to channel startup time, and I'm a little worried that there might be
a case where a much higher seqno gets trapped in the receiver window like this.
I fixed this for my needs by changing UNICAST.handleDataReceived to reject a
previous_member message if the seqno is higher than the default initial seqno, but I
suppose that might wreak havoc with some other algorithms.
    private void handleDataReceived(Object sender, long seqno, Message msg) {
        if(trace)
            log.trace(new StringBuffer().append(local_addr).append(" <-- DATA(").append(sender).append(": #").append(seqno));
        if(previous_members.contains(sender)) {
            // we don't want to see messages from departed members
            if(seqno > DEFAULT_FIRST_SEQNO) {
                if(trace)
                    log.trace("discarding message " + seqno + " from previous member " + sender);
                return;
            }
            if(trace)
                log.trace("removed " + sender + " from previous_members as we received a message from it");
            previous_members.removeElement(sender);
        }
        Entry entry;
        synchronized(connections) {
            // (remainder of the method not included in the report)
--
This message is automatically generated by JIRA.