[jboss-jira] [JBoss JIRA] Commented: (JGRP-244) long connect time in application that frequently closes and opens new channel
Bela Ban (JIRA)
jira-events at jboss.com
Fri Aug 11 04:02:12 EDT 2006
[ http://jira.jboss.com/jira/browse/JGRP-244?page=comments#action_12341055 ]
Bela Ban commented on JGRP-244:
-------------------------------
Okay, I applied the patch anyway because the problem might also affect UDP. The chance is small, though: with UDP we don't normally allocate the same Addresses again, although this can happen when bind_port is set.
One change though: when we drop a message because it is from a previous member and its seqno is not 1, we *must not* ack it! Otherwise we run into http://jira.jboss.com/jira/browse/JGRP-126, where a discarded message is removed from the sender's table and never retransmitted, so the receiver blocks forever!
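To illustrate why the discarded message must stay unacked, here is a minimal, self-contained sketch. It is not the UNICAST code: the class, the simulate() and receive() helpers, the member name "A" and the ackDiscarded flag are all hypothetical, and the first seqno is assumed to be 1. It replays a case where #2 from a reused address overtakes #1, then retransmits whatever the sender still holds as unacked.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

public class DropWithoutAckSketch {
    // Assumption: a new connection starts numbering its messages at 1.
    static final long DEFAULT_FIRST_SEQNO = 1;

    /** Hypothetical receiver logic: discard stale-looking messages from previous members. */
    static void receive(String sender, long seqno, Set<String> previousMembers,
                        TreeMap<Long, String> unacked, List<Long> accepted,
                        boolean ackDiscarded) {
        if (previousMembers.contains(sender)) {
            if (seqno > DEFAULT_FIRST_SEQNO) {
                if (ackDiscarded) {
                    unacked.remove(seqno); // the bug: the sender forgets a message nobody delivered
                }
                return; // discarded
            }
            previousMembers.remove(sender); // first message of a new connection
        }
        unacked.remove(seqno); // ack: the sender may stop retransmitting
        accepted.add(seqno);
    }

    /** Returns the seqnos the receiver ends up accepting. */
    static List<Long> simulate(boolean ackDiscarded) {
        TreeMap<Long, String> unacked = new TreeMap<>(); // sender-side retransmission table
        unacked.put(1L, "m1");
        unacked.put(2L, "m2");
        Set<String> previousMembers = new HashSet<>();
        previousMembers.add("A"); // "A" left earlier; its address is now reused
        List<Long> accepted = new ArrayList<>();

        // #2 overtakes #1 on the wire.
        for (long seqno : new long[] {2, 1}) {
            receive("A", seqno, previousMembers, unacked, accepted, ackDiscarded);
        }
        // One retransmission round for everything still unacked.
        for (Long seqno : new ArrayList<>(unacked.keySet())) {
            receive("A", seqno, previousMembers, unacked, accepted, ackDiscarded);
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("discard without ack: " + simulate(false));
        System.out.println("discard with ack:    " + simulate(true));
    }
}
```

Without the ack, #2 stays in the sender's table and gets through on retransmission. With the ack, #2 is gone from the table and the receiver never sees it, which is the kind of blocking described in JGRP-126.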
> long connect time in application that frequently closes and opens new channel
> -----------------------------------------------------------------------------
>
> Key: JGRP-244
> URL: http://jira.jboss.com/jira/browse/JGRP-244
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.2.8, 2.2.9, 2.3, 2.2.9.1, 2.2.9.2
> Environment: SLES 9
> Reporter: Bruce Schuchardt
> Assigned To: Bela Ban
> Fix For: 2.4
>
>
> I have an application that has one long-lived JGroups member and four other processes that often close their channel and create a new one. Most of the time these new connections are formed in a dozen milliseconds or so, but sometimes they take over 20 seconds. The apps are using TCPGOSSIP with multicast turned off.
> I turned on tracing and saw that the coordinator's UNICAST was having some trouble. It got a message from a departed member that it stored up and dispatched later when the departed member's address was reused by a new channel.
> a) A member left the view and UNICAST removed its connection for the member and added it to previous_members.
> b) Another message then arrived from the member, and UNICAST created a new connection for it. The message had seqno 4, and was put in the AckReceiverWindow and not passed up.
> c) A few seconds later, a process created a new channel and it got the same ID as the one the coordinator's UNICAST just dealt with.
> d) The new channel sent three UNICAST messages to the coordinator. On the third message, the coordinator's UNICAST removed #3 and the old #4 and passed them both up.
> e) The new channel sent message #4, a JOIN_REQ, and UNICAST discarded it.
> The new channel eventually goes through discovery again and gets into the group, but it adds quite a bit to channel startup time, and I'm a little worried that there might be a case where a much higher seqno gets trapped in the receiver window like this.
> I fixed this for my needs by changing UNICAST.handleDataReceived to reject a previous_member message if the seqno is higher than the default initial seqno, but I suppose that might wreak havoc with some other algorithms.
> private void handleDataReceived(Object sender, long seqno, Message msg) {
>     if(trace)
>         log.trace(new StringBuffer().append(local_addr).append(" <-- DATA(").append(sender).append(": #").append(seqno));
>     if(previous_members.contains(sender)) {
>         // we don't want to see messages from departed members
>         if (seqno > DEFAULT_FIRST_SEQNO) {
>             if (trace)
>                 log.trace("discarding message " + seqno + " from previous member " + sender);
>             return;
>         }
>         if(trace)
>             log.trace("removed " + sender + " from previous_members as we received a message from it");
>         previous_members.removeElement(sender);
>     }
>     Entry entry;
>     synchronized(connections) {
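
The sequence a) through e) in the quoted report can be replayed with a small stand-in for a receive window. This is a sketch under assumptions, not the JGroups AckReceiverWindow: ReceiveWindow, replay() and the message labels are hypothetical, and the window is assumed to deliver strictly in seqno order starting at 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class StaleSeqnoSketch {
    /** Simplified in-order receive window (hypothetical, for illustration only). */
    static class ReceiveWindow {
        private final TreeMap<Long, String> pending = new TreeMap<>();
        private long nextToDeliver = 1;

        /** Buffers the message unless its seqno was already delivered or is already buffered. */
        boolean add(long seqno, String msg) {
            if (seqno < nextToDeliver || pending.containsKey(seqno)) {
                return false;
            }
            pending.put(seqno, msg);
            return true;
        }

        /** Removes and returns every message that is now deliverable in order. */
        List<String> removeDeliverable() {
            List<String> out = new ArrayList<>();
            String msg;
            while ((msg = pending.remove(nextToDeliver)) != null) {
                out.add(msg);
                nextToDeliver++;
            }
            return out;
        }
    }

    /** Replays the report; the flag switches the previous-member check on or off. */
    static List<String> replay(boolean dropStaleFromPreviousMember) {
        ReceiveWindow win = new ReceiveWindow();
        List<String> delivered = new ArrayList<>();

        // b) A late #4 arrives from the departed member.
        if (!dropStaleFromPreviousMember) {
            win.add(4, "old-4"); // buffered, cannot be delivered yet
        }
        delivered.addAll(win.removeDeliverable());

        // c), d), e) A new channel reuses the address and sends #1..#4.
        String[] fresh = {"new-1", "new-2", "new-3", "JOIN_REQ"};
        for (int i = 0; i < fresh.length; i++) {
            win.add(i + 1, fresh[i]);
            delivered.addAll(win.removeDeliverable());
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("without check: " + replay(false));
        System.out.println("with check:    " + replay(true));
    }
}
```

Without the check, the buffered old #4 is passed up right after the new #3 and the new channel's JOIN_REQ is rejected as already seen. With the check, the stale message never enters the window and the JOIN_REQ is delivered.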