New subject: [JBoss JIRA] Assigned: (JGRP-346) Connection objects are removed from the ConnectionTable, but remain active on the system and eventually consume available system resources.

Tuesday, 24 October 2006

Connection objects are removed from the ConnectionTable, but remain active on the system
and eventually consume available system resources.
-------------------------------------------------------------------------------------------------------------------------------------------

                 Key: JGRP-346
                 URL: http://jira.jboss.com/jira/browse/JGRP-346
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.3 SP1
         Environment: SUSE Linux 9
            Reporter: Stuart Jensen
         Assigned To: Bela Ban

To duplicate the issue:
1) Create a four-member cluster using the following configuration (two-member clusters
exhibit the problem as well, just less dramatically):

TCP(start_port=7801):
TCPPING(initial_hosts=<ip addresses go here>;port_range=3;timeout=3500;num_initial_members=3;up_thread=true;down_thread=true):
MERGE2(min_interval=5000;max_interval=10000):
FD(shun=true;timeout=2500;max_tries=5;up_thread=true;down_thread=true):
VERIFY_SUSPECT(timeout=2000;down_thread=false;up_thread=false):
pbcast.NAKACK(down_thread=true;up_thread=true;gc_lag=100;retransmit_timeout=3000):
pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):
pbcast.GMS(join_timeout=5000;join_retry_timeout=3500;shun=true;print_local_addr=true;down_thread=true;up_thread=true)

2) Start up the cluster (I was running JGroups in a Tomcat servlet application). To
determine the number of threads on Linux, I executed the following commands:

ps -ef | grep tomcat
echo "" > catalina.out
kill -QUIT <pid from ps command above>
grep ".Sender \[" catalina.out | wc -l

The ps command gives you the process id of Tomcat.  The echo clears the content of the
catalina.out file.  The kill -QUIT causes the JVM to print a dump of all its threads into
catalina.out.  Finally, the grep finds and counts all of the
"ConnectionTable.Connection.Sender" threads that are currently active on the
system.

3) Pick one of the cluster member boxes and pull the network cable out of the box such
that all communication with the other three members is terminated.

4) After one or two minutes, plug the network cable back in.

5) Repeat the steps to determine the number of threads currently active on the system.

6) Repeat steps 3 through 5, each time watching the number of threads.  Each iteration
orphans more and more threads on the system.  The growth seems to be exponential: after
about 4 iterations we have around 300-400 Sender threads.  Receiver threads are orphaned
in similar numbers.
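
For repeated runs of step 5, the grep/wc count can also be scripted. The following is
only a minimal sketch of that count in Java, not part of the original report. The class
name SenderThreadCount and the default file name are assumptions for illustration. It
matches the literal substring ".Sender [", which is slightly stricter than the grep
pattern above, where "." matches any character.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: counts thread-dump lines that mention a Sender thread,
// mirroring `grep ".Sender \[" catalina.out | wc -l`.
public class SenderThreadCount {

    /** Returns how many lines contain the given literal substring. */
    static long countMatchingLines(List<String> lines, String needle) {
        long count = 0;
        for (String line : lines) {
            if (line.contains(needle)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the thread dump was written to catalina.out by kill -QUIT.
        String path = args.length > 0 ? args[0] : "catalina.out";
        // ISO-8859-1 decodes any byte sequence, so odd bytes in the log cannot fail the read.
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
        System.out.println(countMatchingLines(lines, ".Sender ["));
    }
}
```

Run it after each kill -QUIT, passing the path to catalina.out, and compare the numbers
between iterations.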

After investigating the issue, I came up with the following "fix" which cleared
the problem up.

In the file ConnectionTable.java there is a method called retainAll().  It appears that
this method is called by the TCP protocol when a view change occurs.  This method removes
Connections from the "Connection Pool" (member variable conns) but does not
destroy them.  We initially thought the reaper thread might clean them up, but since the
Connection objects are actually removed from the Connection Pool, the reaper never sees
them and does not help the situation.  As we watched our connections we noticed that the
Connections orphaned by this routine were the ones filling up the system's set of threads.
So, we added code to call destroy() on all of the Connection objects that retainAll()
removes from the Connection Pool.  The "diff" is provided below.  Note that we made our
change in the JGroups 2.3 SP1 file ConnectionTable.java.  Scott Marlow produced this diff
for me; it is the same change, applied to BasicConnectionTable from the 2.4 source set.

Index: BasicConnectionTable.java
===================================================================
RCS file: /cvsroot/javagroups/JGroups/src/org/jgroups/blocks/BasicConnectionTable.java,v
retrieving revision 1.8
diff -r1.8 BasicConnectionTable.java
22,26c22
< import java.util.Map;
< import java.util.Iterator;
< import java.util.HashMap;
< import java.util.Vector;
< import java.util.Collection;
---
> import java.util.*;
263c259,289
<        conns.keySet().retainAll(c);
---
> //       conns.keySet().retainAll(c);
>         ArrayList alConnsToDestroy = new ArrayList();
>         synchronized(conns)
>         {
>             HashMap copy=new HashMap(conns);
>             conns.keySet().retainAll(c);
>             Set ks = copy.keySet();
>             Iterator iter = ks.iterator();
>             while (iter.hasNext())
>             {
>                 Object oKey = iter.next();
>                 if (null == conns.get(oKey))
>                 {	// This connection NOT in the resultant connection set
>                     Connection conn = (Connection)copy.get(oKey);
>                     if (null != conn)
>                     {	// Destroy this connection
>                         alConnsToDestroy.add(conn);
>                     }
>                 }
>             }
>         }
>         // All of the connections that were not retained must be destroyed
>         // so that their resources are cleaned up.
>         for (int a=0; a<alConnsToDestroy.size(); a++)
>         {
>             Connection conn = (Connection)alConnsToDestroy.get(a);
>             if(log.isTraceEnabled())
>                 log.trace("Destroy this orphaned connection: " + conn);
>             conn.destroy();
>         }
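
For readers who want to try the idea outside of JGroups, here is a standalone sketch of
the same retain-then-destroy pattern. It is not the JGroups code: the classes
RetainAllSketch and Conn and the use of String keys are assumptions made for
illustration, and destroy() only sets a flag where a real connection would stop its
Sender and Receiver threads and close its socket. Like the diff, it collects the removed
connections while holding the lock on conns and destroys them after releasing it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names are hypothetical, not the JGroups API.
public class RetainAllSketch {

    /** Hypothetical stand-in for a connection that owns threads and a socket. */
    static class Conn {
        final String peer;
        boolean destroyed = false;

        Conn(String peer) { this.peer = peer; }

        // A real connection would stop its threads and close its socket here.
        void destroy() { destroyed = true; }
    }

    private final Map<String, Conn> conns = new HashMap<>();

    void add(Conn c) {
        synchronized (conns) { conns.put(c.peer, c); }
    }

    int size() {
        synchronized (conns) { return conns.size(); }
    }

    /**
     * Keeps only the connections whose key is in current, and destroys
     * every connection that was dropped. Returns the dropped connections.
     */
    List<Conn> retainAll(Collection<String> current) {
        List<Conn> removed = new ArrayList<>();
        synchronized (conns) {
            Iterator<Map.Entry<String, Conn>> it = conns.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Conn> e = it.next();
                if (!current.contains(e.getKey())) {
                    if (e.getValue() != null) {
                        removed.add(e.getValue());
                    }
                    it.remove(); // drop the entry from the pool
                }
            }
        }
        // Destroy outside the lock so slow cleanup does not block other callers.
        for (Conn c : removed) {
            c.destroy();
        }
        return removed;
    }
}
```

Removing entries through the iterator avoids the extra HashMap copy that the diff uses to
find out which connections were dropped.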

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
