JBoss [Zion] 4.0.3SP1 (build: CVSTag=JBoss_4_0_3_SP1 date=200510231054)
ClockDaemon is used by org.jboss.mq.Connection to periodically run Connection$PingTask.
ClockDaemon holds a Heap that contains references to PingTask instances. Normally, the
PingTask instances are removed from the ClockDaemon's Heap by ClockDaemon$RunLoop,
which contains an infinite loop that runs each PingTask.
Occasionally, after running my web application for 20-30 hours, I start to see a memory
leak that is caused by references to objects in the ClockDaemon's Heap.
My application has a scheduler that issues a JMS event every 30 seconds, and each one
instantiates a new Connection. The constructor for Connection calls startPingThread(),
which runs
| clockDaemon.executePeriodically(pingPeriod, new PingTask(), true);
This method call inserts a new ClockDaemon$TaskNode on the ClockDaemon's Heap.
When I put my debugger on the leaking application, I see that ClockDaemon$RunLoop is hung
on the following line:
| task.command.run();
The task is a TaskNode instance with a command that is a PingTask instance. As a result,
the loop is stopped, so none of the objects can be removed from the ClockDaemon's
Heap.
Drilling down into the PingTask.run() method, I see that it is hung at the following
line:
| pingTaskSemaphore.acquire();
My debugger shows that pingTaskSemaphore has zero permits, so the acquire() method will
block until another thread calls pingTaskSemaphore.release(). But apparently release() is
never called.
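
To illustrate the stall described above, here is a minimal sketch, not the JBoss or
util.concurrent code. It assumes java.util.concurrent classes as stand-ins: a
LinkedBlockingQueue plays the role of the ClockDaemon's Heap, a single thread plays the role
of ClockDaemon$RunLoop, and the class and method names (StalledLoopSketch,
queuedWhileBlocked) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

class StalledLoopSketch {

    /** Queues one blocking task plus extraTasks no-ops; returns the backlog seen during the stall. */
    static int queuedWhileBlocked(int extraTasks) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(); // stand-in for the Heap
        Semaphore zeroPermits = new Semaphore(0);                    // like the observed pingTaskSemaphore
        CountDownLatch stuckTaskRunning = new CountDownLatch(1);

        // Stand-in for ClockDaemon$RunLoop: one thread runs every queued task in turn.
        Thread runLoop = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run();
                }
            } catch (InterruptedException e) {
                // Interrupted while waiting for work: let the thread end.
            }
        });
        runLoop.setDaemon(true);

        // First task blocks on the semaphore. The uninterruptible variant is used only so
        // the lambda needs no checked-exception handling; the post's code calls acquire().
        queue.add(() -> {
            stuckTaskRunning.countDown();
            zeroPermits.acquireUninterruptibly();
        });
        runLoop.start();

        try {
            stuckTaskRunning.await();             // the loop thread is now inside the stuck task
            for (int i = 0; i < extraTasks; i++) {
                queue.add(() -> { });             // later tasks pile up behind it
            }
            int backlog = queue.size();           // nothing has been removed during the stall

            zeroPermits.release();                // clean up: unblock the task, stop the loop
            runLoop.interrupt();
            runLoop.join();
            return backlog;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sketch interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("backlog while stalled: " + queuedWhileBlocked(5));
    }
}
```

While the first task waits for a permit, every task queued after it stays in the queue,
which mirrors how TaskNode entries would accumulate on the Heap.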
It seems to me that this should never be allowed to happen, because a low-priority ping
task is effectively hijacking the ClockDaemon loop that removes tasks from the Heap. Once
the application reaches this state, the JVM ultimately fails with an OutOfMemoryError.
I don't immediately see why pingTaskSemaphore.release() is never called, except that
pingTaskSemaphore.acquire() is not in the same try-finally block as the release() call.
The PingTask.run() method is below:
| /**
|  * The ping task
|  */
| class PingTask implements Runnable
| {
|    /**
|     * Main processing method for the PingTask object
|     */
|    public void run()
|    {
|       try
|       {
|          pingTaskSemaphore.acquire();
|       }
|       catch (InterruptedException e)
|       {
|          log.debug("Interrupted requesting ping semaphore");
|          return;
|       }
|       try
|       {
|          if (ponged == false)
|          {
|             // Server did not pong use with in the timeout
|             // period.. Assuming the connection is dead.
|             throw new SpyJMSException("No pong received", new IOException("ping timeout."));
|          }
|
|          ponged = false;
|          pingServer(System.currentTimeMillis());
|       }
|       catch (Throwable t)
|       {
|          asynchFailure("Unexpected ping failure", t);
|       }
|       finally
|       {
|          pingTaskSemaphore.release();
|       }
|    }
| }
|
Notice that pingTaskSemaphore.release() is in a finally block, but not in the same try
block as the pingTaskSemaphore.acquire() call. That makes it possible for
pingTaskSemaphore.acquire() to decrement the Semaphore's permits and then throw an
exception other than InterruptedException. In that case the finally block would never
execute, because the exception would propagate out of the first try block. However, this
last paragraph is speculation; I don't really know what is happening. I also don't know
why it takes 20-30 hours of operation for the problem to appear, and I don't have a test
case because I can't reproduce the problem consistently.
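
One way to keep a ping task from blocking the run loop indefinitely is to bound the wait
for the permit and release only when the permit was actually obtained. The following is a
minimal sketch of that idea, not the JBoss code and not a confirmed fix for the problem
above. It assumes java.util.concurrent.Semaphore in place of the Semaphore used by
Connection, and the class, field, and method names (BoundedPingSketch, skipped, runOnce) are
hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class BoundedPingSketch {
    private final Semaphore pingTaskSemaphore;
    private final long timeoutMillis;
    int skipped; // number of runs given up because no permit arrived in time

    BoundedPingSketch(Semaphore pingTaskSemaphore, long timeoutMillis) {
        this.pingTaskSemaphore = pingTaskSemaphore;
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns true if pingBody ran, false if the permit could not be obtained in time. */
    boolean runOnce(Runnable pingBody) {
        boolean acquired = false;
        try {
            // Bounded wait: the calling loop gets control back after timeoutMillis at most.
            acquired = pingTaskSemaphore.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!acquired) {
                skipped++;
                return false;
            }
            pingBody.run();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            // Acquire and release share one try/finally, guarded by the acquired flag,
            // so a permit is returned exactly when one was taken.
            if (acquired) {
                pingTaskSemaphore.release();
            }
        }
    }

    public static void main(String[] args) {
        BoundedPingSketch starved = new BoundedPingSketch(new Semaphore(0), 50);
        System.out.println("ran: " + starved.runOnce(() -> { }) + ", skipped: " + starved.skipped);
    }
}
```

With zero permits, runOnce() returns false after the timeout instead of hanging, so a
single stuck ping would cost one skipped run rather than the whole loop.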
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3992141#...