[jboss-jira] [JBoss JIRA] Updated: (JGRP-690) Thread interrupt status is not always cleared by default

Mon Feb 11 18:37:05 EST 2008

     [ http://jira.jboss.com/jira/browse/JGRP-690?page=all ]

Vladimir Blagojevic updated JGRP-690:
-------------------------------------

    Description: 
If Thread#interrupt() is invoked on a thread that is blocked in an invocation of the wait(), wait(long), or wait(long, int) methods of the Object  class, or of the join(), join(long), join(long, int), sleep(long), or sleep(long, int), methods of this class, then its interrupt status will be cleared and it will receive an InterruptedException. (citing Thread#interrupt javadoc)

Conventional wisdom is that the interrupt status is always cleared. However, in cases other than those above, the interrupt status is actually not cleared! JGroups code that throws InterruptedException should make sure that it also clears the interrupt status. Otherwise, if we wrongly assume that the interrupt status is always cleared, we can end up with inconsistencies that lead to hard-to-trace bugs (see below). We should verify throughout our code that we always clear the interrupt status.
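
The difference described above can be checked with a minimal sketch. The class and helper names below are hypothetical and are not taken from JGroups; only Thread.sleep(), Thread.interrupt(), Thread#isInterrupted() and Thread.interrupted() are real JDK calls. The sketch assumes the problematic code throws InterruptedException by hand after testing the flag with isInterrupted(), which does not clear it.

```java
class InterruptStatusDemo {

    // Hypothetical helper: throws InterruptedException by hand after only
    // INSPECTING the flag, so the interrupt status stays set.
    static void checkWithoutClearing() throws InterruptedException {
        if (Thread.currentThread().isInterrupted()) {
            throw new InterruptedException("status left set");
        }
    }

    // Hypothetical helper: Thread.interrupted() tests AND clears the status.
    static void checkAndClear() throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException("status cleared");
        }
    }

    // Case from the javadoc: sleep() clears the status when it throws.
    static boolean flagAfterSleep() {
        Thread.currentThread().interrupt();
        try {
            Thread.sleep(10);
            return Thread.interrupted(); // not expected; clears the flag anyway
        } catch (InterruptedException e) {
            return Thread.currentThread().isInterrupted();
        }
    }

    // Hand-thrown exception without clearing: the status is still set.
    static boolean flagAfterManualThrow() {
        Thread.currentThread().interrupt();
        boolean stillSet = false;
        try {
            checkWithoutClearing();
        } catch (InterruptedException e) {
            stillSet = Thread.currentThread().isInterrupted();
        }
        Thread.interrupted(); // clean up so the caller's thread is not left interrupted
        return stillSet;
    }

    // Hand-thrown exception that clears first: the status is no longer set.
    static boolean flagAfterClearingThrow() {
        Thread.currentThread().interrupt();
        boolean stillSet = true;
        try {
            checkAndClear();
        } catch (InterruptedException e) {
            stillSet = Thread.currentThread().isInterrupted();
        }
        return stillSet;
    }

    public static void main(String[] args) {
        System.out.println("after sleep():           " + flagAfterSleep());         // false
        System.out.println("after manual throw:      " + flagAfterManualThrow());   // true
        System.out.println("after clear-then-throw:  " + flagAfterClearingThrow()); // false
    }
}
```

Code that throws InterruptedException itself would, under this assumption, want the checkAndClear() shape so that callers see the same behavior as with wait(), join() and sleep().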

Gray said on javagroups-list:
We have been seeing a large percentage of our production servers (on 2.6.1)
spinning above 100% of CPU.  After some investigation, we seem to have found
an infinite loop in Scheduler.java.  If sched_thread gets interrupted, its
interrupt flag never gets reset so it spins forever suspending and resuming
the current_thread and throwing and catching InterruptedException(s). 
According to jconsole, one of our boxes has done this 690 million times --
nice effort.  When this is happening, the memory space is alternating
between 200 and 300 MB, which makes sense given the exception bandwidth.

We are running our RPC dispatchers with deadlock_detection and
concurrent_processing set to true and use_concurrent_stack="true".

Here are some stack snapshots of the spinning thread:
http://www.nabble.com/file/p15395455/stack_dumps.txt
As you can see, they are all in Scheduler.run.  Here is the section from
Scheduler.java with the offending part of the code, including where
sched_thread is being interrupted by a call to Scheduler.addPrio():
http://www.nabble.com/file/p15395455/Scheduler.java.txt
The pluses show where we think a missing Thread.interrupt() call should be.

Here's the patch for Scheduler.java (from 2.6.1), which adds in the
interrupt call:
http://www.nabble.com/file/p15395455/Scheduler.java.patch.txt

This is all with JGroups 2.6.1.  Has anyone else seen this behavior?  We're
going to roll this patch out to see if it stops the spinning.

gray
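
The spinning described in the message above can be reproduced in miniature. This is a bounded sketch, not the actual Scheduler.java code (which is only linked, not reproduced, in the message): awaitWork() is a hypothetical stand-in for a blocking step that throws InterruptedException after merely inspecting the flag, and the catch block uses Thread.interrupted(), the JDK method that tests and clears the status. Whether that matches the linked patch is an assumption.

```java
class SpinLoopSketch {

    // Hypothetical stand-in for a blocking step. It reports interruption by
    // throwing, but only inspects the flag and never clears it.
    static void awaitWork() throws InterruptedException {
        if (Thread.currentThread().isInterrupted()) {
            throw new InterruptedException();
        }
        // Real code would block here until work arrives.
    }

    // Runs a bounded version of the loop after ONE interrupt and counts how
    // many InterruptedExceptions are thrown and caught.
    static int countThrows(boolean clearOnCatch, int maxIterations) {
        Thread.currentThread().interrupt(); // a single interrupt
        int thrown = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                awaitWork();
            } catch (InterruptedException e) {
                thrown++;
                if (clearOnCatch) {
                    Thread.interrupted(); // test-and-clear the interrupt status
                }
            }
        }
        Thread.interrupted(); // leave the calling thread clean
        return thrown;
    }

    public static void main(String[] args) {
        // Status never cleared: every iteration throws again.
        System.out.println(countThrows(false, 1000)); // prints 1000
        // Status cleared in the catch block: only the first iteration throws.
        System.out.println(countThrows(true, 1000));  // prints 1
    }
}
```

Without the clearing call, a single interrupt makes every pass through the loop throw again, which is consistent with the CPU usage and exception counts reported above.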

> Thread interrupt status is not always cleared by default
> --------------------------------------------------------
>
>                 Key: JGRP-690
>                 URL: http://jira.jboss.com/jira/browse/JGRP-690
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.6, 2.7
>            Reporter: Vladimir Blagojevic
>         Assigned To: Vladimir Blagojevic
>             Fix For: 2.7, 2.6.2
