[jboss-jira] [JBoss JIRA] Created: (JGRP-1334) TIMED_WAITING loop prevents message propagation to cluster due to FC blockage

Simone Scarduzio (JIRA) jira-events at lists.jboss.org
Fri Jun 17 09:58:23 EDT 2011


TIMED_WAITING loop prevents message propagation to cluster due to FC blockage
-----------------------------------------------------------------------------

                 Key: JGRP-1334
                 URL: https://issues.jboss.org/browse/JGRP-1334
             Project: JGroups
          Issue Type: Bug
    Affects Versions: 2.9, 2.7
         Environment: Sparc Solaris 10
            Reporter: Simone Scarduzio
            Assignee: Bela Ban


Messages cannot be sent to the cluster because the node is waiting for FC credit that never arrives.
The JGroups sender thread stays in a TIMED_WAITING loop indefinitely, never having enough credit to send any bytes.

We have been able to reproduce the issue, which occurs after some days of activity, but the triggering event is unclear and no test case is available.

The cluster consists of about 25 nodes running Oracle JDK 1.6.0_24. No GC pause longer than 3.6s was observed on any cluster node while the problem was occurring. Still, we see a lot of VERIFY_SUSPECT entries in the logs, mentioning many different node addresses.

The following stack trace shows where the sender thread is blocked:

"QuotaDistributor" daemon prio=3 tid=0x00a9cc00 nid=0x71 waiting on condition [0x63f1f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x7b8392e0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
        at org.jgroups.protocols.FC.handleDownMessage(FC.java:550)
        at org.jgroups.protocols.FC.down(FC.java:424)
        at org.jgroups.protocols.FRAG2.down(FRAG2.java:154)
        at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:216)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:836)
        at org.jgroups.JChannel.down(JChannel.java:1626)
        at org.jgroups.JChannel.send(JChannel.java:721)
        at org.jgroups.JChannel.send(JChannel.java:737)
        at com.firsthop.common.platform.jgroups.JGroupsManager.send(JGroupsManager.java:193)
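
To make the blocking pattern in the trace easier to discuss, here is a minimal sketch of a credit-based gate that waits on a Condition with a timeout. It is an illustration only and not the actual org.jgroups.protocols.FC code: the class name CreditGate and the methods acquire, replenish and available are hypothetical. If replenish is never called, each call to acquire parks for the full timeout and returns false, and a caller that retries would show up in thread dumps as TIMED_WAITING (parking) over and over.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified credit gate; NOT the real JGroups FC implementation. */
public class CreditGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition creditsAvailable = lock.newCondition();
    private long credits;

    public CreditGate(long initialCredits) {
        this.credits = initialCredits;
    }

    /**
     * Tries to take 'length' bytes of credit, waiting up to maxWaitMs.
     * Returns true if the credit was taken, false on timeout or interrupt.
     */
    public boolean acquire(long length, long maxWaitMs) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        lock.lock();
        try {
            while (credits < length) {
                if (remainingNanos <= 0L) {
                    return false; // timed out, credit never arrived
                }
                // While parked here the thread state is TIMED_WAITING (parking).
                remainingNanos = creditsAvailable.awaitNanos(remainingNanos);
            }
            credits -= length;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Adds credit (in the real protocol this would follow a credit message from a receiver). */
    public void replenish(long amount) {
        lock.lock();
        try {
            credits += amount;
            creditsAvailable.signalAll(); // wake up any blocked senders
        } finally {
            lock.unlock();
        }
    }

    /** Returns the credit currently available. */
    public long available() {
        lock.lock();
        try {
            return credits;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a sender only makes progress again once replenish is called, which corresponds to our situation where the expected credit never reaches the blocked node.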


Please advise, or give us some pointers on how to progress the investigation.



        

