[jboss-jira] [JBoss JIRA] (WFLY-3620) Deadlock while logging
Stefan Schueffler (JIRA)
issues at jboss.org
Tue Jul 15 03:08:29 EDT 2014
[ https://issues.jboss.org/browse/WFLY-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984983#comment-12984983 ]
Stefan Schueffler commented on WFLY-3620:
-----------------------------------------
Hmm... you are right - the StdioContext itself should not lead to any locking. Nevertheless, after some time all available threads end up "waiting to lock" as in the stack trace above.
In the example given in this bug report, our JSP page "selectStatus.jsp" calls System.out.println, which was unfortunately left in the code... The same behavior is also triggered when logging through org.slf4j.
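For context on why a stray System.out.println ends up inside the logging locks at all: the server swaps System.out for a delegating stream that feeds the log manager, so stdout writes and slf4j/JBoss-Logging calls converge on the same console handler. The following is only a minimal sketch of that general stdio-capture idiom (not the actual org.jboss.stdio code; class and logger names are made up for illustration):
{code}
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.logging.Logger;

// Illustrative only: a stand-in for the stdio-capture idiom, not org.jboss.stdio.StdioContext.
public final class CapturedStdoutSketch {
    public static void install() {
        Logger stdoutLogger = Logger.getLogger("stdout"); // hypothetical logger name
        System.setOut(new PrintStream(new OutputStream() {
            private final StringBuilder line = new StringBuilder();

            @Override
            public void write(int b) {
                if (b == '\n') {
                    // Every println from application code now goes through the log manager
                    // and its handlers -- i.e. through the same handler monitors that the
                    // slf4j/JBoss-Logging calls contend on.
                    stdoutLogger.info(line.toString());
                    line.setLength(0);
                } else {
                    line.append((char) b); // ASCII-only for brevity in this sketch
                }
            }
        }, true));
    }
}
{code}
(A real implementation also needs a reentrance guard so the console handler's own writes to the original stdout do not loop back into the logger - which is presumably what the ThreadLocal check in StdioContext is for.)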
A different location where the "waiting to lock" appears (resulting in all threads waiting either in StdioContext as above, or in the LogManager as shown here) is the following:
{code}
"EJB default - 4" prio=10 tid=0x4a1ce400 nid=0x5f5d waiting for monitor entry [0x53c8f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jboss.logmanager.handlers.WriterHandler.doPublish(WriterHandler.java:59)
- waiting to lock <0x5edf6ca0> (a java.lang.Object)
at org.jboss.logmanager.ExtHandler.publish(ExtHandler.java:79)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:296)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.Logger.logRaw(Logger.java:721)
at org.jboss.logmanager.Logger.log(Logger.java:672)
at org.jboss.logging.JBossLogManagerLogger.doLogf(JBossLogManagerLogger.java:50)
at org.jboss.logging.Logger.logf(Logger.java:2096)
at org.jboss.as.ejb3.EjbLogger_$logger.retryingTimeout(EjbLogger_$logger.java:298)
at org.jboss.as.ejb3.timerservice.task.TimerTask.retryTimeout(TimerTask.java:215)
at org.jboss.as.ejb3.timerservice.task.TimerTask.run(TimerTask.java:176)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
at org.jboss.threads.JBossThread.run(JBossThread.java:122)
{code}
In this example, 49 threads are waiting for this particular monitor (0x5edf6ca0), while the one thread holding it apparently never finishes its work and never releases the lock. Again, only a "kill -9" and restart recovers the server.
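For what it's worth, the owner of such a monitor can also be confirmed from inside the JVM without a full thread dump. A small diagnostic sketch (assuming it can be triggered inside the affected process, e.g. from a JMX bean or a management servlet; the class and method names are made up for illustration):
{code}
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Diagnostic sketch: reports which thread owns the monitors that BLOCKED threads wait on.
// It builds a String instead of logging, since logging is exactly what is stuck here.
public final class MonitorOwnerReport {
    public static String build() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo ti : mx.dumpAllThreads(true, false)) {
            LockInfo waitingOn = ti.getLockInfo();
            if (ti.getThreadState() == Thread.State.BLOCKED && waitingOn != null) {
                sb.append(ti.getThreadName()).append(" BLOCKED on ").append(waitingOn)
                  .append(", owned by ").append(ti.getLockOwnerName()).append('\n');
            }
            for (MonitorInfo mi : ti.getLockedMonitors()) {
                sb.append(ti.getThreadName()).append(" holds ").append(mi)
                  .append(" at ").append(mi.getLockedStackFrame()).append('\n');
            }
        }
        // Note: findDeadlockedThreads() stays null for this pattern -- there is no lock
        // cycle, just one RUNNABLE owner that never releases -- so the JVM will not
        // classify it as a deadlock.
        long[] cycle = mx.findDeadlockedThreads();
        sb.append("JVM-detected deadlock cycle: ")
          .append(cycle == null ? "none" : cycle.length + " threads");
        return sb.toString();
    }
}
{code}
Since there is no lock cycle here, the JVM's deadlock detector will not flag it - which matches the picture of one stuck owner rather than a classic deadlock.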
Stacktrace of the one "running, but never finishing" thread:
{code}
"default task-17" prio=10 tid=0x4cdf6000 nid=0x5eca runnable [0x4a86d000]
java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0xbeabc168> (a java.io.IOException)
at java.lang.Throwable.<init>(Throwable.java:265)
at java.lang.Exception.<init>(Exception.java:66)
at java.io.IOException.<init>(IOException.java:58)
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
- locked <0x5edf6d28> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:449)
- locked <0x5edf6d08> (a java.io.PrintStream)
at org.jboss.logmanager.handlers.UncloseableOutputStream.write(UncloseableOutputStream.java:39)
at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
at org.jboss.logmanager.handlers.UninterruptibleOutputStream.write(UninterruptibleOutputStream.java:59)
at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
at org.jboss.logmanager.handlers.UninterruptibleOutputStream.write(UninterruptibleOutputStream.java:87)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
- locked <0x5edf6cd0> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
- locked <0x5edf6cd0> (a java.io.OutputStreamWriter)
at java.io.BufferedWriter.flush(BufferedWriter.java:253)
- locked <0x5edf6cd0> (a java.io.OutputStreamWriter)
at org.jboss.logmanager.handlers.WriterHandler.safeFlush(WriterHandler.java:173)
at org.jboss.logmanager.handlers.WriterHandler.flush(WriterHandler.java:142)
- locked <0x5edf6ca0> (a java.lang.Object)
at org.jboss.logmanager.ExtHandler.doPublish(ExtHandler.java:107)
at org.jboss.logmanager.handlers.WriterHandler.doPublish(WriterHandler.java:70)
- locked <0x5edf6ca0> (a java.lang.Object)
at org.jboss.logmanager.ExtHandler.publish(ExtHandler.java:79)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:296)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:304)
at org.jboss.logmanager.Logger.logRaw(Logger.java:721)
at org.slf4j.impl.Slf4jLogger.log(Slf4jLogger.java:326)
at org.slf4j.impl.Slf4jLogger.log(Slf4jLogger.java:320)
at org.slf4j.impl.Slf4jLogger.error(Slf4jLogger.java:311)
{code}
As a side note: the stack traces here and the one in the original bug report were taken from the same run. Searching for the waited-on lock object from the original bug report (0x5ee0adf8) reveals that the "running, but never finishing" thread holds that lock as well. It holds two locks: the one in the LogManager's WriterHandler (which all the other LogManager threads are waiting for), and the one in the StdioContext/PrintStream (which all the "other" logger threads are waiting for).
So it comes down to the question of why this one thread never finishes, and how to prevent a single "never-ending logger thread" from taking down the whole WildFly process by blocking all other threads...
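To make that failure shape concrete - one thread holding a handler monitor while its underlying write never returns, everyone else piling up BLOCKED behind it - here is a minimal, self-contained sketch in plain Java. Nothing in it is WildFly code; the unread pipe merely stands in for a console/file descriptor that stops making progress, and the lock object stands in for the WriterHandler monitor (0x5edf6ca0):
{code}
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Minimal illustration of the pile-up, not WildFly code.
public final class HandlerLockPileUp {
    private static final Object handlerLock = new Object(); // stands in for the handler monitor

    public static void main(String[] args) throws Exception {
        // A pipe nobody ever reads from: writes block once its tiny buffer is full.
        PipedOutputStream stuckSink = new PipedOutputStream(new PipedInputStream(1));

        Thread writer = new Thread(() -> {
            synchronized (handlerLock) {                  // the one RUNNABLE thread holding the monitor
                try {
                    while (true) {
                        stuckSink.write(new byte[1024]);  // blocks forever, lock never released
                    }
                } catch (IOException ignored) {
                }
            }
        }, "writer-holding-handler-lock");
        writer.start();
        Thread.sleep(500);

        // Every other "logging" thread now parks BLOCKED on the same monitor,
        // just like the 49 BLOCKED threads in the dump.
        for (int i = 0; i < 3; i++) {
            new Thread(() -> {
                synchronized (handlerLock) { /* would publish the log record here */ }
            }, "blocked-logger-" + i).start();
        }
        Thread.sleep(500);

        Thread.getAllStackTraces().forEach((t, st) ->
                System.err.println(t.getName() + " -> " + t.getState()));
        // The JVM never exits on its own from here -- the same "only kill -9 helps" symptom.
    }
}
{code}
Whatever keeps the real write from returning, the effect on the rest of the server is the same as in this sketch.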
> Deadlock while logging
> ----------------------
>
> Key: WFLY-3620
> URL: https://issues.jboss.org/browse/WFLY-3620
> Project: WildFly
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Logging
> Affects Versions: 8.1.0.Final
> Environment: CentOS 6.5 64bit, java7u45 64bit (and 32 bit, the same behavior)
> Reporter: Stefan Schueffler
> Assignee: James Perkins
> Priority: Critical
>
> We quite often hit a deadlock(?) in org.jboss.stdio.StdioContext$DelegatingPrintStream.println(StdioContext.java:474).
> Even though the "StdioContext" belongs to JBoss Logging, the problem occurs in our production WildFly installation two to four times a day - all threads deadlock while trying to call log.debug, log.error, or (sometimes) System.out.println from our application code, and WildFly no longer responds...
> The partial stack trace is always similar to this one:
> {code}
> "default task-64" prio=10 tid=0x4c539c00 nid=0x5ef9 waiting for monitor entry [0x495e0000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at java.io.PrintStream.println(PrintStream.java:806)
> - waiting to lock <0x5ee0adf8> (a java.io.PrintStream)
> at org.jboss.stdio.StdioContext$DelegatingPrintStream.println(StdioContext.java:474)
> at jsp.communications.statuschange.selectStatus_jsp._jspService(selectStatus_jsp.java:413)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:69)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:85)
> at io.undertow.servlet.handlers.FilterHandler.handleRequest(FilterHandler.java:82)
> {code}
> While investigating the StdioContext class, I really wondered whether the "locking/checking by using a ThreadLocal" approach it uses can work in a multi-threaded environment (it should have the very same problems as every "double checking" algorithm without proper synchronization).
> Once all threads are hanging on this particular lock, only a full WildFly restart recovers in our case.
> My preferred solution would be a rework of the org.jboss.stdio classes, as the ThreadLocal-based reentrance checks they use are, at the very least, highly unusual.
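To make the concern about the ThreadLocal check in the quoted report concrete: the idiom in question looks like a per-thread reentrance guard, roughly of the following shape (an illustration of the general pattern only, not the actual org.jboss.stdio source; the field and class names are invented):
{code}
import java.io.PrintStream;

// General shape of a ThreadLocal reentrance guard around a delegating print stream.
// Illustrative only; not the org.jboss.stdio.StdioContext implementation.
public final class ReentranceGuardSketch {
    private static final ThreadLocal<Boolean> inWrite = ThreadLocal.withInitial(() -> Boolean.FALSE);

    private final PrintStream realOut;   // the original stdout
    private final PrintStream logStream; // a stream that forwards to the log manager (hypothetical)

    public ReentranceGuardSketch(PrintStream realOut, PrintStream logStream) {
        this.realOut = realOut;
        this.logStream = logStream;
    }

    public void println(String s) {
        if (inWrite.get()) {
            // Already inside a captured write on THIS thread (e.g. the console handler
            // itself printed something): go straight to the real stdout to avoid
            // infinite recursion. Other threads are unaffected by this flag.
            realOut.println(s);
            return;
        }
        inWrite.set(Boolean.TRUE);
        try {
            logStream.println(s); // route through the logging subsystem
        } finally {
            inWrite.set(Boolean.FALSE);
        }
    }
}
{code}
If that reading is right, the guard only prevents the same thread from recursing back into the captured stream; it provides no cross-thread synchronization, so the blocking shown above comes purely from the handler and stream monitors being held across a write that never completes.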
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)