[jboss-jira] [JBoss JIRA] (WFLY-11513) Artemis throws Critical IO Error if new journal file is not created in 5 seconds

Kabir Khan (Jira) issues at jboss.org
Thu Dec 13 05:15:00 EST 2018


Kabir Khan created WFLY-11513:
---------------------------------

             Summary: Artemis throws Critical IO Error if new journal file is not created in 5 seconds
                 Key: WFLY-11513
                 URL: https://issues.jboss.org/browse/WFLY-11513
             Project: WildFly
          Issue Type: Enhancement
    Affects Versions: 15.0.0.Beta1
            Reporter: Tomas Hofman
            Assignee: Tomas Hofman


I can see in our CI jobs that Artemis sometimes stops because of error \[1\]. I looked at the code \[2\] where the exception is thrown and I think it could be improved a bit.

_Customer Impact:_ If the Artemis journal is located on a slow file system (such as NFS) and the server is under load, the server might crash. This makes the service unavailable, and the server must be restarted to recover.

The first thing I noticed is that the 5-second timeout is not configurable. I agree that it should be enough in most cases, but if someone wants to use NFS for the Artemis journal and doesn't care about performance, we should allow them to tune this value. Additionally, the timeout doesn't take the size of the journal files into account.
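To illustrate the idea, here is a minimal sketch of a tunable timeout. It is not Artemis code: the class {{OpenFileTimeout}}, the system property name {{example.journal.fileOpenTimeoutSeconds}} and the throughput-based scaling are all assumptions made up for this example. The real fix would more likely expose the value through the journal configuration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Artemis.
final class OpenFileTimeout {
    // Assumed property name, used only for this illustration.
    static final String PROPERTY = "example.journal.fileOpenTimeoutSeconds";
    // Matches the value currently hard-coded in openFile().
    static final long DEFAULT_SECONDS = 5;

    // Reads the timeout from the system property; falls back to the default
    // when the property is missing or malformed, and never goes below 1 second.
    static long timeoutSeconds() {
        return Math.max(1L, Long.getLong(PROPERTY, DEFAULT_SECONDS));
    }

    // Optional scaling: adds the time needed to write a file of the given size
    // at an assumed minimum throughput, so larger journal files get more time.
    static long timeoutMillisFor(long fileSizeBytes, long assumedBytesPerSecond) {
        long base = TimeUnit.SECONDS.toMillis(timeoutSeconds());
        if (fileSizeBytes <= 0 || assumedBytesPerSecond <= 0) {
            return base;
        }
        return base + (fileSizeBytes * 1000L) / assumedBytesPerSecond;
    }

    // Drop-in replacement for openedFiles.poll(5, TimeUnit.SECONDS).
    static <T> T pollWithConfiguredTimeout(BlockingQueue<T> queue) throws InterruptedException {
        return queue.poll(timeoutSeconds(), TimeUnit.SECONDS);
    }
}
```

With something like this, a deployment on NFS could start the server with {{-Dexample.journal.fileOpenTimeoutSeconds=30}} and leave the default untouched everywhere else.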

The second thing is that when {{openedFiles.poll()}} returns null, we can't be sure whether the cause is an overloaded disk or an exhausted CPU. I think some kind of latch should be added that waits until {{pushOpenRunnable}} has been executed. That would confirm that the problem lies in the IO operations and was not caused by CPU exhaustion.
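A rough sketch of the latch idea follows. Again, this is not the Artemis implementation: {{OpenFileProbe}}, {{Outcome}} and {{Result}} are hypothetical names, and the sketch assumes the open task hands its file over through a {{BlockingQueue}}, as {{pushOpenRunnable}} does with {{openedFiles}}. The latch is released as soon as the executor actually starts the task, so a timeout before that point suggests thread or CPU starvation, while a timeout afterwards points at slow IO.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Artemis.
final class OpenFileProbe {
    enum Outcome { OPENED, TASK_NOT_STARTED, IO_TOO_SLOW }

    static final class Result<T> {
        final Outcome outcome;
        final T file; // non-null only when outcome == OPENED

        Result(Outcome outcome, T file) {
            this.outcome = outcome;
            this.file = file;
        }
    }

    // Submits openTask and waits in two phases, each with its own timeout:
    // first for the task to start running, then for the file to appear.
    static <T> Result<T> open(Executor executor, Runnable openTask,
                              BlockingQueue<T> openedFiles,
                              long timeout, TimeUnit unit) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        executor.execute(() -> {
            started.countDown(); // the executor found a thread for us
            openTask.run();      // the actual (possibly slow) file creation
        });

        if (!started.await(timeout, unit)) {
            // The task never got scheduled: not an IO problem.
            return new Result<T>(Outcome.TASK_NOT_STARTED, null);
        }

        T file = openedFiles.poll(timeout, unit);
        if (file == null) {
            // The task was running but produced no file in time.
            return new Result<T>(Outcome.IO_TOO_SLOW, null);
        }
        return new Result<T>(Outcome.OPENED, file);
    }
}
```

Only the {{IO_TOO_SLOW}} outcome would then justify calling {{fileFactory.onIOError(...)}}; {{TASK_NOT_STARTED}} could be logged as a warning and retried instead of shutting the server down. Whether that is the right policy is open for discussion.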

\[1\]
{code}
09:45:07,418 WARN  [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@2646099c-962838060)) AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=unable to open : ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149003: File not opened]
        at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.openFile(JournalFilesRepository.java:423) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.moveNextFile(JournalImpl.java:2885) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:2842) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2568) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$200(JournalImpl.java:87) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl$4.run(JournalImpl.java:889) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
        at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
{code}

\[2\]
{code}
public JournalFile openFile() throws InterruptedException, ActiveMQIOErrorException {
      if (logger.isTraceEnabled()) {
         logger.trace("enqueueOpenFile with openedFiles.size=" + openedFiles.size());
      }

      if (openFilesExecutor == null) {
         pushOpenRunnable.run();
      } else {
         openFilesExecutor.execute(pushOpenRunnable);
      }

      JournalFile nextFile = openedFiles.poll(5, TimeUnit.SECONDS);
      if (nextFile == null) {
         fileFactory.onIOError(ActiveMQJournalBundle.BUNDLE.fileNotOpened(), "unable to open ", null);
         // We need to reconnect the current file with the timed buffer as we were not able to roll the file forward
         // If you don't do this you will get a NPE in TimedBuffer::checkSize where it uses the bufferobserver
         fileFactory.activateBuffer(journal.getCurrentFile().getFile());
         throw ActiveMQJournalBundle.BUNDLE.fileNotOpened();
      }

      if (logger.isTraceEnabled()) {
         logger.trace("Returning file " + nextFile);
      }

      return nextFile;
   }
{code}



--
This message was sent by Atlassian Jira
(v7.12.1#712002)

