]
ehsavoie Hugonnet updated WFLY-11404:
-------------------------------------
Labels: EAP-CD17 (was: )
Artemis throws Critical IO Error if new journal file is not created in 5 seconds
--------------------------------------------------------------------------------
Key: WFLY-11404
URL: https://issues.jboss.org/browse/WFLY-11404
Project: WildFly
Issue Type: Enhancement
Components: JMS
Affects Versions: 15.0.0.Beta1
Reporter: Tomas Hofman
Assignee: Tomas Hofman
Priority: Critical
Labels: EAP-CD17
I can see in our CI jobs that Artemis sometimes stops because of error \[1\]. I looked at
the code \[2\] where the exception is thrown and I think it could be improved a bit.
_Customer Impact:_ If the Artemis journal is located on a slower file system (like NFS),
the server might crash when it is under load. This makes the service unavailable, and the
server must be restarted to recover.
The first thing I noticed is that the 5-second timeout is not configurable. I agree that it
should be enough in most cases, but if someone wants to use NFS for the Artemis journal
and doesn't care about performance, we should allow them to tune this value.
Additionally, the timeout doesn't take the size of the journal files into account.
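To illustrate the idea of a tunable timeout, here is a minimal sketch. It is not Artemis code: the class {{OpenFileTimeoutSketch}}, the system property name, and the {{minBytesPerSecond}} parameter are all hypothetical names chosen for the example. It assumes the timeout could be read from a property (defaulting to the current 5 seconds) and optionally stretched in proportion to the journal file size.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not part of Artemis. All names here are illustrative.
class OpenFileTimeoutSketch {

    // Assumed property name; Artemis does not define it.
    static final String TIMEOUT_PROPERTY = "sketch.journal.fileOpenTimeoutSeconds";

    // Reads the timeout from the system property, falling back to the hard-coded 5 seconds.
    static long resolveTimeoutSeconds() {
        return Long.getLong(TIMEOUT_PROPERTY, 5L);
    }

    // Adds the time needed to write fileSizeBytes at an assumed minimum throughput
    // to a base timeout, so larger journal files get proportionally more time.
    static long scaledTimeoutMillis(long baseMillis, long fileSizeBytes, long minBytesPerSecond) {
        if (minBytesPerSecond <= 0) {
            throw new IllegalArgumentException("minBytesPerSecond must be positive");
        }
        return baseMillis + (fileSizeBytes * 1000L) / minBytesPerSecond;
    }

    // Same poll as in openFile(), but with the configurable timeout.
    static <T> T pollWithConfiguredTimeout(BlockingQueue<T> openedFiles) throws InterruptedException {
        return openedFiles.poll(resolveTimeoutSeconds(), TimeUnit.SECONDS);
    }
}
```

With these assumptions, a 10 MiB journal file on storage that sustains at least 1 MiB/s would get 5 s + 10 s = 15 s instead of a flat 5 s.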
The second thing is that when {{openedFiles.poll()}} returns null, we can't be sure
whether the problem is an exhausted disk or an exhausted CPU. I think some kind of latch
should be added that waits until {{pushOpenRunnable}} is executed. That would make sure
the timeout indicates a problem with IO operations and was not caused by an exhausted CPU.
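A rough sketch of the latch idea, under stated assumptions: the class {{LatchedOpenSketch}}, the {{Outcome}} enum, and the two separate timeouts are hypothetical and not taken from Artemis. The runnable is wrapped so that it counts down a {{CountDownLatch}} as soon as the executor actually starts it, and the IO timeout only begins after that point.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not part of Artemis. All names here are illustrative.
class LatchedOpenSketch {

    enum Outcome { OPENED, NOT_SCHEDULED, IO_TIMEOUT }

    static final class Result<T> {
        final Outcome outcome;
        final T file;

        Result(Outcome outcome, T file) {
            this.outcome = outcome;
            this.file = file;
        }
    }

    static <T> Result<T> open(Executor executor, Runnable pushOpenRunnable,
                              BlockingQueue<T> openedFiles,
                              long scheduleTimeoutMillis, long ioTimeoutMillis)
            throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        executor.execute(() -> {
            started.countDown();    // the task got a thread; the IO work starts now
            pushOpenRunnable.run();
        });
        // Phase 1: wait for the task to start. A timeout here points at a starved
        // executor or CPU, not at the disk.
        if (!started.await(scheduleTimeoutMillis, TimeUnit.MILLISECONDS)) {
            return new Result<>(Outcome.NOT_SCHEDULED, null);
        }
        // Phase 2: the task is running, so a timeout here points at slow IO.
        T file = openedFiles.poll(ioTimeoutMillis, TimeUnit.MILLISECONDS);
        return file == null
                ? new Result<>(Outcome.IO_TIMEOUT, null)
                : new Result<>(Outcome.OPENED, file);
    }
}
```

Only the {{IO_TIMEOUT}} outcome would then justify a critical IO error. One caveat of this sketch: after {{NOT_SCHEDULED}}, the task may still run later and push a file into the queue, so a real change would have to handle that late arrival.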
\[1\]
{code}
09:45:07,418 WARN [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@2646099c-962838060)) AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=unable to open : ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149003: File not opened]
    at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.openFile(JournalFilesRepository.java:423) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.moveNextFile(JournalImpl.java:2885) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:2842) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2568) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$200(JournalImpl.java:87) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl$4.run(JournalImpl.java:889) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
{code}
\[2\]
{code}
public JournalFile openFile() throws InterruptedException, ActiveMQIOErrorException {
   if (logger.isTraceEnabled()) {
      logger.trace("enqueueOpenFile with openedFiles.size=" + openedFiles.size());
   }
   if (openFilesExecutor == null) {
      pushOpenRunnable.run();
   } else {
      openFilesExecutor.execute(pushOpenRunnable);
   }
   JournalFile nextFile = openedFiles.poll(5, TimeUnit.SECONDS);
   if (nextFile == null) {
      fileFactory.onIOError(ActiveMQJournalBundle.BUNDLE.fileNotOpened(), "unable to open ", null);
      // We need to reconnect the current file with the timed buffer as we were not able to roll the file forward
      // If you don't do this you will get a NPE in TimedBuffer::checkSize where it uses the bufferobserver
      fileFactory.activateBuffer(journal.getCurrentFile().getFile());
      throw ActiveMQJournalBundle.BUNDLE.fileNotOpened();
   }
   if (logger.isTraceEnabled()) {
      logger.trace("Returning file " + nextFile);
   }
   return nextFile;
}
{code}