[JBoss JIRA] (WFLY-11404) Artemis throws Critical IO Error if new journal file is not created in 5 seconds
by ehsavoie Hugonnet (Jira)
[ https://issues.jboss.org/browse/WFLY-11404?page=com.atlassian.jira.plugin... ]
ehsavoie Hugonnet updated WFLY-11404:
-------------------------------------
Labels: (was: EAP-CD17)
> Artemis throws Critical IO Error if new journal file is not created in 5 seconds
> --------------------------------------------------------------------------------
>
> Key: WFLY-11404
> URL: https://issues.jboss.org/browse/WFLY-11404
> Project: WildFly
> Issue Type: Enhancement
> Components: JMS
> Affects Versions: 15.0.0.Beta1
> Reporter: Tomas Hofman
> Assignee: Tomas Hofman
> Priority: Critical
>
> I can see in our CI jobs that Artemis sometimes stops because of error \[1\]. I looked at the code \[2\] where the exception is thrown and I think it could be improved a bit.
> _Customer Impact:_ If the Artemis journal is located on a slower file system (such as NFS) and the server is under load, the server might crash. This makes the service unavailable, and the server must be restarted to recover.
> The first thing I noticed is that the 5-second timeout is not configurable. I agree that it should be enough in most cases, but if someone wants to use NFS for the Artemis journal and does not care about performance, we should allow them to tune this value. Additionally, the timeout does not take the size of the journal files into account.
> The second thing is that when {{openedFiles.poll()}} returns null, we can't be sure whether the cause is an exhausted disk or an exhausted CPU. I think some kind of latch should be added that waits until pushOpenRunnable has been executed. That would confirm that the problem lies in IO operations and was not caused by an exhausted CPU.
> \[1\]
> {code}
> 09:45:07,418 WARN [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@2646099c-962838060)) AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=unable to open : ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149003: File not opened]
> at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.openFile(JournalFilesRepository.java:423) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
> at org.apache.activemq.artemis.core.journal.impl.JournalImpl.moveNextFile(JournalImpl.java:2885) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
> at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:2842) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
> at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2568) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
> at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$200(JournalImpl.java:87) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
> at org.apache.activemq.artemis.core.journal.impl.JournalImpl$4.run(JournalImpl.java:889) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
> at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
> at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
> {code}
> \[2\]
> {code}
> public JournalFile openFile() throws InterruptedException, ActiveMQIOErrorException {
>    if (logger.isTraceEnabled()) {
>       logger.trace("enqueueOpenFile with openedFiles.size=" + openedFiles.size());
>    }
>    if (openFilesExecutor == null) {
>       pushOpenRunnable.run();
>    } else {
>       openFilesExecutor.execute(pushOpenRunnable);
>    }
>    JournalFile nextFile = openedFiles.poll(5, TimeUnit.SECONDS);
>    if (nextFile == null) {
>       fileFactory.onIOError(ActiveMQJournalBundle.BUNDLE.fileNotOpened(), "unable to open ", null);
>       // We need to reconnect the current file with the timed buffer as we were not able to roll the file forward
>       // If you don't do this you will get a NPE in TimedBuffer::checkSize where it uses the bufferobserver
>       fileFactory.activateBuffer(journal.getCurrentFile().getFile());
>       throw ActiveMQJournalBundle.BUNDLE.fileNotOpened();
>    }
>    if (logger.isTraceEnabled()) {
>       logger.trace("Returning file " + nextFile);
>    }
>    return nextFile;
> }
> {code}
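The two suggestions in the description (a tunable timeout, and a latch that records whether the open task ever ran) could be sketched roughly as below. This is not Artemis code: the class `TimedFileOpener`, its `Outcome` enum, and the `fileCreator` supplier are hypothetical stand-ins for `JournalFilesRepository`, `pushOpenRunnable`, and the real file factory, and the timeout parameter is an assumed setting, not an existing configuration attribute.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical stand-in for the journal file repository; illustration only. */
public final class TimedFileOpener<F> {

    /** Why an open attempt ended the way it did. */
    public enum Outcome { OPENED, TASK_NEVER_STARTED, IO_TOO_SLOW }

    /** Result of one open attempt; file is null unless outcome is OPENED. */
    public static final class Result<T> {
        public final Outcome outcome;
        public final T file;

        Result(Outcome outcome, T file) {
            this.outcome = outcome;
            this.file = file;
        }
    }

    private final BlockingQueue<F> openedFiles = new LinkedBlockingQueue<>();
    private final Executor executor;
    private final Supplier<F> fileCreator; // must not return null
    private final long timeoutMillis;      // assumed to be configurable

    public TimedFileOpener(Executor executor, Supplier<F> fileCreator, long timeoutMillis) {
        this.executor = executor;
        this.fileCreator = fileCreator;
        this.timeoutMillis = timeoutMillis;
    }

    public Result<F> openFile() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        executor.execute(() -> {
            started.countDown();                  // the executor did pick up the task
            openedFiles.offer(fileCreator.get()); // the potentially slow IO step
        });

        F next = openedFiles.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (next != null) {
            return new Result<>(Outcome.OPENED, next);
        }
        // Timed out. If the latch was released, the task ran and the IO itself
        // was slow; otherwise the task never got a thread (for example CPU starvation).
        Outcome why = started.getCount() == 0 ? Outcome.IO_TOO_SLOW : Outcome.TASK_NEVER_STARTED;
        return new Result<>(why, null);
    }
}
```

A caller could treat only `IO_TOO_SLOW` as a critical IO error and retry or merely log on `TASK_NEVER_STARTED`. As in the original method, a file that arrives after the timeout stays in the queue and is returned by the next call.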
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
5 years, 10 months
[JBoss JIRA] (WFLY-11833) Stateful Session Bean affinity URI instead of cluster
by Jörg Bäsner (Jira)
[ https://issues.jboss.org/browse/WFLY-11833?page=com.atlassian.jira.plugin... ]
Jörg Bäsner updated WFLY-11833:
-------------------------------
Attachment: stateful-timeout.zip
> Stateful Session Bean affinity URI instead of cluster
> -----------------------------------------------------
>
> Key: WFLY-11833
> URL: https://issues.jboss.org/browse/WFLY-11833
> Project: WildFly
> Issue Type: Bug
> Components: Clustering, EJB
> Affects Versions: 16.0.0.Final
> Environment: WildFly cluster having SFSB deployed.
> Reporter: Jörg Bäsner
> Assignee: Richard Achmatowicz
> Priority: Major
> Attachments: stateful-timeout.zip
>
>
> An application is deployed with the following setup:
> * It contains an SFSB (_with passivationCapable="true"_)
> * An SLSB exposes a _remote_ method to a standalone client and returns an instance of the SFSB
> Scenario:
> A standalone client invokes the _remote_ method on the Stateless Session Bean, and a new instance of the Stateful Session Bean is returned.
> The issue is that the affinity of the returned Stateful Session Bean is URI instead of Cluster.
> See the attached Gradle reproducer application.
[JBoss JIRA] (WFLY-11404) Artemis throws Critical IO Error if new journal file is not created in 5 seconds
by ehsavoie Hugonnet (Jira)
[ https://issues.jboss.org/browse/WFLY-11404?page=com.atlassian.jira.plugin... ]
ehsavoie Hugonnet updated WFLY-11404:
-------------------------------------
Labels: EAP-CD17 (was: )
> Artemis throws Critical IO Error if new journal file is not created in 5 seconds
> --------------------------------------------------------------------------------
>
> Key: WFLY-11404
> URL: https://issues.jboss.org/browse/WFLY-11404
> Project: WildFly
> Issue Type: Enhancement
> Components: JMS
> Affects Versions: 15.0.0.Beta1
> Reporter: Tomas Hofman
> Assignee: Tomas Hofman
> Priority: Critical
> Labels: EAP-CD17
[JBoss JIRA] (WFLY-11833) Stateful Session Bean affinity URI instead of cluster
by Jörg Bäsner (Jira)
Jörg Bäsner created WFLY-11833:
----------------------------------
Summary: Stateful Session Bean affinity URI instead of cluster
Key: WFLY-11833
URL: https://issues.jboss.org/browse/WFLY-11833
Project: WildFly
Issue Type: Bug
Components: Clustering, EJB
Affects Versions: 16.0.0.Final
Environment: WildFly cluster having SFSB deployed.
Reporter: Jörg Bäsner
Assignee: Richard Achmatowicz
An application is deployed with the following setup:
* It contains an SFSB (_with passivationCapable="true"_)
* An SLSB exposes a _remote_ method to a standalone client and returns an instance of the SFSB
Scenario:
A standalone client invokes the _remote_ method on the Stateless Session Bean, and a new instance of the Stateful Session Bean is returned.
The issue is that the affinity of the returned Stateful Session Bean is URI instead of Cluster.
See the attached Gradle reproducer application.
[JBoss JIRA] (WFLY-11404) Artemis throws Critical IO Error if new journal file is not created in 5 seconds
by ehsavoie Hugonnet (Jira)
[ https://issues.jboss.org/browse/WFLY-11404?page=com.atlassian.jira.plugin... ]
ehsavoie Hugonnet updated WFLY-11404:
-------------------------------------
Component/s: JMS
> Artemis throws Critical IO Error if new journal file is not created in 5 seconds
> --------------------------------------------------------------------------------
>
> Key: WFLY-11404
> URL: https://issues.jboss.org/browse/WFLY-11404
> Project: WildFly
> Issue Type: Enhancement
> Components: JMS
> Affects Versions: 15.0.0.Beta1
> Reporter: Tomas Hofman
> Assignee: Tomas Hofman
> Priority: Critical
[JBoss JIRA] (WFCORE-4365) Remove READ_TIMEOUT from channel configuration options
by Teresa Miyar (Jira)
[ https://issues.jboss.org/browse/WFCORE-4365?page=com.atlassian.jira.plugi... ]
Teresa Miyar moved JBEAP-16530 to WFCORE-4365:
----------------------------------------------
Project: WildFly Core (was: JBoss Enterprise Application Platform)
Key: WFCORE-4365 (was: JBEAP-16530)
Workflow: GIT Pull Request workflow (was: CDW with loose statuses v1)
Component/s: Management (was: Management)
Affects Version/s: 8.0.0.Final (was: 7.2.0.GA)
> Remove READ_TIMEOUT from channel configuration options
> ------------------------------------------------------
>
> Key: WFCORE-4365
> URL: https://issues.jboss.org/browse/WFCORE-4365
> Project: WildFly Core
> Issue Type: Bug
> Components: Management
> Affects Versions: 8.0.0.Final
> Reporter: Teresa Miyar
> Assignee: Teresa Miyar
> Priority: Major
>
> The only configuration options allowed on the channel are:
> * RemotingOptions.TRANSMIT_WINDOW_SIZE
> * RemotingOptions.MAX_OUTBOUND_MESSAGES
> * RemotingOptions.RECEIVE_WINDOW_SIZE
> * RemotingOptions.MAX_INBOUND_MESSAGES
> * RemotingOptions.MAX_OUTBOUND_MESSAGE_SIZE
> * RemotingOptions.MAX_INBOUND_MESSAGE_SIZE
> The standalone.xml and domain.xml files show READ_TIMEOUT as an example option; it should be removed.
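To illustrate the allow-list described above, here is a minimal sketch that flags any configured option name outside the six permitted ones. It uses plain strings rather than the actual `RemotingOptions` constants, and the class `ChannelOptionCheck` and its `unsupported` method are hypothetical names introduced only for this example; it does not reflect how WildFly Core validates options.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; plain strings stand in for the RemotingOptions constants. */
public final class ChannelOptionCheck {

    // Option names copied from the issue text.
    static final Set<String> ALLOWED = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(
            "TRANSMIT_WINDOW_SIZE",
            "MAX_OUTBOUND_MESSAGES",
            "RECEIVE_WINDOW_SIZE",
            "MAX_INBOUND_MESSAGES",
            "MAX_OUTBOUND_MESSAGE_SIZE",
            "MAX_INBOUND_MESSAGE_SIZE")));

    private ChannelOptionCheck() {
    }

    /** Returns the configured names that are not on the allow-list, sorted alphabetically. */
    static Set<String> unsupported(Iterable<String> configured) {
        Set<String> rejected = new TreeSet<>();
        for (String name : configured) {
            if (!ALLOWED.contains(name)) {
                rejected.add(name);
            }
        }
        return rejected;
    }
}
```

With this sketch, a configuration listing `READ_TIMEOUT` alongside `MAX_INBOUND_MESSAGES` would report only `READ_TIMEOUT` as unsupported, which is the example option the issue asks to remove.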