Hi Guys,
We're seeing some cluster recoverability issues. We're using JBoss 4.0.5 in a
clustered configuration (just a two-node cluster).
For the most part, everything works great.
A while back we had a Quartz job that caused an OutOfMemoryError in one node of the
cluster, after which the whole cluster fell apart.
To try to reproduce this situation, I've created an admin-only URL where I can cause
one of the following two things on a single node of the cluster:
1) call System.exit
2) start a Quartz job that purposefully runs the node out of memory.
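For reference, the two failure modes can be sketched roughly like this. It is a minimal sketch and not our actual admin code: the class name FaultInjector, the Mode enum and the "exit"/"oom" parameter values are made up for illustration, and the memory hog here is a plain loop rather than a real Quartz job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FaultInjector {
    /** The two failure scenarios listed above (names are illustrative). */
    public enum Mode { EXIT, OOM }

    /** Maps a request parameter such as "exit" or "oom" to a Mode. */
    public static Mode parse(String param) {
        if (param == null) {
            throw new IllegalArgumentException("mode is required");
        }
        String p = param.trim().toLowerCase(Locale.ROOT);
        if (p.equals("exit")) {
            return Mode.EXIT;
        }
        if (p.equals("oom")) {
            return Mode.OOM;
        }
        throw new IllegalArgumentException("unknown mode: " + param);
    }

    /** Triggers the chosen fault in the local JVM only. Never returns normally. */
    public static void inject(Mode mode) {
        switch (mode) {
            case EXIT:
                // Scenario 1: terminate this JVM.
                System.exit(1);
                break;
            case OOM:
                // Scenario 2: keep allocating 1 MB blocks until the heap is exhausted.
                List<byte[]> hog = new ArrayList<byte[]>();
                while (true) {
                    hog.add(new byte[1024 * 1024]);
                }
        }
    }
}
```

One thing to keep in mind with scenario 1: System.exit runs the JVM's shutdown hooks, so it is closer to an orderly stop than to a hard crash of the node.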
I just ran a small test with scenario #1, and I can reproduce the problem. Basically, I
hit the URL on node1, causing the JVM to exit. After that, node2 is still present, but
the application is hamstrung (we get exceptions on any operation that touches JMS).
I've watched the logs on node2 when I cause node1 to die, and I do see the JMS
queues/etc. migrate from node1 (now dead) to node2. Here are the migration-related log
messages (immediately after node1 has died):
| 2007-09-21 11:58:20,703 [JMSContainerInvoker(ReportServicePreExecutionMdb) Reconnect]
[] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal
10000ms for ReportServicePreExecutionMdb
| 2007-09-21 11:58:20,703 [JMSContainerInvoker(ReportServicePostExecutionMdb) Reconnect]
[] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal
10000ms for ReportServicePostExecutionMdb
| 2007-09-21 11:58:20,703 [JMSContainerInvoker(ReportServiceDownloadMdb) Reconnect] []
[] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal
10000ms for ReportServiceDownloadMdb
| 2007-09-21 11:58:20,703 [JMSContainerInvoker(DeliveryServiceMdb) Reconnect] [] [] INFO
[org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal 10000ms
for DeliveryServiceMdb
| 2007-09-21 11:58:20,703 [JMSContainerInvoker(ApplicationEventsMdb) Reconnect] [] []
INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Waiting for reconnect internal
10000ms for ApplicationEventsMdb
| 2007-09-21 11:58:22,140 [MessageDispatcher up processing thread] [] [] INFO
[org.jboss.ha.framework.interfaces.HAPartition.lifecycle.focus-rcl-cluster] New cluster
view for partition focus-rcl-cluster (id: 2, delta: -1) : [10.10.11.14:1199]
| 2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] I am
(10.10.11.14:1199) received membershipChanged event:
| 2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] Dead
members: 1 ([10.10.11.13:1199])
| 2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] New
Members : 0 ([])
| 2007-09-21 11:58:22,156 [AsynchViewChangeHandler Thread] [] [] INFO
[org.jboss.ha.framework.server.DistributedReplicantManagerImpl.focus-rcl-cluster] All
Members : 1 ([10.10.11.14:1199])
| 2007-09-21 11:58:22,656 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.web.tomcat.tc5.TomcatDeployer] deploy, ctxPath=/jbossmq-httpil,
warUrl=.../deploy-hasingleton/jms/jbossmq-httpil.sar/jbossmq-httpil.war/
| 2007-09-21 11:58:23,672 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.il.uil2.UILServerILService] JBossMQ UIL service available at :
/0.0.0.0:8193
| 2007-09-21 11:58:23,703 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Queue.DLQ] Bound to JNDI name: queue/DLQ
| 2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Queue.rcl/reportServicePreExecuteQueue] Bound to JNDI name:
queue/rcl/reportServicePreExecuteQueue
| 2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Queue.rcl/reportServiceExecuteQueue] Bound to JNDI name:
queue/rcl/reportServiceExecuteQueue
| 2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Queue.rcl/reportServiceDownloadQueue] Bound to JNDI name:
queue/rcl/reportServiceDownloadQueue
| 2007-09-21 11:58:23,719 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Queue.rcl/reportServicePostExecuteQueue] Bound to JNDI name:
queue/rcl/reportServicePostExecuteQueue
| 2007-09-21 11:58:23,734 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Queue.rcl/deliveryServiceQueue] Bound to JNDI name:
queue/rcl/deliveryServiceQueue
| 2007-09-21 11:58:23,750 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Topic.rcl/events/reportEventsTopic] Bound to JNDI name:
topic/rcl/events/reportEventsTopic
| 2007-09-21 11:58:23,750 [AsynchKeyChangeHandler Thread] [] [] INFO
[org.jboss.mq.server.jmx.Topic.rcl/events/applicationEventsTopic] Bound to JNDI name:
topic/rcl/events/applicationEventsTopic
| 2007-09-21 11:58:25,297 [MessageDispatcher up processing thread] [] [] INFO
[org.jboss.cache.TreeCache] viewAccepted(): [magnum:3542|2] [magnum:3542]
| 2007-09-21 11:58:30,703 [JMSContainerInvoker(ReportServiceExecutionMdb) Reconnect] []
[] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS
provider for ReportServiceExecutionMdb
| 2007-09-21 11:58:30,734 [JMSContainerInvoker(DeliveryServiceMdb) Reconnect] [] [] INFO
[org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider for
DeliveryServiceMdb
| 2007-09-21 11:58:30,734 [JMSContainerInvoker(ReportServiceDownloadMdb) Reconnect] []
[] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS
provider for ReportServiceDownloadMdb
| 2007-09-21 11:58:30,734 [JMSContainerInvoker(ReportServicePreExecutionMdb) Reconnect]
[] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS
provider for ReportServicePreExecutionMdb
| 2007-09-21 11:58:30,734 [JMSContainerInvoker(ApplicationEventsMdb) Reconnect] [] []
INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS provider
for ApplicationEventsMdb
| 2007-09-21 11:58:30,750 [JMSContainerInvoker(ReportServicePostExecutionMdb) Reconnect]
[] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Trying to reconnect to JMS
provider for ReportServicePostExecutionMdb
| 2007-09-21 11:58:30,828 [JMSContainerInvoker(ReportServiceExecutionMdb) Reconnect] []
[] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for
ReportServiceExecutionMdb
| 2007-09-21 11:58:30,844 [JMSContainerInvoker(ReportServicePostExecutionMdb) Reconnect]
[] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider
for ReportServicePostExecutionMdb
| 2007-09-21 11:58:30,859 [JMSContainerInvoker(ApplicationEventsMdb) Reconnect] [] []
INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for
ApplicationEventsMdb
| 2007-09-21 11:58:30,859 [JMSContainerInvoker(DeliveryServiceMdb) Reconnect] [] [] INFO
[org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for
DeliveryServiceMdb
| 2007-09-21 11:58:30,859 [JMSContainerInvoker(ReportServicePreExecutionMdb) Reconnect]
[] [] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider
for ReportServicePreExecutionMdb
| 2007-09-21 11:58:30,859 [JMSContainerInvoker(ReportServiceDownloadMdb) Reconnect] []
[] INFO [org.jboss.ejb.plugins.jms.JMSContainerInvoker] Reconnected to JMS provider for
ReportServiceDownloadMdb
|
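To double-check that no MDB is left hanging in a log like the one above, something along these lines could be used. It is only a sketch: the class name ReconnectCheck is made up, the two message patterns are copied from the JMSContainerInvoker lines in the log, and it ignores ordering (an MDB that reconnects and later drops again would still count as reconnected).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReconnectCheck {
    private static final Pattern TRYING =
        Pattern.compile("Trying to reconnect to JMS provider for (\\w+)");
    private static final Pattern DONE =
        Pattern.compile("Reconnected to JMS provider for (\\w+)");

    /** Returns the MDBs that tried to reconnect but never logged success. */
    public static Set<String> stillDisconnected(String log) {
        // The pasted log wraps in the middle of messages, so collapse whitespace first.
        String flat = log.replaceAll("\\s+", " ");
        Set<String> pending = collect(TRYING, flat);
        pending.removeAll(collect(DONE, flat));
        return pending;
    }

    /** Collects the MDB names captured by the given pattern. */
    private static Set<String> collect(Pattern p, String text) {
        Set<String> names = new TreeSet<String>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

Run against the log above, this should come back empty, since all six MDBs log both a "Trying to reconnect" and a "Reconnected" message.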
After we kill node1, any operation on node2 that touches JMS fails with an exception,
even though the log above shows the MDBs reconnecting to the JMS provider.
Any ideas...?