[
https://issues.jboss.org/browse/JBMESSAGING-1631?page=com.atlassian.jira....
]
Yong Hao Gao updated JBMESSAGING-1631:
--------------------------------------
Fix Version/s: 1.4.8.SP9
(was: 1.4.8.SP8)
Messages are piling up in the queues in clustered environment and not
pulled by message sucker
----------------------------------------------------------------------------------------------
Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP08
Environment: Cluster of a few JBoss 4.2.3 servers with JBM 1.4.2.GA-SP1 or
1.4.0.SP3_CP08 running on x64 Windows servers.
JBoss Remoting 2.5.1 or 2.2.3
Clustered XA connection factory, clustered queues, both server and client (MDBs) are
deployed as one application - identical deployment for all servers in a cluster
Reporter: Victor Starenky
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP9
Attachments: 1631-test-source.zip, logs-and-config-prod.zip,
logs-and-config-test.zip, TestRecommendationSentProcessorMDB.java
We have an EJB3 application running on a cluster of 7 servers.
All servers have the same application farmed across them.
We use a clustered queue and a clustered connection factory.
Originally we ran into the bug JBMESSAGING-1456 and, while testing the fix, ran into a
different problem. After the cluster has been running for some time (usually overnight, with
lots of heavy background processing happening at that time), we see messages
"piling up" in some queues on some nodes.
Symptoms are:
MessageCount is not zero (rather, on the order of hundreds), DeliveringCount is zero.
These nodes have ConsumerCount=0 for the queues experiencing the problem. The message sucker
is configured, as far as I can tell. It looks like the problem might be related to client
and/or server failover leaving some nodes without consumers while the sucker is not doing
its job (if I understand it correctly).
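For anyone who wants to watch for this condition on each node, the symptom pattern above can be written as a simple predicate. This is only a minimal sketch: the class and method names are hypothetical and not part of JBoss Messaging, and it assumes the three counters (MessageCount, DeliveringCount, ConsumerCount) have already been read from the queue by some other means, for example from the JMX console.

```java
/** Hypothetical helper for illustration only; not part of JBoss Messaging. */
public final class StuckQueueCheck {

    private StuckQueueCheck() {}

    /**
     * Returns true when a queue shows the pattern described in this report:
     * messages are waiting, none are being delivered, and no consumer is attached.
     * The three arguments are assumed to be the queue's MessageCount,
     * DeliveringCount and ConsumerCount values, obtained elsewhere.
     */
    public static boolean looksStuck(int messageCount, int deliveringCount, int consumerCount) {
        return messageCount > 0 && deliveringCount == 0 && consumerCount == 0;
    }

    public static void main(String[] args) {
        // Hundreds of messages, nothing in delivery, no consumers: the reported symptom.
        System.out.println(looksStuck(300, 0, 0));   // true
        // Same backlog, but consumers are attached and messages are in delivery.
        System.out.println(looksStuck(300, 5, 2));   // false
        // Empty queue without consumers is idle, not stuck.
        System.out.println(looksStuck(0, 0, 0));     // false
    }
}
```

A check like this only flags the state; it says nothing about whether the message sucker or a failed failover is the cause.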
Once we bump the timeout values much higher than they are in the original config files,
the problem seems to disappear or at least show up much less frequently. Specifically,
I'm talking about these values:
messaging-service.xml:
<attribute name="FailoverStartTimeout">180000</attribute>
remoting-bisocket-service.xml:
<attribute name="clientLeasePeriod" isParam="true">30000</attribute>
<attribute name="validatorPingPeriod" isParam="true">30000</attribute>
<attribute name="validatorPingTimeout" isParam="true">20000</attribute>
<attribute name="registerCallbackListener">false</attribute>
<attribute name="timeout" isParam="true">240000</attribute>
While this serves as a temporary workaround, we don't feel we can rely on JBM in a
production clustered environment without failover working properly.
This is a big showstopper for us at the moment.
Attached are the log files from the prod environment of 7 servers, along with the config files,
as well as log files from the test environment with just 2 servers (showing the same issue) and
the corresponding configs. The logs were produced by the test version of the messaging code
with added logging as per JBMESSAGING-1456.