[
https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji...
]
Howard Gao commented on JBMESSAGING-1631:
-----------------------------------------
Hi Victor,
Just an update. My test has been running overnight without messages piling up at either
node. I found that client failover didn't happen, as my two machines are on a local
network.
I'll reduce the ping timeout further (it's already 500 ms). However, the MDB is deployed
within the app server, and if the MDB is connected to that same server, it's unlikely that
client failover will happen.
One possible cause that may affect the message sucker is a ping timeout on the message
sucker's connection. I'll try to make this happen to see if I can make any
progress.
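For context, "reducing the ping timeout" here means tightening the validator settings in
remoting-bisocket-service.xml, roughly as sketched below; the values are only an assumption
for a test setup (matching the 500 ms mentioned above), not a recommendation:
   <!-- aggressive lease/ping settings, only for reproducing a sucker connection ping timeout -->
   <attribute name="clientLeasePeriod" isParam="true">2000</attribute>
   <attribute name="validatorPingPeriod" isParam="true">2000</attribute>
   <attribute name="validatorPingTimeout" isParam="true">500</attribute>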
Also, please note that you should use Remoting 2.2.3; Remoting 2.5.1 doesn't include all the
fixes for JBMESSAGING-1456.
Messages are piling up in the queues in a clustered environment and are not
pulled by the message sucker
----------------------------------------------------------------------------------------------
Key: JBMESSAGING-1631
URL:
https://jira.jboss.org/jira/browse/JBMESSAGING-1631
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP08
Environment: Cluster of a few JBoss 4.2.3 servers with JBM 1.4.2.GA-SP1 or
1.4.0.SP3_CP08 running on x64 Windows servers.
JBoss Remoting 2.5.1 or 2.2.3
Clustered XA connection factory, clustered queues, both server and client (MDBs) are
deployed as one application - identical deployment for all servers in a cluster
Reporter: Victor Starenky
Assignee: Howard Gao
Fix For: 1.4.0.SP3.CP09, 1.4.5.GA
Attachments: logs-and-config-prod.zip, logs-and-config-test.zip,
TestRecommendationSentProcessorMDB.java
We have an EJB3 application running on a cluster of 7 servers.
All servers have the same application farmed across them.
We have a clustered queue and use a clustered connection factory.
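For illustration, a clustered queue and a clustered connection factory in JBM 1.4 are
typically deployed roughly as in the sketch below (the queue name is hypothetical; the
actual definitions are in the attached config files):
   <!-- clustered queue: messages on a node without consumers should be sucked to nodes that have them -->
   <mbean code="org.jboss.jms.server.destination.QueueService"
          name="jboss.messaging.destination:service=Queue,name=ExampleQueue"
          xmbean-dd="xmdesc/Queue-xmbean.xml">
      <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
      <depends>jboss.messaging:service=PostOffice</depends>
      <attribute name="Clustered">true</attribute>
   </mbean>
   <!-- clustered connection factory with failover and load balancing enabled -->
   <mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
          name="jboss.messaging.connectionfactory:service=ClusteredConnectionFactory"
          xmbean-dd="xmdesc/ConnectionFactory-xmbean.xml">
      <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
      <depends optional-attribute-name="Connector">jboss.messaging:service=Connector,transport=bisocket</depends>
      <attribute name="SupportsFailover">true</attribute>
      <attribute name="SupportsLoadBalancing">true</attribute>
      <attribute name="JNDIBindings">
         <bindings>
            <binding>/ClusteredConnectionFactory</binding>
            <binding>/ClusteredXAConnectionFactory</binding>
         </bindings>
      </attribute>
   </mbean>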
Originally we ran into bug JBMESSAGING-1456, and while testing the fix we ran into a
different problem. After the cluster has been running for some time (usually overnight,
with lots of heavy background processing happening at that time), we see messages
"piling up" in some queues on some nodes.
Symptoms are:
MessageCount is not zero (rather in the order of hundreds), DeliveringCount is zero.
These nodes have ConsumerCount=0 for the queues experiencing the problem. The message sucker
is configured as far as I can tell. It looks like the problem might be related to client
and/or server failover leaving some nodes without consumers while the sucker is not doing
its job (if I understand it correctly).
Once we bump the timeout values much higher than they are in the original config files,
the problem seems to disappear, or at least shows up much less frequently. Specifically,
I'm talking about these values:
messaging-service.xml:
   <attribute name="FailoverStartTimeout">180000</attribute>
remoting-bisocket-service.xml:
   <attribute name="clientLeasePeriod" isParam="true">30000</attribute>
   <attribute name="validatorPingPeriod" isParam="true">30000</attribute>
   <attribute name="validatorPingTimeout" isParam="true">20000</attribute>
   <attribute name="registerCallbackListener">false</attribute>
   <attribute name="timeout" isParam="true">240000</attribute>
While this serves as a temporary workaround, we don't feel we can rely on JBM in a
production clustered environment without having failover working properly.
This is a big showstopper for us at the moment.
Attached are the log files from the prod environment of 7 servers with the config files,
as well as log files from the test environment with just 2 servers (having the same issue) and
the corresponding configs. The logs were produced by the test version of the messaging code
with added logging as per JBMESSAGING-1456.