[jboss-jira] [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Thu Sep 8 23:33:38 EDT 2011

     [ https://issues.jboss.org/browse/JBMESSAGING-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yong Hao Gao updated JBMESSAGING-1631:
--------------------------------------

    Fix Version/s: 1.4.8.SP4
                       (was: 1.4.8.SP3)


> Messages are piling up in the queues in clustered environment and not pulled by message sucker
> ----------------------------------------------------------------------------------------------
>
>                 Key: JBMESSAGING-1631
>                 URL: https://issues.jboss.org/browse/JBMESSAGING-1631
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: JMS Clustering
>    Affects Versions: 1.4.0.SP3.CP08
>         Environment: Cluster of a few JBoss 4.2.3 servers with JBM 1.4.2.GA-SP1 or 1.4.0.SP3_CP08 running on x64 windows servers.
> JBoss Remoting 2.5.1 or 2.2.3
> Clustered XA connection factory, clustered queues, both server and client (MDBs) are deployed as one application - identical deployment for all servers in a cluster
>            Reporter: Victor Starenky
>            Assignee: Yong Hao Gao
>             Fix For: 1.4.0.SP3.CP15, 1.4.8.SP4
>
>         Attachments: 1631-test-source.zip, logs-and-config-prod.zip, logs-and-config-test.zip, TestRecommendationSentProcessorMDB.java
>
>
> We have an EJB3 application running on a cluster of 7 servers.
> All servers have the same application farmed across them.
> We have a clustered queue and use a clustered connection factory.
> Originally we ran into bug JBMESSAGING-1456, and while testing the fix we ran into a different problem. After the cluster has been running for some time (usually overnight, with lots of heavy background processing happening at that time), we see messages "piling up" in some queues on some nodes.
> The symptoms are:
> MessageCount is not zero (on the order of hundreds) while DeliveringCount is zero. These nodes have ConsumerCount=0 for the queues experiencing the problem. The message sucker is configured, as far as I can tell. It looks like the problem might be related to client and/or server failover leaving some nodes without consumers while the sucker is not doing its job (if I understand it correctly).
> Once we bump the timeout values much higher than they are in the original config files the problem seems to disappear or at least show up much less frequently. Specifically I'm talking about these values:
> messaging-service.xml:
>       <attribute name="FailoverStartTimeout">180000</attribute>
> remoting-bisocket-service.xml:
>                <attribute name="clientLeasePeriod" isParam="true">30000</attribute>
>                <attribute name="validatorPingPeriod" isParam="true">30000</attribute>
>                <attribute name="validatorPingTimeout" isParam="true">20000</attribute>
>                <attribute name="registerCallbackListener">false</attribute>
>                <attribute name="timeout" isParam="true">240000</attribute>
> While this serves as a temporary workaround, we don't feel we can rely on JBM in a production clustered environment unless failover works properly.
> This is a big showstopper for us at the moment.
> Attached are the log files from the prod environment of 7 servers with the config files as well as log files from the test environment with just 2 servers (having same issue) and corresponding configs. The logs were produced by the test version of the messaging code with added logging as per JBMESSAGING-1456.
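
The symptom described above (MessageCount above zero, DeliveringCount at zero, ConsumerCount at zero) can be expressed as a simple check. The following is a minimal sketch, not code from JBoss Messaging: the class StuckQueueCheck and its methods are hypothetical, and it assumes the three counters are readable as numeric JMX attributes under the names used in the report.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper for illustration; not part of JBoss Messaging. */
public class StuckQueueCheck {

    // True when a queue shows the reported symptom: messages are waiting,
    // none are being delivered, and no consumer is attached on this node.
    public static boolean looksStuck(long messageCount, long deliveringCount, long consumerCount) {
        return messageCount > 0 && deliveringCount == 0 && consumerCount == 0;
    }

    // Reads one counter from a queue MBean. Assumes the attribute value
    // is a java.lang.Number (for example Integer or Long).
    static long readCount(MBeanServerConnection server, ObjectName queue, String attribute)
            throws Exception {
        return ((Number) server.getAttribute(queue, attribute)).longValue();
    }

    // Applies the same check to a live queue MBean. The attribute names are
    // taken from the report; their availability over JMX is an assumption.
    public static boolean looksStuck(MBeanServerConnection server, ObjectName queue)
            throws Exception {
        return looksStuck(
                readCount(server, queue, "MessageCount"),
                readCount(server, queue, "DeliveringCount"),
                readCount(server, queue, "ConsumerCount"));
    }

    public static void main(String[] args) {
        // Sample counters in the range described in the report.
        System.out.println(looksStuck(300, 0, 0));
    }
}
```

Running such a check against every node of the cluster at intervals would show which nodes are left without consumers after a failover. The ObjectName of each queue has to be supplied by the caller, since it depends on the deployment.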
