[JBoss JIRA] Created: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker


Victor Starenky (JIRA)

Thursday, 21 May 2009

1:19 p.m.

Messages are piling up in the queues in clustered environment and not pulled by message sucker
----------------------------------------------------------------------------------------------

Key: JBMESSAGING-1631
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1631
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP08
Environment: Cluster of a few JBoss 4.2.3 servers with JBM 1.4.2.GA-SP1 or 1.4.0.SP3_CP08 running on x64 Windows servers. JBoss Remoting 2.5.1 or 2.2.3. Clustered XA connection factory, clustered queues; both server and client (MDBs) are deployed as one application, with an identical deployment on all servers in the cluster.
Reporter: Victor Starenky
Assignee: Tim Fox

We have an EJB3 application running on a cluster of 7 servers. All servers have the same application farmed across them. We have a clustered queue and use a clustered connection factory.

Originally we ran into the bug JBMESSAGING-1456, and while testing the fix we ran into a different problem. After the cluster has been running for some time (usually overnight, with lots of heavy background processing happening at that time) we see messages "piling up" in some queues on some nodes.

Symptoms are: MessageCount is not zero (rather, on the order of hundreds), DeliveringCount is zero. These nodes have ConsumerCount=0 for the queues experiencing the problem. The message sucker is configured as far as I can tell.

It looks like the problem might be related to client and/or server failover leaving some nodes without consumers while the sucker is not doing its job (if I understand it correctly).

Once we bump the timeout values much higher than they are in the original config files, the problem seems to disappear or at least show up much less frequently. Specifically I'm talking about these values:

messaging-service.xml:
  <attribute name="FailoverStartTimeout">180000</attribute>

remoting-bisocket-service.xml:
  <attribute name="clientLeasePeriod" isParam="true">30000</attribute>
  <attribute name="validatorPingPeriod" isParam="true">30000</attribute>
  <attribute name="validatorPingTimeout" isParam="true">20000</attribute>
  <attribute name="registerCallbackListener">false</attribute>
  <attribute name="timeout" isParam="true">240000</attribute>

While this serves as a temporary workaround, we don't feel we can rely on JBM in a production clustered environment without having failover working properly. This is a big showstopper for us at the moment.

Attached are the log files from the prod environment of 7 servers with the config files, as well as log files from the test environment with just 2 servers (having the same issue) and corresponding configs. The logs were produced by the test version of the messaging code with added logging as per JBMESSAGING-1456.

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
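The symptom described in the report (MessageCount above zero while DeliveringCount and ConsumerCount are both zero) can be written down as a simple predicate, which may be handy when scanning counter values from many nodes. This is only a minimal sketch: the class and method names (`StuckQueueCheck`, `QueueStats`, `looksStuck`) are hypothetical, and how the three counters are obtained (for example from the JMX console) is left out on purpose.

```java
public class StuckQueueCheck {

    /** Snapshot of the three queue counters named in the report (hypothetical holder class). */
    public static final class QueueStats {
        public final int messageCount;
        public final int deliveringCount;
        public final int consumerCount;

        public QueueStats(int messageCount, int deliveringCount, int consumerCount) {
            this.messageCount = messageCount;
            this.deliveringCount = deliveringCount;
            this.consumerCount = consumerCount;
        }
    }

    /**
     * Returns true when the counters match the reported symptom:
     * messages are waiting, nothing is being delivered, and no consumer is attached.
     */
    public static boolean looksStuck(QueueStats stats) {
        return stats.messageCount > 0
                && stats.deliveringCount == 0
                && stats.consumerCount == 0;
    }

    public static void main(String[] args) {
        // Values below are illustrative, not taken from the attached logs.
        System.out.println(looksStuck(new QueueStats(300, 0, 0))); // true
        System.out.println(looksStuck(new QueueStats(300, 0, 2))); // false: consumers are attached
        System.out.println(looksStuck(new QueueStats(0, 0, 0)));   // false: queue is empty
    }
}
```

A queue that matches this predicate on one node while other nodes still have consumers is the situation in which the message sucker would be expected to move the messages.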


Victor Starenky (JIRA)

Thursday, 21 May

1:21 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Victor Starenky updated JBMESSAGING-1631:
-----------------------------------------
Attachment: logs-and-config-prod.zip

production cluster logs and config


Victor Starenky (JIRA)

1:23 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

...

Attachments: logs-and-config-prod.zip, logs-and-config-test.zip

Howard Gao (JIRA)

Monday, 1 June

2:02 a.m.

New subject: [JBoss JIRA] Assigned: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao reassigned JBMESSAGING-1631:
---------------------------------------
Assignee: Howard Gao (was: Tim Fox)


Howard Gao (JIRA)

2:04 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao updated JBMESSAGING-1631:
------------------------------------
Fix Version/s: 1.4.0.SP3.CP09, 1.4.5.GA


Howard Gao (JIRA)

4:02 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

...


Howard Gao (JIRA)

Tuesday, 2 June

9:08 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao commented on JBMESSAGING-1631:
-----------------------------------------

Hi Victor,

I wrote up a test case that works as follows:

1) Start up a two-node JBM cluster, messaging-node0 and messaging-node1, with a distributed queue deployed.
2) Deploy an MDB jar to messaging-node0/farm. The MDB receives messages from the distributed queue.
3) Start a message-sending application and send the messages over ClusteredConnectionFactory. The messages will be sent to the two nodes in round-robin fashion.

I got it running on one machine. However, to be able to reproduce the issue, I think I need to deploy the cluster to two machines. If you get a chance, can you comment on whether my test is suitable or not? Thanks. I'll update you once I have had it running for a while.

Howard


Hossein Attar (JIRA)

11:05 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

...


Hossein Attar (JIRA)

11:05 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Hossein Attar updated JBMESSAGING-1631:
---------------------------------------
Attachment: TestRecommendationSentProcessorMDB.java


Hossein Attar (JIRA)

12:13 p.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Hossein Attar commented on JBMESSAGING-1631:
--------------------------------------------

Howard:

I think your test case is right. Here are some details about our environment that may be useful.

We use java:/JmsXA as our connection factory.

In hajndi-jms-ds.xml, we have

  <tx-connection-factory>
    <jndi-name>JmsXA</jndi-name>
    <xa-transaction/>
    <rar-name>jms-ra.rar</rar-name>
    <connection-definition>org.jboss.resource.adapter.jms.JmsConnectionFactory</connection-definition>
    <config-property name="SessionDefaultType" type="java.lang.String">javax.jms.Topic</config-property>
    <config-property name="JmsProviderAdapterJNDI" type="java.lang.String">java:/DefaultJMSProvider</config-property>
    <max-pool-size>20</max-pool-size>
    <security-domain-and-application>JmsXARealm</security-domain-and-application>
  </tx-connection-factory>

and in connection-factories-service.xml, we have

  <mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
         name="jboss.messaging.connectionfactory:service=ConnectionFactory"
         xmbean-dd="xmdesc/ConnectionFactory-xmbean.xml">
    <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
    <depends optional-attribute-name="Connector">jboss.messaging:service=Connector,transport=bisocket</depends>
    <depends>jboss.messaging:service=PostOffice</depends>
    <attribute name="JNDIBindings">
      <bindings>
        <binding>/ConnectionFactory</binding>
        <binding>/XAConnectionFactory</binding>
        <binding>java:/ConnectionFactory</binding>
        <binding>java:/XAConnectionFactory</binding>
      </bindings>
    </attribute>
  </mbean>

  <mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
         name="jboss.messaging.connectionfactory:service=ClusteredConnectionFactory"
         xmbean-dd="xmdesc/ConnectionFactory-xmbean.xml">
    <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
    <depends optional-attribute-name="Connector">jboss.messaging:service=Connector,transport=bisocket</depends>
    <depends>jboss.messaging:service=PostOffice</depends>
    <attribute name="JNDIBindings">
      <bindings>
        <binding>/ClusteredConnectionFactory</binding>
        <binding>/ClusteredXAConnectionFactory</binding>
        <binding>java:/ClusteredConnectionFactory</binding>
        <binding>java:/ClusteredXAConnectionFactory</binding>
      </bindings>
    </attribute>
    <attribute name="SupportsFailover">true</attribute>
    <attribute name="SupportsLoadBalancing">true</attribute>
  </mbean>

  <mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
         name="jboss.messaging.connectionfactory:service=ClusterPullConnectionFactory"
         xmbean-dd="xmdesc/ConnectionFactory-xmbean.xml">
    <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
    <depends optional-attribute-name="Connector">jboss.messaging:service=Connector,transport=bisocket</depends>
    <depends>jboss.messaging:service=PostOffice</depends>
    <attribute name="SupportsFailover">false</attribute>
    <attribute name="SupportsLoadBalancing">false</attribute>
  </mbean>
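When comparing configs across several nodes, it can help to list which connection factories have SupportsFailover and SupportsLoadBalancing set. The following is a rough sketch using only the JDK's DOM parser. The class name `FactoryFlags` is hypothetical, the `<server>` root element wrapping the mbeans is an assumption about the surrounding file, and a missing attribute is reported as "unset" rather than guessing the server's default.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FactoryFlags {

    /** Maps each mbean name to a summary of its failover and load-balancing attributes. */
    public static Map<String, String> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse config XML", e);
        }
        Map<String, String> result = new LinkedHashMap<String, String>();
        NodeList mbeans = doc.getElementsByTagName("mbean");
        for (int i = 0; i < mbeans.getLength(); i++) {
            Element mbean = (Element) mbeans.item(i);
            String failover = "unset";
            String loadBalancing = "unset";
            // Look at every <attribute name="..."> element inside this mbean.
            NodeList attrs = mbean.getElementsByTagName("attribute");
            for (int j = 0; j < attrs.getLength(); j++) {
                Element attr = (Element) attrs.item(j);
                String name = attr.getAttribute("name");
                String value = attr.getTextContent().trim();
                if ("SupportsFailover".equals(name)) {
                    failover = value;
                } else if ("SupportsLoadBalancing".equals(name)) {
                    loadBalancing = value;
                }
            }
            result.put(mbean.getAttribute("name"),
                    "failover=" + failover + ", loadBalancing=" + loadBalancing);
        }
        return result;
    }

    public static void main(String[] args) {
        // Abbreviated version of the config quoted above; the <server> root is assumed.
        String xml = "<server>"
                + "<mbean name=\"jboss.messaging.connectionfactory:service=ConnectionFactory\"/>"
                + "<mbean name=\"jboss.messaging.connectionfactory:service=ClusteredConnectionFactory\">"
                + "<attribute name=\"SupportsFailover\">true</attribute>"
                + "<attribute name=\"SupportsLoadBalancing\">true</attribute>"
                + "</mbean>"
                + "<mbean name=\"jboss.messaging.connectionfactory:service=ClusterPullConnectionFactory\">"
                + "<attribute name=\"SupportsFailover\">false</attribute>"
                + "<attribute name=\"SupportsLoadBalancing\">false</attribute>"
                + "</mbean>"
                + "</server>";
        for (Map.Entry<String, String> entry : read(xml).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Running the same check against each node's connection-factories-service.xml would show quickly whether any node deviates from the others.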

...

Messages are piling up in the queues in clustered environment and not pulled by message sucker
----------------------------------------------------------------------------------------------

Key: JBMESSAGING-1631
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1631
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP08
Environment: Cluster of a few JBoss 4.2.3 servers with JBM 1.4.2.GA-SP1 or 1.4.0.SP3_CP08 running on x64 Windows servers. JBoss Remoting 2.5.1 or 2.2.3. Clustered XA connection factory, clustered queues; both server and client (MDBs) are deployed as one application, with an identical deployment on all servers in the cluster.
Reporter: Victor Starenky
Assignee: Howard Gao
Fix For: 1.4.0.SP3.CP09, 1.4.5.GA
Attachments: logs-and-config-prod.zip, logs-and-config-test.zip, TestRecommendationSentProcessorMDB.java

We have an EJB3 application running on a cluster of 7 servers. All servers have the same application farmed across them. We have a clustered queue and are using a clustered connection factory. Originally we ran into the bug JBMESSAGING-1456, and while testing the fix we ran into a different problem.

After the cluster has been running for some time (usually overnight, with lots of heavy background processing happening at that time) we see messages "piling up" in some queues on some nodes. The symptoms are: MessageCount is not zero (rather, on the order of hundreds) and DeliveringCount is zero. These nodes have ConsumerCount=0 for the queues experiencing the problem. The message sucker is configured, as far as I can tell. It looks like the problem might be related to client and/or server failover leaving some nodes without consumers while the sucker is not doing its job (if I understand it correctly).

Once we bump the timeout values much higher than they are in the original config files, the problem seems to disappear, or at least shows up much less frequently. Specifically, I'm talking about these values:

messaging-service.xml:

  <attribute name="FailoverStartTimeout">180000</attribute>

remoting-bisocket-service.xml:

  <attribute name="clientLeasePeriod" isParam="true">30000</attribute>
  <attribute name="validatorPingPeriod" isParam="true">30000</attribute>
  <attribute name="validatorPingTimeout" isParam="true">20000</attribute>
  <attribute name="registerCallbackListener">false</attribute>
  <attribute name="timeout" isParam="true">240000</attribute>

While this serves as a temporary workaround, we don't feel we can rely on JBM in a production clustered environment without failover working properly. This is a big showstopper for us at the moment.

Attached are the log files from the prod environment of 7 servers with the config files, as well as log files from the test environment with just 2 servers (having the same issue) and the corresponding configs. The logs were produced by the test version of the messaging code with added logging as per JBMESSAGING-1456.
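The symptom pattern in the report (MessageCount above zero while DeliveringCount and ConsumerCount are both zero) can be turned into a simple check. The following is only a minimal sketch: the class and method names (StuckQueueCheck, QueueStats, looksStuck, stuckQueues) are hypothetical and not part of JBoss Messaging, and it assumes the three counters have already been read from each node by some other means, for example from the JMX console mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class StuckQueueCheck {

    /** Snapshot of the three counters named in the report, for one queue on one node. */
    static final class QueueStats {
        final String node;
        final String queue;
        final int messageCount;
        final int deliveringCount;
        final int consumerCount;

        QueueStats(String node, String queue, int messageCount,
                   int deliveringCount, int consumerCount) {
            this.node = node;
            this.queue = queue;
            this.messageCount = messageCount;
            this.deliveringCount = deliveringCount;
            this.consumerCount = consumerCount;
        }
    }

    /** True when messages are waiting but nothing is being delivered and no consumer is attached. */
    static boolean looksStuck(QueueStats s) {
        return s.messageCount > 0 && s.deliveringCount == 0 && s.consumerCount == 0;
    }

    /** Returns "node/queue" labels for every snapshot that matches the symptom pattern. */
    static List<String> stuckQueues(List<QueueStats> snapshots) {
        List<String> result = new ArrayList<>();
        for (QueueStats s : snapshots) {
            if (looksStuck(s)) {
                result.add(s.node + "/" + s.queue);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values only; node and queue names are made up.
        List<QueueStats> snapshots = new ArrayList<>();
        snapshots.add(new QueueStats("node1", "queueA", 350, 0, 0));
        snapshots.add(new QueueStats("node2", "queueA", 12, 12, 3));
        System.out.println(stuckQueues(snapshots));
    }
}
```

A single snapshot can match briefly during normal operation (for example while consumers reconnect), so a check like this is more useful when the same queue matches across several consecutive polls.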

Howard Gao (JIRA)

10:31 p.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Victor Starenky (JIRA)

Wednesday, 3 June

3:40 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Victor Starenky commented on JBMESSAGING-1631:
----------------------------------------------

I think one important thing for reproducing the problem is to cause client/server failover due to timeouts. This could happen either because of an unreliable network or due to high CPU usage or a lack of resources. If both servers are pretty much idle and sit on a good network, it's quite unlikely we'll see the problem.

Another difference between your test case and our environment is that we're sending messages to the queue from the same application farmed across both servers. To send messages, we use both stateless EJB beans that get the XA clustered factory injected by the container and a plain JNDI lookup. I don't think any of this matters, as the problems seem to be on the receiving side, but I still thought I would mention it.

Howard Gao (JIRA)

4 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao commented on JBMESSAGING-1631:
-----------------------------------------

Thanks Victor. One thing I can do to trigger failover is to reduce the ping timeout in the remoting configuration; that worked well when I was dealing with JBMESSAGING-1456.

Howard Gao (JIRA)

Thursday, 11 June

1:03 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao commented on JBMESSAGING-1631:
-----------------------------------------

Hi Victor,

Just an update. My test has been running overnight without messages piling up at either node. I found that client failover didn't happen, as my two machines are on a local network. I'll further reduce the ping timeout (already 500ms). However, as the MDB is deployed within the app server, if the MDB is connected to the same server it's unlikely that client failover will happen.

One possible cause that may affect the message sucker is a ping timeout on the message sucker's own connection. I'll try to make this happen to see if I can make any progress.

Also, please note that you should use Remoting 2.2.3. Remoting 2.5.1 doesn't include all the fixes for JBMESSAGING-1456.

Victor Starenky (JIRA)

8:32 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Victor Starenky commented on JBMESSAGING-1631:
----------------------------------------------

We've switched to the latest messaging jars that have the fix for JBMESSAGING-1456, and we also switched all other required libraries, including Remoting (to version 2.2.3, as per the documentation). Still, this morning we see one node piling up topic messages.

As far as I can tell, we mostly experience this issue with topics. One or more nodes stop receiving messages, and in the JMX console we see thousands of them sitting on these nodes. As I mentioned before, we're pushing the servers to their limits overnight, so the timeouts are more likely caused by excessive load than by the network.

Howard Gao (JIRA)

9:47 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao commented on JBMESSAGING-1631:
-----------------------------------------

Thanks Victor. I'll change to a topic then. A question: when you find the messages piling up, did you stop the sending and wait to see if the messages are still there? Also, my sending program just keeps sending messages in a loop, 20 messages per transaction. Is that enough to load the servers?

Howard

Victor Starenky (JIRA)

10:44 a.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Victor Starenky commented on JBMESSAGING-1631:
----------------------------------------------

Messages are sent intermittently; there isn't a huge stream of them. But AllMessageCount is on the order of thousands and only increases with every new message. What happens with topics is that some node stops receiving messages. It shows all SubscriptionCounts = 0. Once it gets into this state, it stays there.

We created our own monitoring system that records every sent message and every received message in a database, stamped with the time and machine name. This is how we find these problems.

Also, when I was speaking about the load, I wasn't talking about JMS load but rather the overall machine load. All servers run a large application that's doing tons of jobs overnight. So this might affect ping timeouts, when a machine is too busy to respond in time. This is only a theory, though.
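The monitoring approach described above (recording every sent and every received message along with the machine name) lends itself to a simple reconciliation step. The sketch below is not the reporter's actual system, whose schema and code are not shown in this thread; the names DeliveryAudit, Receipt, and missingByNode are hypothetical. It assumes a topic where every node in the cluster is expected to receive every sent message, and it leaves out the timestamps for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DeliveryAudit {

    /** One recorded receipt: the given message was received on the given node. */
    static final class Receipt {
        final String messageId;
        final String node;

        Receipt(String messageId, String node) {
            this.messageId = messageId;
            this.node = node;
        }
    }

    /**
     * Returns, for each node that missed at least one message, the set of
     * message ids that were sent but never recorded as received on that node.
     * Nodes that received everything are left out of the result.
     */
    static Map<String, Set<String>> missingByNode(Set<String> sentIds,
                                                  Set<String> nodes,
                                                  List<Receipt> receipts) {
        Map<String, Set<String>> missing = new TreeMap<>();
        // Start by assuming every node is missing every sent message.
        for (String node : nodes) {
            missing.put(node, new TreeSet<>(sentIds));
        }
        // Cross off each message that a known node actually received.
        for (Receipt r : receipts) {
            Set<String> stillMissing = missing.get(r.node);
            if (stillMissing != null) {
                stillMissing.remove(r.messageId);
            }
        }
        // Keep only the nodes with something left over.
        missing.values().removeIf(Set::isEmpty);
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative data only; ids and node names are made up.
        Set<String> sent = new TreeSet<>(Arrays.asList("m1", "m2", "m3"));
        Set<String> nodes = new TreeSet<>(Arrays.asList("node1", "node2"));
        List<Receipt> receipts = new ArrayList<>();
        receipts.add(new Receipt("m1", "node1"));
        receipts.add(new Receipt("m2", "node1"));
        receipts.add(new Receipt("m3", "node1"));
        receipts.add(new Receipt("m1", "node2"));
        System.out.println(missingByNode(sent, nodes, receipts));
    }
}
```

A node whose missing set keeps growing with each newly sent message matches the "stops receiving and stays there" behaviour described in the comment above. For a clustered queue, where each message goes to a single consumer, the per-node expectation does not hold and the comparison would have to be cluster-wide instead.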

Howard Gao (JIRA)

Wednesday, 15 July

10:40 p.m.

New subject: [JBoss JIRA] Commented: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao commented on JBMESSAGING-1631:
-----------------------------------------

Hi Hossein,

I still can't reproduce it in my environment. I changed the MDB to use a topic and sent messages every 10 minutes, 2000 messages per send. It looks like both MDBs got all the messages in the overnight test. I'm uploading my test source (1631-test-source.zip). Can you help me get this reproduced? Thanks.

Howard

Howard Gao (JIRA)

10:40 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao updated JBMESSAGING-1631:
------------------------------------
    Attachment: 1631-test-source.zip


Howard Gao (JIRA)

Wednesday, 16 September

8:33 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao updated JBMESSAGING-1631:
------------------------------------
    Fix Version/s: 1.4.0.SP3.CP10
                   1.4.6.GA
                       (was: 1.4.5.GA)
                       (was: 1.4.0.SP3.CP09)


Howard Gao (JIRA)

Friday, 9 October

9:28 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/jira/browse/JBMESSAGING-1631?page=com.atlassian.ji... ]

Howard Gao updated JBMESSAGING-1631:
------------------------------------
    Fix Version/s: 1.4.7.GA
                       (was: 1.4.6.GA)


Howard Gao (JIRA)

Thursday, 4 March

9:28 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

...

Messages are piling up in the queues in clustered environment and not pulled by message sucker
----------------------------------------------------------------------------------------------

Key: JBMESSAGING-1631
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1631
Assignee: Howard Gao
Fix For: 1.4.0.SP3.CP11, 1.4.7.GA, 1.4.8.GA

Howard Gao (JIRA)

Wednesday, 28 July

4:03 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/browse/JBMESSAGING-1631?page=com.atlassian.jira.pl... ]

Howard Gao updated JBMESSAGING-1631:
------------------------------------
    Fix Version/s:     (was: 1.4.7.GA)


-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira

Yong Hao Gao (JIRA)

Wednesday, 3 November

7:38 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://jira.jboss.org/browse/JBMESSAGING-1631?page=com.atlassian.jira.pl... ]

Yong Hao Gao updated JBMESSAGING-1631:
--------------------------------------
    Fix Version/s: 1.4.0.SP3.CP12
                       (was: 1.4.0.SP3.CP11)


Yong Hao Gao (JIRA)

Friday, 29 April

8:01 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://issues.jboss.org/browse/JBMESSAGING-1631?page=com.atlassian.jira.... ]

Yong Hao Gao updated JBMESSAGING-1631:
--------------------------------------
    Fix Version/s: 1.4.0.SP3.CP13
                       (was: 1.4.0.SP3.CP12)


Yong Hao Gao (JIRA)

9:19 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://issues.jboss.org/browse/JBMESSAGING-1631?page=com.atlassian.jira.... ]

Yong Hao Gao updated JBMESSAGING-1631:
--------------------------------------
    Fix Version/s: 1.4.8.SP1
                       (was: 1.4.8.GA)


Yong Hao Gao (JIRA)

Tuesday, 21 June

7:52 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

...

Messages are piling up in the queues in clustered environment and not pulled by message sucker
----------------------------------------------------------------------------------------------

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP13, 1.4.8.SP2

Yong Hao Gao (JIRA)

11:33 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Messages are piling up in the queues in clustered environment and not pulled by message sucker
----------------------------------------------------------------------------------------------

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP08
Environment: Cluster of a few JBoss 4.2.3 servers with JBM 1.4.2.GA-SP1 or 1.4.0.SP3_CP08 running on x64 Windows servers. JBoss Remoting 2.5.1 or 2.2.3. Clustered XA connection factory, clustered queues; both server and client (MDBs) are deployed as one application, with an identical deployment on all servers in the cluster.
Reporter: Victor Starenky
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP14, 1.4.8.SP2
Attachments: 1631-test-source.zip, logs-and-config-prod.zip, logs-and-config-test.zip, TestRecommendationSentProcessorMDB.java

We have an EJB3 application running on a cluster of 7 servers. All servers have the same application farmed across them. We have a clustered queue and are using a clustered connection factory. Originally we ran into the bug JBMESSAGING-1456, and while testing the fix we ran into a different problem.

After the cluster has been running for some time (usually overnight, with lots of heavy background processing happening at that time) we see messages "piling up" in some queues on some nodes. The symptoms are: MessageCount is not zero (on the order of hundreds) and DeliveringCount is zero. These nodes have ConsumerCount=0 for the queues experiencing the problem. The message sucker is configured as far as I can tell.

It looks like the problem might be related to client and/or server failover leaving some nodes without consumers while the sucker is not doing its job (if I understand it correctly). Once we bump the timeout values much higher than they are in the original config files, the problem seems to disappear or at least show up much less frequently.

Specifically I'm talking about these values:

messaging-service.xml:
  <attribute name="FailoverStartTimeout">180000</attribute>

remoting-bisocket-service.xml:
  <attribute name="clientLeasePeriod" isParam="true">30000</attribute>
  <attribute name="validatorPingPeriod" isParam="true">30000</attribute>
  <attribute name="validatorPingTimeout" isParam="true">20000</attribute>
  <attribute name="registerCallbackListener">false</attribute>
  <attribute name="timeout" isParam="true">240000</attribute>

While this serves as a temporary workaround, we don't feel we can rely on JBM in a production clustered environment without failover working properly. This is a big showstopper for us at the moment.

Attached are the log files from the prod environment of 7 servers with the config files, as well as log files from the test environment with just 2 servers (having the same issue) and the corresponding configs. The logs were produced by the test version of the messaging code with added logging as per JBMESSAGING-1456.
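The symptom described in the report (MessageCount above zero while DeliveringCount and ConsumerCount are both zero) can be checked mechanically. The following is only a minimal sketch in Java, not part of JBM: the class and method names are hypothetical, and the ObjectName pattern `jboss.messaging.destination:service=Queue,name=<queue>` is an assumption about how the queue MBean is registered in a given deployment. Only the three attribute names are taken from the report.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper, for illustration only.
final class QueueStallCheck {

    // True when the counters match the reported symptom: messages are
    // waiting, none are being delivered, and no consumer is attached.
    public static boolean isStalled(int messageCount, int deliveringCount, int consumerCount) {
        return messageCount > 0 && deliveringCount == 0 && consumerCount == 0;
    }

    // Reads the three counters over JMX and applies the check above.
    // The ObjectName pattern is an assumption; adjust it to the deployment.
    public static boolean isStalled(MBeanServerConnection server, String queueName) throws Exception {
        ObjectName name = new ObjectName(
                "jboss.messaging.destination:service=Queue,name=" + queueName);
        int messages = ((Number) server.getAttribute(name, "MessageCount")).intValue();
        int delivering = ((Number) server.getAttribute(name, "DeliveringCount")).intValue();
        int consumers = ((Number) server.getAttribute(name, "ConsumerCount")).intValue();
        return isStalled(messages, delivering, consumers);
    }

    public static void main(String[] args) {
        // Counter values similar to those described in the report.
        System.out.println(isStalled(300, 0, 0));
        // A queue that still has a consumer attached is not flagged.
        System.out.println(isStalled(300, 0, 2));
    }
}
```

Running such a check periodically on every node would at least show which nodes lose their consumers and when, which may help correlate the pile-up with failover events in the logs.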

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Yong Hao Gao (JIRA)

Tuesday, 19 July

9:54 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP2

Yong Hao Gao (JIRA)

Thursday, 21 July

12:57 a.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP3

Yong Hao Gao (JIRA)

Thursday, 8 September

10:33 p.m.

New subject: [JBoss JIRA] Updated: (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP4

Yong Hao Gao (Updated) (JIRA)

Tuesday, 25 October

10:44 p.m.

New subject: [JBoss JIRA] (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP5

Yong Hao Gao (Updated) (JIRA)

Thursday, 27 October

8:22 a.m.

New subject: [JBoss JIRA] (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP6

Yong Hao Gao (Updated) (JIRA)

Wednesday, 4 January

9:25 p.m.

New subject: [JBoss JIRA] (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP7

Daniel Dumitrescu (JIRA)

Friday, 16 March

7:54 a.m.

New subject: [JBoss JIRA] (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

[ https://issues.jboss.org/browse/JBMESSAGING-1631?page=com.atlassian.jira.... ]

Daniel Dumitrescu commented on JBMESSAGING-1631:
------------------------------------------------

Hello there,

We're facing the same situation with JBM 1.4.5 SP1.

In our case we work a lot with clustered topics. The clustered topic in question carries about 500 messages/second, and we end up with messages stuck in the same way.

We have tried the workaround described in https://community.jboss.org/thread/129484?tstart=0 and are waiting to see the results. We have configured the message sucker and ClusterPullConnectionFactoryName, and we have failover support switched on.

Do you have additional information about the state of this issue?

Regards,
Daniel

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP7

Yong Hao Gao (JIRA)

Sunday, 15 April

11:07 p.m.

New subject: [JBoss JIRA] (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP8

Yong Hao Gao (JIRA)

Sunday, 22 July

11:58 p.m.

New subject: [JBoss JIRA] (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP9

Yong Hao Gao (JIRA)

Tuesday, 16 October

9:26 p.m.

New subject: [JBoss JIRA] (JBMESSAGING-1631) Messages are piling up in the queues in clustered environment and not pulled by message sucker

Key: JBMESSAGING-1631
URL: https://issues.jboss.org/browse/JBMESSAGING-1631
Assignee: Yong Hao Gao
Fix For: 1.4.0.SP3.CP15, 1.4.8.SP10

jboss-jira@lists.jboss.org

37 comments

6 participants

participants (6)

Daniel Dumitrescu (JIRA)
Hossein Attar (JIRA)
Howard Gao (JIRA)
Victor Starenky (JIRA)
Yong Hao Gao (JIRA)
Yong Hao Gao (Updated) (JIRA)
