[jboss-user] [JBoss Messaging] - JBM Messages stuck in Cluster Environment

Mon Dec 15 05:59:14 EST 2008

Our system consists of two physical multicore machines running Red Hat Enterprise
Server 64 bit, Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_16-b02, mixed mode)
and JBoss 4.3.0.GA-CP02. Each physical machine runs two instances of JBoss, for a
total of four JBoss instances. The four instances are partitioned into two
clusters, and each cluster contains two JBoss instances, one from each physical
machine.

One of the cluster partitions (called "messaging") is dedicated to providing JMS
services, based on JBoss Messaging, to clients running inside JBoss on the other
cluster partition (called "mule"). The "messaging" partition uses the "production"
cluster configuration, with two important changes:

In the
${JBOSS_INSTANCE}/deploy/jboss-messaging.sar/connection-factories-service.xml, we
have made the following changes (to improve message load-balancing across
consumers):

<mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
    ...
    <attribute name="SupportsFailover">true</attribute>
    <attribute name="SupportsLoadBalancing">true</attribute>

    <attribute name="PrefetchSize">0</attribute>
    <attribute name="SlowConsumers">true</attribute>
</mbean>

In the <JBOSS_SERVER>/deploy/jboss-messaging.sar/messaging-service.xml, we have made
the following changes (to effectively disable DLQ functionality):

<mbean code="org.jboss.jms.server.ServerPeer"
    name="jboss.messaging:service=ServerPeer" xmbean-dd="xmdesc/ServerPeer-xmbean.xml">
    ...
    <!-- Integer.MAX_VALUE since it doesn't support infinite redeliveries -->
    <attribute name="DefaultMaxDeliveryAttempts">2147483647</attribute>

    <attribute name="DefaultRedeliveryDelay">1000</attribute> <!-- 1 second -->

    <attribute name="SuckerPassword">thePassword</attribute>
    ...
</mbean>
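
To put the DefaultMaxDeliveryAttempts / DefaultRedeliveryDelay combination in perspective, here is a small Java sketch. The class and method names are hypothetical, and it assumes the simplest possible model: every delivery attempt fails immediately and is retried exactly one redelivery delay later. Under that assumption it estimates the minimum time before a message could exhaust its attempts and reach the DLQ.

```java
public class RedeliveryHorizon {

    /**
     * Lower bound, in whole days, on the time needed to use up maxAttempts
     * delivery attempts spaced delayMillis apart (simplified model, see above).
     */
    public static long minDaysUntilExhausted(int maxAttempts, long delayMillis) {
        // Widen to long before multiplying to avoid int overflow.
        long totalMillis = (long) maxAttempts * delayMillis;
        return totalMillis / (24L * 60 * 60 * 1000);
    }

    public static void main(String[] args) {
        // Values from the ServerPeer configuration above.
        System.out.println(minDaysUntilExhausted(Integer.MAX_VALUE, 1000L)); // prints 24855
    }
}
```

That is roughly 68 years, which is why these two settings effectively disable the DLQ.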

The JMS clients (on the "mule" instance) use the multicast method to
connect/discover the JMS service (on the "messaging" instance). They use the
following settings:

jms.connection.factory.jndi.name=/ClusteredConnectionFactory
jms.xaconnection.factory.jndi.name=/ClusteredXAConnectionFactory

jnp.jms.partition.name=Messaging
jnp.jms.partition.udpGroup=228.9.3.2
jnp.jms.partition.discoveryPort=1102
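
As a rough illustration only (this is not the actual client code), settings like these could be translated into a JNDI environment for HA-JNDI multicast discovery along the following lines. The class name `MessagingJndiEnv` is hypothetical, and the mapping from the `jnp.jms.partition.*` keys to the JBoss naming properties `jnp.partitionName`, `jnp.discoveryGroup` and `jnp.discoveryPort` is an assumption about how the settings are consumed.

```java
import java.util.Hashtable;
import java.util.Properties;

public class MessagingJndiEnv {

    /** Returns the named setting, failing fast if it is missing. */
    private static String required(Properties settings, String key) {
        String value = settings.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing setting: " + key);
        }
        return value;
    }

    /** Builds a JNDI environment for multicast discovery of the "messaging" partition. */
    public static Hashtable<String, String> build(Properties settings) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
        env.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
        // Assumed mapping: no provider URL is set, so the naming client
        // falls back to multicast discovery restricted to this partition.
        env.put("jnp.partitionName", required(settings, "jnp.jms.partition.name"));
        env.put("jnp.discoveryGroup", required(settings, "jnp.jms.partition.udpGroup"));
        env.put("jnp.discoveryPort", required(settings, "jnp.jms.partition.discoveryPort"));
        return env;
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("jnp.jms.partition.name", "Messaging");
        settings.setProperty("jnp.jms.partition.udpGroup", "228.9.3.2");
        settings.setProperty("jnp.jms.partition.discoveryPort", "1102");
        System.out.println(build(settings));
    }
}
```

With the JBoss client libraries on the classpath, the resulting environment would be passed to `new javax.naming.InitialContext(env)` and the factory looked up under `/ClusteredConnectionFactory`.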

We have managed to isolate the following issue and can reproduce it within a few
tries: if all consumers (from the "mule" cluster) for a clustered queue are
connected to node A of the "messaging" cluster and the producers (from the "mule"
cluster) for the same queue post messages to node B, then the messages remain on
node B.

In the JMX console we can observe that:
1. Clustered Queue X on "messaging" node A has 8 consumers,
MessageCount=Delivering=Scheduled=0.
2. Clustered Queue X on "messaging" node B has 0 consumers, MessageCount=2,
Delivering=Scheduled=0.
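
The condition observed above can be expressed as a simple check. The following is a minimal sketch with hypothetical class and method names. It works on plain values copied from the JMX console (consumer count and MessageCount per node) and does not itself talk to JMX.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StrandedMessageCheck {

    /** One node's view of a clustered queue, as read from the JMX console. */
    public static final class NodeStats {
        final int consumerCount;
        final int messageCount;

        public NodeStats(int consumerCount, int messageCount) {
            this.consumerCount = consumerCount;
            this.messageCount = messageCount;
        }
    }

    /**
     * Returns the nodes that hold messages but have no local consumers
     * while at least one other node of the cluster does have consumers.
     */
    public static List<String> strandedNodes(Map<String, NodeStats> statsByNode) {
        boolean consumersSomewhere = false;
        for (NodeStats stats : statsByNode.values()) {
            if (stats.consumerCount > 0) {
                consumersSomewhere = true;
            }
        }
        List<String> stranded = new ArrayList<String>();
        for (Map.Entry<String, NodeStats> entry : statsByNode.entrySet()) {
            NodeStats stats = entry.getValue();
            if (consumersSomewhere && stats.consumerCount == 0 && stats.messageCount > 0) {
                stranded.add(entry.getKey());
            }
        }
        return stranded;
    }

    public static void main(String[] args) {
        Map<String, NodeStats> stats = new LinkedHashMap<String, NodeStats>();
        stats.put("A", new NodeStats(8, 0)); // 8 consumers, no messages
        stats.put("B", new NodeStats(0, 2)); // no consumers, 2 messages
        System.out.println(strandedNodes(stats)); // prints [B]
    }
}
```

A node that stays in this list for longer than the message sucker would reasonably need to move the messages matches the situation described here.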

We have waited for the message sucker to move the messages to node A, but this
hasn't happened. If we shut down node A, all consumers move over to node B and
consume the messages. We have yet to run a test for the case where we shut down
node B.

In our production environment, we have not encountered this problem very often (2-3
times in the last few months), because we have a large number of consumers per queue
(30-80) and they are almost evenly distributed across the two "messaging" cluster
nodes. In the coming weeks, we will add more functionality to our ESB, and our
requirements do not tolerate this rate of failures.

Since the system is supposed to go live in January, any ideas or hints would be very helpful!

Thanks in advance.

Sebastian 

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4196514#4196514
