[
https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.ji...
]
Howard Gao commented on JBMESSAGING-1456:
-----------------------------------------
Hi Zach,
Thanks for the valuable information; now I understand better. Last night, before going to sleep, I re-checked my test case to make sure it conforms to your description and let the test client run overnight, but the expected stuck messages did not appear.
This time I think I followed your description carefully, except for some configuration changes to keep my laptop from being overloaded. The changes include:
export.message.delay=80000 (instead of 30-40sec)
export.number.messages.per.send=100
The consuming rate on my laptop is not fast enough to drain the messages the producer sends in one burst.
export.consumer.min.count=5
export.consumer.max.count=10
I reduced the number of consumers to cut memory usage so my laptop won't run out of memory. For the second, pure-consuming client, I reduced the above two parameters further, to 2 and 8 respectively.
So 2 nodes, one producer/consumer client, and one consumer-only client all run on my laptop.
When I checked this morning (after about 8 hours of running), I didn't see any stuck messages.
However, I did see the following exceptions from node2:
2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.InvokerRegistry] removed SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] from registry
2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.transport.socket.MicroSocketClientInvoker] SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] disconnecting ...
2008-12-01 05:07:54,712 ERROR [org.jboss.remoting.ConnectionNotifier] Error notifying connection listeners of disconnected client connection.
java.lang.NullPointerException
    at org.jboss.remoting.ConnectionNotifier.connectionTerminated(ConnectionNotifier.java:74)
    at org.jboss.remoting.Lease.notifyClientTermination(Lease.java:159)
    at org.jboss.remoting.Lease.terminateLease(Lease.java:143)
    at org.jboss.remoting.ServerInvoker.terminateLease(ServerInvoker.java:1692)
    at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:732)
    at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:573)
    at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:387)
    at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:166)
This is an exception from JBoss Remoting; I think I used the patched version. I don't know yet what it means, but it seems not relevant to our case.
Now I have changed the producer code a bit to allow for some 'smart' message sending. After each burst of sending, the producer thread sleeps for a while and then checks the message count: if the count is not zero it keeps sleeping; once the count reaches zero it sends another burst of messages.
That way I can easily observe whether messages are getting stuck. If a message does get stuck, the test client will keep sleeping at every message-count check even after the consumers stop receiving.
No need to refresh the jmx-console every time.
The file I changed is BaseProducerThread.java; please take a look (in the attachment) and see whether this is a correct and better way to do it. Thanks.
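To illustrate the idea, the burst-then-wait loop described above could be sketched roughly like this. All names here (BurstProducer, pollMillis, the simulated consumer) are hypothetical and are not the actual BaseProducerThread.java change; in the real test the message count would come from the queue's MBean rather than an in-memory counter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical sketch of the "smart" burst producer: send a burst,
// then sleep until the queue drains before sending the next one.
public class BurstProducer {
    private final IntSupplier messageCount; // stand-in for the queue's MessageCount
    private final Runnable sendBurst;       // sends one burst of messages
    private final long pollMillis;          // sleep interval between count checks

    public BurstProducer(IntSupplier messageCount, Runnable sendBurst, long pollMillis) {
        this.messageCount = messageCount;
        this.sendBurst = sendBurst;
        this.pollMillis = pollMillis;
    }

    // Sends `bursts` bursts; between bursts, keeps sleeping while messages
    // remain. If consumers stall (stuck messages), the inner loop never
    // exits, which makes the problem easy to observe.
    public void run(int bursts) {
        for (int i = 0; i < bursts; i++) {
            sendBurst.run();
            while (messageCount.getAsInt() != 0) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger depth = new AtomicInteger(); // simulated queue depth
        BurstProducer p = new BurstProducer(
                depth::get,
                () -> depth.addAndGet(100), // "send" a burst of 100 messages
                10);
        // Simulated consumer draining the queue one message per millisecond.
        Thread consumer = new Thread(() -> {
            while (true) {
                if (depth.get() > 0) depth.decrementAndGet();
                try { Thread.sleep(1); } catch (InterruptedException e) { return; }
            }
        });
        consumer.setDaemon(true);
        consumer.start();
        p.run(3);
        consumer.interrupt();
        System.out.println("final depth=" + depth.get()); // prints final depth=0
    }
}
```

With a healthy consumer the producer cycles through all bursts and exits; with a stuck consumer it sleeps forever at the count check, which is exactly the observable signal described above.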
Messages stuck in being-delivered state in cluster
--------------------------------------------------
Key: JBMESSAGING-1456
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
Project: JBoss Messaging
Issue Type: Bug
Affects Versions: 1.4.0.SP3_CP03
Reporter: Justin Bertram
Assignee: Howard Gao
Priority: Critical
Attachments: kill3_thread_dump.txt, thread_dump.txt
Messages become "stuck" in being-delivered state when clients use a clustered
XA connection factory in a cluster of at least 2 nodes.
JBoss setup:
-2 nodes of JBoss EAP 4.3 CP02
-commented out "ClusterPullConnectionFactory" in messaging-service.xml to
prevent message redistribution and eliminate the "message suckers" as the
potential culprit
-MySQL backend using the default mysql-persistence-service.xml (from
<JBOSS_HOME>/docs/examples/jms)
Client setup:
-both nodes have a client which is a separate process (i.e. not inside JBoss)
-clients are Spring based
-one client produces and consumes, the other client just consumes
-both clients use the ClusteredXAConnectionFactory from the default
connection-factories-service.xml
-both clients publish to and consume from "queue/testDistributedQueue"
-clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and
transacted sessions
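For reference, the client wiring described above might look roughly like the following Spring fragment. Only ClusteredXAConnectionFactory and queue/testDistributedQueue come from this report; the bean names, the listener bean, and the use of DefaultMessageListenerContainer are assumptions about how a Spring-based consumer could be set up.

```xml
<!-- Hypothetical Spring wiring for the consumer side of the test clients -->
<bean id="jmsConnectionFactory" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="ClusteredXAConnectionFactory"/>
</bean>

<bean id="jmsQueue" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="queue/testDistributedQueue"/>
</bean>

<bean id="listenerContainer"
      class="org.springframework.jms.listener.DefaultMessageListenerContainer">
    <property name="connectionFactory" ref="jmsConnectionFactory"/>
    <property name="destination" ref="jmsQueue"/>
    <!-- "consumer" is a hypothetical MessageListener bean -->
    <property name="messageListener" ref="consumer"/>
    <property name="sessionTransacted" value="true"/>
</bean>
```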
Symptoms of the issue:
-when running the clients I watch the JMX-Console for the
"queue/testDistributedQueue"
-as the consumers pull messages off the queue I can see the MessageCount and
DeliveringCount go to 0 every so often
-after a period of time (usually a few hours) the MessageCount and DeliveringCount
never go back to 0
-I "kill" the clients and wait for the DeliveringCount to go to 0, but it
never does
-after the clients are killed the ConsumerCount for the queue will drop, but never to 0
when messages are "stuck"
-a thread dump reveals at least one JBM server session that is apparently stuck (it
never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for
"queue/testDistributedQueue"
-a "killall -3 java" doesn't produce anything from the clients, so I know they're dead
-nothing is in any DLQ or expiry queue
-the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the
DeliveringCount in the JMX-Console
-rebooting the node with the stuck messages frees the messages to be consumed (i.e.
un-sticks them)
Other notes:
-nothing else is happening on either node but running the client and running JBoss
-this only appears to happen when a clustered connection factory is used. I tested
using a normal connection factory and after 24 hours couldn't reproduce a stuck
message.