[
https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.ji...
]
Howard Gao commented on JBMESSAGING-1456:
-----------------------------------------
Hi Zach,
Thanks for the valuable information; now I understand better. Last night, before going to sleep, I re-checked my test case to make sure it conforms to your description and let the test client run overnight, but the expected stuck messages did not appear.
This time I think I followed your description carefully, except for some configuration changes to keep my laptop from being overloaded. The changes include:
export.message.delay=80000 (instead of 30-40sec)
export.number.messages.per.send=100
The consuming rate on my laptop is not fast enough to drain the messages the producer sends in one burst.
export.consumer.min.count=5
export.consumer.max.count=10
I reduced the number of consumers to cut memory usage so my laptop won't run out of memory. For the second, pure-consuming client, I reduced the above two parameters further, to 2 and 8 respectively.
So 2 nodes, one producer/consumer client, and one consumer-only client all run on my laptop.
When I checked this morning (after about 8 hours of running), I didn't see any stuck messages.
However, I did see the following exceptions from node2:
2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.InvokerRegistry] removed SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] from registry
2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.transport.socket.MicroSocketClientInvoker] SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] disconnecting ...
2008-12-01 05:07:54,712 ERROR [org.jboss.remoting.ConnectionNotifier] Error notifying connection listeners of disconnected client connection.
java.lang.NullPointerException
    at org.jboss.remoting.ConnectionNotifier.connectionTerminated(ConnectionNotifier.java:74)
    at org.jboss.remoting.Lease.notifyClientTermination(Lease.java:159)
    at org.jboss.remoting.Lease.terminateLease(Lease.java:143)
    at org.jboss.remoting.ServerInvoker.terminateLease(ServerInvoker.java:1692)
    at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:732)
    at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:573)
    at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:387)
    at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:166)
This is an exception from JBoss Remoting; I think I used the patched version. I don't know yet what it means, but it seems not relevant to our case.
Now I have changed the producer code a bit to allow for some 'smart' message sending. After each burst of sending, the producer thread sleeps for a while and then checks the message count: if the count is not zero it keeps sleeping; once the count reaches zero it sends another burst of messages.
That way I can easily observe whether messages are getting stuck. If a message does get stuck, the test client will keep sleeping at every message-count check even after the consumers stop receiving.
No need to refresh the jmx-console every time.
The file I changed is BaseProducerThread.java; please take a look (in the attachment) and see whether this is a correct and better way to do it. Thanks.
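To illustrate the idea, the burst-then-wait loop described above could be sketched roughly like this. All names here (BurstProducer, pollMillis, the simulated consumer) are hypothetical and are not the actual BaseProducerThread.java change; in the real test the message count would come from the queue's MBean rather than an in-memory counter.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical sketch of the "smart" burst producer: send a burst,
// then sleep until the queue drains before sending the next one.
public class BurstProducer {
    private final IntSupplier messageCount; // stand-in for the queue's MessageCount
    private final Runnable sendBurst;       // sends one burst of messages
    private final long pollMillis;          // sleep interval between count checks

    public BurstProducer(IntSupplier messageCount, Runnable sendBurst, long pollMillis) {
        this.messageCount = messageCount;
        this.sendBurst = sendBurst;
        this.pollMillis = pollMillis;
    }

    // Sends `bursts` bursts; between bursts, keeps sleeping while messages
    // remain. If consumers stall (stuck messages), the inner loop never
    // exits, which makes the problem easy to observe.
    public void run(int bursts) {
        for (int i = 0; i < bursts; i++) {
            sendBurst.run();
            while (messageCount.getAsInt() != 0) {
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger depth = new AtomicInteger(); // simulated queue depth
        BurstProducer p = new BurstProducer(
                depth::get,
                () -> depth.addAndGet(100), // "send" a burst of 100 messages
                10);
        // Simulated consumer draining the queue one message per millisecond.
        Thread consumer = new Thread(() -> {
            while (true) {
                if (depth.get() > 0) depth.decrementAndGet();
                try { Thread.sleep(1); } catch (InterruptedException e) { return; }
            }
        });
        consumer.setDaemon(true);
        consumer.start();
        p.run(3);
        consumer.interrupt();
        System.out.println("final depth=" + depth.get()); // prints final depth=0
    }
}
```

With a healthy consumer the producer cycles through all bursts and exits; with a stuck consumer it sleeps forever at the count check, which is exactly the observable signal described above.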
Messages stuck in being-delivered state in cluster
--------------------------------------------------
Key: JBMESSAGING-1456
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
Project: JBoss Messaging
Issue Type: Bug
Affects Versions: 1.4.0.SP3_CP03
Reporter: Justin Bertram
Assignee: Howard Gao
Priority: Critical
Attachments: kill3_thread_dump.txt, thread_dump.txt
Messages become "stuck" in being-delivered state when clients use a clustered
XA connection factory in a cluster of at least 2 nodes.
JBoss setup:
-2 nodes of JBoss EAP 4.3 CP02
-commented out "ClusterPullConnectionFactory" in messaging-service.xml to
prevent message redistribution and eliminate the "message suckers" as the
potential culprit
-MySQL backend using the default mysql-persistence-service.xml (from
<JBOSS_HOME>/docs/examples/jms)
Client setup:
-both nodes have a client which is a separate process (i.e. not inside JBoss)
-clients are Spring based
-one client produces and consumes, the other client just consumes
-both clients use the ClusteredXAConnectionFactory from the default
connection-factories-service.xml
-both clients publish to and consume from "queue/testDistributedQueue"
-clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and
transacted sessions
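For reference, the client wiring described above might look roughly like the following Spring fragment. Only ClusteredXAConnectionFactory and queue/testDistributedQueue come from this report; the bean names, the listener bean, and the use of DefaultMessageListenerContainer are assumptions about how a Spring-based consumer could be set up.

```xml
<!-- Hypothetical Spring wiring for the consumer side of the test clients -->
<bean id="jmsConnectionFactory" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="ClusteredXAConnectionFactory"/>
</bean>

<bean id="jmsQueue" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="queue/testDistributedQueue"/>
</bean>

<bean id="listenerContainer"
      class="org.springframework.jms.listener.DefaultMessageListenerContainer">
    <property name="connectionFactory" ref="jmsConnectionFactory"/>
    <property name="destination" ref="jmsQueue"/>
    <!-- "consumer" is a hypothetical MessageListener bean -->
    <property name="messageListener" ref="consumer"/>
    <property name="sessionTransacted" value="true"/>
</bean>
```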
Symptoms of the issue:
-when running the clients I watch the JMX-Console for the
"queue/testDistributedQueue"
-as the consumers pull messages off the queue I can see the MessageCount and
DeliveringCount go to 0 every so often
-after a period of time (usually a few hours) the MessageCount and DeliveringCount
never go back to 0
-I "kill" the clients and wait for the DeliveringCount to go to 0, but it
never does
-after the clients are killed the ConsumerCount for the queue will drop, but never to 0
when messages are "stuck"
-a thread dump reveals at least one JBM server session that is apparently stuck (it
never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for
"queue/testDistributedQueue"
-a "killall -3 java" doesn't produce anything from the clients, so I know they're dead
-nothing is in any DLQ or expiry queue
-the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the
DeliveringCount in the JMX-Console
-rebooting the node with the stuck messages frees the messages to be consumed (i.e.
un-sticks them)
Other notes:
-nothing else is happening on either node but running the client and running JBoss
-this only appears to happen when a clustered connection factory is used. I tested
using a normal connection factory and after 24 hours couldn't reproduce a stuck
message.