[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1456) Messages stuck in being-delivered state in cluster
Howard Gao (JIRA)
jira-events at lists.jboss.org
Mon Dec 1 00:26:36 EST 2008
[ https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12440522#action_12440522 ]
Howard Gao commented on JBMESSAGING-1456:
-----------------------------------------
Hi Zach,
Thanks for the valuable information. Now I understand better. Last night, before going to bed, I re-checked my test case to make sure it matched your description, and then let the test client run overnight. However, the expected stuck message did not appear.
This time I think I followed your description carefully, except for a few configuration changes to keep my laptop from being overloaded. The changes are:
export.message.delay=80000 (instead of 30-40sec)
export.number.messages.per.send=100
The consumers on my laptop are not fast enough to consume all the messages the producer sends in one burst.
export.consumer.min.count=5
export.consumer.max.count=10
I reduced the number of consumers to lower memory usage, so that my laptop would not run out of memory. For the second, consume-only client, I reduced these two parameters even further, to 2 and 8 respectively.
So the 2 nodes, the producer/consumer client and the consume-only client all ran on my laptop. When I checked this morning (after about 8 hours of running), I did not see any stuck messages.
However, I did see exceptions like the following on node2:
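As a sketch of how these overrides might be read, assuming the test client loads its export.* settings as standard java.util.Properties and that export.message.delay is in milliseconds (80000 ms = 80 s, versus the original 30-40 s). The ExportConfig class is hypothetical and the real client's loader may differ; note that an annotation such as "(instead of 30-40sec)" must not appear in an actual properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical holder for the export.* test-client settings.
 * Assumption: export.message.delay is expressed in milliseconds.
 */
public final class ExportConfig {
    public final long messageDelayMs;
    public final int messagesPerSend;
    public final int consumerMinCount;
    public final int consumerMaxCount;

    private ExportConfig(long messageDelayMs, int messagesPerSend,
                         int consumerMinCount, int consumerMaxCount) {
        this.messageDelayMs = messageDelayMs;
        this.messagesPerSend = messagesPerSend;
        this.consumerMinCount = consumerMinCount;
        this.consumerMaxCount = consumerMaxCount;
    }

    /** Parses key=value text in java.util.Properties format; every key is required. */
    public static ExportConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader does no real I/O, but Properties.load declares IOException.
            throw new UncheckedIOException(e);
        }
        long delay = Long.parseLong(require(p, "export.message.delay"));
        int perSend = Integer.parseInt(require(p, "export.number.messages.per.send"));
        int min = Integer.parseInt(require(p, "export.consumer.min.count"));
        int max = Integer.parseInt(require(p, "export.consumer.max.count"));
        if (min > max) {
            throw new IllegalArgumentException("consumer min.count " + min + " > max.count " + max);
        }
        return new ExportConfig(delay, perSend, min, max);
    }

    // Returns the trimmed value for key, or fails loudly if the key is absent.
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Example input: the overrides above with the reduced consumer counts of the second client.
        ExportConfig c = parse(String.join("\n",
                "export.message.delay=80000",
                "export.number.messages.per.send=100",
                "export.consumer.min.count=2",
                "export.consumer.max.count=8"));
        System.out.println(c.messageDelayMs + " ms, " + c.messagesPerSend + " per send, "
                + c.consumerMinCount + ".." + c.consumerMaxCount + " consumers");
    }
}
```

Failing fast on a missing key or on min.count > max.count makes a mistyped override visible at startup instead of hours into an overnight run.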
2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.InvokerRegistry] removed SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] from registry
2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.transport.socket.MicroSocketClientInvoker] SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] disconnecting ...
2008-12-01 05:07:54,712 ERROR [org.jboss.remoting.ConnectionNotifier] Error notifying connection listeners of disconnected client connection.
java.lang.NullPointerException
at org.jboss.remoting.ConnectionNotifier.connectionTerminated(ConnectionNotifier.java:74)
at org.jboss.remoting.Lease.notifyClientTermination(Lease.java:159)
at org.jboss.remoting.Lease.terminateLease(Lease.java:143)
at org.jboss.remoting.ServerInvoker.terminateLease(ServerInvoker.java:1692)
at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:732)
at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:573)
at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:387)
at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:166)
This exception comes from jboss-remoting; I believe I am using the patched version. I don't know what it means yet, but it does not seem relevant to our case.
Now I have changed the producer code a bit to make message sending "smarter". After each burst, the producer thread sleeps for a while and then checks the message count: if the count is not zero, it keeps sleeping; if it is zero, it sends another burst.
This way I can easily observe whether messages are getting stuck. If they are, the test client will keep sleeping after every message count check even though the consumers have stopped receiving.
There is no need to refresh the JMX console every time.
The file I changed is BaseProducerThread.java (attached). Please take a look and let me know whether this is a correct, and better, way to do it. Thanks.
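The burst-then-drain loop described above can be sketched roughly as follows. This is not the attached BaseProducerThread.java: the class DrainingProducerLoop, the injected sendBurst and messageCount functions, and the stuckTimeoutMs cutoff are all hypothetical, and a real client would still need its own way to read the MessageCount of queue/testDistributedQueue (for example through JMX).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: send a burst, then sleep and re-check the queue's
 * message count until it reaches zero before sending the next burst.
 * The stuck timeout is an addition of this sketch, not of the original client.
 */
public final class DrainingProducerLoop {
    private final IntConsumer sendBurst;      // sends the given number of messages
    private final LongSupplier messageCount;  // reads the queue's current MessageCount
    private final int messagesPerBurst;
    private final long pollIntervalMs;
    private final long stuckTimeoutMs;

    public DrainingProducerLoop(IntConsumer sendBurst, LongSupplier messageCount,
                                int messagesPerBurst, long pollIntervalMs, long stuckTimeoutMs) {
        if (pollIntervalMs <= 0) {
            throw new IllegalArgumentException("pollIntervalMs must be positive");
        }
        this.sendBurst = sendBurst;
        this.messageCount = messageCount;
        this.messagesPerBurst = messagesPerBurst;
        this.pollIntervalMs = pollIntervalMs;
        this.stuckTimeoutMs = stuckTimeoutMs;
    }

    /**
     * Sends up to {@code bursts} bursts. Returns true if the count reached zero after
     * every burst; false if it stayed non-zero for stuckTimeoutMs (messages may be
     * stuck) or if the thread was interrupted.
     */
    public boolean run(int bursts) {
        for (int i = 0; i < bursts; i++) {
            sendBurst.accept(messagesPerBurst);
            long waitedMs = 0;
            while (true) {
                try {
                    Thread.sleep(pollIntervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
                waitedMs += pollIntervalMs;
                if (messageCount.getAsLong() == 0) {
                    break; // queue drained: go on to the next burst
                }
                if (waitedMs >= stuckTimeoutMs) {
                    return false; // never drained: possible stuck messages
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated queue: each poll "consumes" 40 messages.
        AtomicLong depth = new AtomicLong();
        DrainingProducerLoop healthy = new DrainingProducerLoop(
                n -> depth.addAndGet(n),
                () -> depth.updateAndGet(d -> Math.max(0, d - 40)),
                100, 10, 500);
        System.out.println("healthy drained: " + healthy.run(3));

        // Simulated stuck queue: the count never drops below 3.
        AtomicLong stuckDepth = new AtomicLong();
        DrainingProducerLoop stuck = new DrainingProducerLoop(
                n -> stuckDepth.addAndGet(n),
                () -> stuckDepth.updateAndGet(d -> Math.max(3, d - 40)),
                100, 10, 200);
        System.out.println("stuck drained: " + stuck.run(3));
    }
}
```

Running main prints "healthy drained: true" followed by "stuck drained: false". The timeout turns "the client keeps sleeping forever" into an explicit result that an overnight run could log with a timestamp; drop it if you prefer to watch the client as described above.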
> Messages stuck in being-delivered state in cluster
> --------------------------------------------------
>
> Key: JBMESSAGING-1456
> URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
> Project: JBoss Messaging
> Issue Type: Bug
> Affects Versions: 1.4.0.SP3_CP03
> Reporter: Justin Bertram
> Assignee: Howard Gao
> Priority: Critical
> Attachments: kill3_thread_dump.txt, thread_dump.txt
>
>
> Messages become "stuck" in being-delivered state when clients use a clustered XA connection factory in a cluster of at least 2 nodes.
> JBoss setup:
> -2 nodes of JBoss EAP 4.3 CP02
> -commented out "ClusterPullConnectionFactory" in messaging-service.xml to prevent message redistribution and eliminate the "message suckers" as the potential culprit
> -MySQL backend using the default mysql-persistence-service.xml (from <JBOSS_HOME>/docs/examples/jms)
> Client setup:
> -both nodes have a client which is a separate process (i.e. not inside JBoss)
> -clients are Spring based
> -one client produces and consumes, the other client just consumes
> -both clients use the ClusteredXAConnectionFactory from the default connection-factories-service.xml
> -both clients publish to and consume from "queue/testDistributedQueue"
> -clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and transacted sessions
> Symptoms of the issue:
> -when running the clients I watch the JMX-Console for the "queue/testDistributedQueue"
> -as the consumers pull messages off the queue I can see the MessageCount and DeliveringCount go to 0 every so often
> -after a period of time (usually a few hours) the MessageCount and DeliveringCount never go back to 0
> -I "kill" the clients and wait for the DeliveringCount to go to 0, but it never does
> -after the clients are killed the ConsumerCount for the queue will drop, but never to 0 when messages are "stuck"
> -a thread dump reveals at least one JBM server session that is apparently stuck (it never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for "queue/testDistributedQueue"
> -a "killall -3 java" doesn't produce anything from the clients, so I know they're dead
> -nothing is in any DLQ or expiry queue
> -the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the DeliveringCount in the JMX-Console
> -rebooting the node with the stuck messages frees the messages to be consumed (i.e. un-sticks them)
> Other notes:
> -nothing else is happening on either node but running the client and running JBoss
> -this only appears to happen when a clustered connection factory is used. I tested using a normal connection factory and after 24 hours couldn't reproduce a stuck message.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira