[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1456) Messages stuck in being-delivered state in cluster

Howard Gao (JIRA) jira-events at lists.jboss.org
Mon Dec 1 00:26:36 EST 2008


    [ https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12440522#action_12440522 ] 

Howard Gao commented on JBMESSAGING-1456:
-----------------------------------------

Hi Zach,

Thanks for the valuable information; I understand the scenario better now. Last night I re-checked my test case to make sure it conforms to your description, then let the test client run overnight. However, the expected stuck messages did not appear.

This time I think I followed your description carefully, except for some configuration changes to keep my laptop from being overloaded. The changes include:

export.message.delay=80000 (instead of 30-40sec) 
export.number.messages.per.send=100

The consumers on my laptop are not fast enough to consume all the messages the producer sends in one burst.

export.consumer.min.count=5
export.consumer.max.count=10

I reduced the number of consumers to lower memory usage so that my laptop won't run out of memory. For the second, consume-only client, I reduced the above two parameters even further, to 2 and 8 respectively.

So the 2 nodes, one producer/consumer client, and one consume-only client all run on my laptop. When I checked this morning (after about 8 hours of running), I didn't see any stuck messages.

However, I did see exceptions like the following on node2:

2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.InvokerRegistry] removed SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] from registry
2008-12-01 05:07:54,710 DEBUG [org.jboss.remoting.transport.socket.MicroSocketClientInvoker] SocketClientInvoker[1d62912, bisocket://127.0.1.1:366662811] disconnecting ...
2008-12-01 05:07:54,712 ERROR [org.jboss.remoting.ConnectionNotifier] Error notifying connection listeners of disconnected client connection.
java.lang.NullPointerException
	at org.jboss.remoting.ConnectionNotifier.connectionTerminated(ConnectionNotifier.java:74)
	at org.jboss.remoting.Lease.notifyClientTermination(Lease.java:159)
	at org.jboss.remoting.Lease.terminateLease(Lease.java:143)
	at org.jboss.remoting.ServerInvoker.terminateLease(ServerInvoker.java:1692)
	at org.jboss.remoting.ServerInvoker.invoke(ServerInvoker.java:732)
	at org.jboss.remoting.transport.socket.ServerThread.processInvocation(ServerThread.java:573)
	at org.jboss.remoting.transport.socket.ServerThread.dorun(ServerThread.java:387)
	at org.jboss.remoting.transport.socket.ServerThread.run(ServerThread.java:166)

This exception comes from jboss-remoting, and I think I used the patched version. I don't know what it means yet, but it does not seem relevant to our case.

Now I have changed the producer code a bit to make the message sending 'smarter'. After each burst of sending, the producer thread sleeps for a while and then checks the message count. If the message count is not zero, it keeps sleeping; if it is zero, it sends another burst of messages.

That way I can easily observe whether messages are getting stuck. If they are, the test client will keep sleeping at every message count check even after the consumers have stopped receiving. There is no need to refresh the JMX console every time.

The file I changed is BaseProducerThread.java. Please take a look (in the attachment) and tell me whether this is a correct and better way to do it. Thanks.
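
The attached BaseProducerThread.java is not reproduced in this message. As a rough illustration of the send-then-wait loop described above, here is a minimal sketch. Every name in it (BurstProducerSketch, BurstSender, sendBurstAndWait, and the constructor parameters) is hypothetical and is not taken from the attachment. The message count is supplied through a plain callback rather than a real JMX or JMS call, and the cap on the number of checks is an addition for the sketch; the loop as described would simply keep sleeping.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class BurstProducerSketch {

    /** Hypothetical hook that sends one burst of messages to the queue. */
    public interface BurstSender {
        void sendBurst(int messagesPerBurst) throws Exception;
    }

    private final BurstSender sender;
    private final IntSupplier messageCount;   // hypothetical: returns the queue's current message count
    private final int messagesPerBurst;
    private final long checkIntervalMillis;
    private final int maxChecks;              // assumption: give up and report after this many checks

    public BurstProducerSketch(BurstSender sender, IntSupplier messageCount,
                               int messagesPerBurst, long checkIntervalMillis, int maxChecks) {
        this.sender = sender;
        this.messageCount = messageCount;
        this.messagesPerBurst = messagesPerBurst;
        this.checkIntervalMillis = checkIntervalMillis;
        this.maxChecks = maxChecks;
    }

    /**
     * Sends one burst, then sleeps and re-checks the message count until it is zero.
     * Returns the number of checks that were needed, or -1 if the count was still
     * non-zero after maxChecks checks (a hint that messages may be stuck).
     */
    public int sendBurstAndWait() throws Exception {
        sender.sendBurst(messagesPerBurst);
        for (int checks = 1; checks <= maxChecks; checks++) {
            Thread.sleep(checkIntervalMillis);
            if (messageCount.getAsInt() == 0) {
                return checks;   // queue drained, safe to send the next burst
            }
        }
        return -1;               // count never reached zero within the allowed checks
    }

    public static void main(String[] args) throws Exception {
        // Simulated queue: each count check also "consumes" up to 40 messages.
        AtomicInteger queued = new AtomicInteger();
        BurstProducerSketch producer = new BurstProducerSketch(
                n -> queued.addAndGet(n),
                () -> queued.updateAndGet(v -> Math.max(0, v - 40)),
                100, 10L, 20);
        int checks = producer.sendBurstAndWait();
        System.out.println(checks == -1
                ? "message count never reached zero, messages may be stuck"
                : "queue drained after " + checks + " checks");
    }
}
```

In a real test client the two callbacks would wrap the actual send code and whatever mechanism is used to read the queue's message count; both are left abstract here on purpose.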


> Messages stuck in being-delivered state in cluster
> --------------------------------------------------
>
>                 Key: JBMESSAGING-1456
>                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
>             Project: JBoss Messaging
>          Issue Type: Bug
>    Affects Versions: 1.4.0.SP3_CP03
>            Reporter: Justin Bertram
>            Assignee: Howard Gao
>            Priority: Critical
>         Attachments: kill3_thread_dump.txt, thread_dump.txt
>
>
> Messages become "stuck" in being-delivered state when clients use a clustered XA connection factory in a cluster of at least 2 nodes.
> JBoss setup:
>   -2 nodes of JBoss EAP 4.3 CP02
>   -commented out "ClusterPullConnectionFactory" in messaging-service.xml to prevent message redistribution and eliminate the "message suckers" as the potential culprit
>   -MySQL backend using the default mysql-persistence-service.xml (from <JBOSS_HOME>/docs/examples/jms)
> Client setup:
>   -both nodes have a client which is a separate process (i.e. not inside JBoss)
>   -clients are Spring based
>   -one client produces and consumes, the other client just consumes
>   -both clients use the ClusteredXAConnectionFactory from the default connection-factories-service.xml
>   -both clients publish to and consume from "queue/testDistributedQueue"
>   -clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and transacted sessions
> Symptoms of the issue:
>   -when running the clients I watch the JMX-Console for the "queue/testDistributedQueue"
>   -as the consumers pull messages off the queue I can see the MessageCount and DeliveringCount go to 0 every so often
>   -after a period of time (usually a few hours) the MessageCount and DeliveringCount never go back to 0
>   -I "kill" the clients and wait for the DeliveringCount to go to 0, but it never does
>   -after the clients are killed the ConsumerCount for the queue will drop, but never to 0 when messages are "stuck"
>   -a thread dump reveals at least one JBM server session that is apparently stuck (it never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for "queue/testDistributedQueue"
>   -a "killall -3 java" doesn't produce anything from the clients so I know they're dead
>   -nothing is in any DLQ or expiry queue
>   -the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the DeliveringCount in the JMX-Console
>   -rebooting the node with the stuck messages frees the messages to be consumed (i.e. un-sticks them)
> Other notes:
>   -nothing else is happening on either node but running the client and running JBoss
>   -this only appears to happen when a clustered connection factory is used.  I tested using a normal connection factory and after 24 hours couldn't reproduce a stuck message.


        


