[jboss-jira] [JBoss JIRA] Closed: (JBMESSAGING-1456) Messages stuck in being-delivered state in cluster

Howard Gao (JIRA) jira-events at lists.jboss.org
Fri Mar 27 07:53:27 EDT 2009


     [ https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Howard Gao closed JBMESSAGING-1456.
-----------------------------------

    Resolution: Done


There are two related issues in this case:

Message Stuck -- messages are kept in delivering state and are never delivered again unless the server is restarted (for persistent messages).
This can happen under the following conditions:
   1. The client ping times out and JBM tries to disconnect from the server (this happens in a cluster).
   2. Due to a network/remoting issue, the server receives a 'normal' disconnection event.
   3. The server assumes the client has already closed its connection and therefore doesn't clean up.
   4. So the connection at the server is left open, including the sessions created on that connection.
   5. If those sessions contain messages in delivering state, those messages will appear stuck.
   To fix this, either the server side always tries to clean up the connection whenever a disconnection happens (JBM workaround)
   or the remoting layer handles this case correctly (remoting update). Here we use the JBM workaround (it is independent of any fix in the remoting layer).
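The "stuck" mechanism and the JBM workaround can be illustrated with a small toy model. This is a hypothetical sketch, not JBoss Messaging or JBoss Remoting code: the classes Queue, Session and ServerConnection and the methods onDisconnectBuggy/onDisconnectFixed are invented for illustration. It assumes that closing a session cancels its unacknowledged deliveries back to the queue, and shows that skipping cleanup on a "normal" disconnect leaves those deliveries in delivering state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: these classes are NOT JBoss Messaging or JBoss Remoting
// types. A server-side connection owns sessions; a session owns deliveries.
public class DisconnectCleanupSketch {

    // Toy queue tracking available vs. being-delivered messages.
    static class Queue {
        int messageCount;     // messages available to consumers
        int deliveringCount;  // messages handed out but not yet acknowledged

        void deliver() { messageCount--; deliveringCount++; }

        // Cancelling a delivery makes the message available again.
        void cancel() { deliveringCount--; messageCount++; }
    }

    static class Session {
        final Queue queue;
        int inFlight;  // deliveries not yet acknowledged by the client

        Session(Queue queue) { this.queue = queue; }

        void receive() { queue.deliver(); inFlight++; }

        // Closing a session cancels every unacknowledged delivery.
        void close() {
            while (inFlight > 0) {
                queue.cancel();
                inFlight--;
            }
        }
    }

    static class ServerConnection {
        final List<Session> sessions = new ArrayList<>();
        boolean closed;

        // Idempotent, so it is harmless if the client really did close first.
        void close() {
            if (closed) return;
            for (Session s : sessions) s.close();
            sessions.clear();
            closed = true;
        }
    }

    // Behaviour described in steps 3-4: a "normal" disconnect is taken to mean
    // the client already closed everything, so the server skips cleanup.
    static void onDisconnectBuggy(ServerConnection conn, boolean normal) {
        if (!normal) conn.close();
    }

    // Workaround: clean up on every disconnect; the "normal" flag is ignored.
    static void onDisconnectFixed(ServerConnection conn, boolean normal) {
        conn.close();
    }

    // Two messages are in delivery when remoting reports a "normal" disconnect
    // even though the client never closed its connection (steps 1-2).
    static Queue runScenario(boolean useFix) {
        Queue queue = new Queue();
        queue.messageCount = 5;
        ServerConnection conn = new ServerConnection();
        Session session = new Session(queue);
        conn.sessions.add(session);
        session.receive();
        session.receive();
        if (useFix) {
            onDisconnectFixed(conn, true);
        } else {
            onDisconnectBuggy(conn, true);
        }
        return queue;
    }

    public static void main(String[] args) {
        Queue buggy = runScenario(false);
        Queue fixed = runScenario(true);
        System.out.println("without cleanup: delivering=" + buggy.deliveringCount
                + ", available=" + buggy.messageCount);
        System.out.println("with cleanup:    delivering=" + fixed.deliveringCount
                + ", available=" + fixed.messageCount);
    }
}
```

Running main reports delivering=2, available=3 without cleanup and delivering=0, available=5 with cleanup. Making close() idempotent is what lets the server clean up unconditionally: if the client really did close first, the extra close is a no-op.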

Negative message count -- the message count can become negative
   Error scenario:
   1. The server side detects the connection failure (lease timeout) and closes the connection/session.
   2. The session endpoint cancels the messages being delivered, returning them to the queue.
   3. At the same time, the client side receives some of those messages and acknowledges them.
   4. The acknowledgement decreases the queue's delivering count, and the session endpoint's
       cancellation also decreases the delivering count.
   5. Without synchronization, one message may be cancelled and acked at the same time, so the delivering count
       is decreased twice for that message, resulting in a negative message count.
 To fix this, we added synchronization around the deliveries in ServerSessionEndpoint, in localClose() and acknowledgeDeliveryInternal().
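The effect of that synchronization can be shown with a simplified sketch. It is hypothetical: SessionEndpointSketch, acknowledgeDelivery() and race() are illustrative names, and the real ServerSessionEndpoint code is not reproduced here. The idea is that both the ack path and the close path take the same lock and decrement the delivering count only if they are the one that removes the delivery, so each message is counted down exactly once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the fix: names are illustrative and are not the
// actual ServerSessionEndpoint implementation.
public class SessionEndpointSketch {

    // Stands in for the queue's delivering count, decremented by both paths.
    final AtomicInteger deliveringCount = new AtomicInteger();

    // Deliveries handed to the client and not yet acked or cancelled.
    private final Map<Long, String> deliveries = new HashMap<>();

    void deliver(long id, String message) {
        synchronized (deliveries) {
            deliveries.put(id, message);
            deliveringCount.incrementAndGet();
        }
    }

    // Client ack path (cf. acknowledgeDeliveryInternal()). The counter is only
    // decremented by whichever path removes the delivery from the map.
    boolean acknowledgeDelivery(long id) {
        synchronized (deliveries) {
            if (deliveries.remove(id) == null) {
                return false;  // already cancelled by localClose()
            }
            deliveringCount.decrementAndGet();
            return true;
        }
    }

    // Server close path after a lease timeout (cf. localClose()): cancel what is
    // still outstanding. A real server would return these messages to the queue.
    int localClose() {
        synchronized (deliveries) {
            int cancelled = deliveries.size();
            deliveries.clear();
            deliveringCount.addAndGet(-cancelled);
            return cancelled;
        }
    }

    // Races a client acking every message against a concurrent localClose().
    // Returns {acked, cancelled, final deliveringCount}.
    static int[] race(int n) {
        SessionEndpointSketch endpoint = new SessionEndpointSketch();
        for (long id = 0; id < n; id++) {
            endpoint.deliver(id, "msg-" + id);
        }
        AtomicInteger acked = new AtomicInteger();
        AtomicInteger cancelled = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);

        Thread client = new Thread(() -> {
            awaitQuietly(start);
            for (long id = 0; id < n; id++) {
                if (endpoint.acknowledgeDelivery(id)) acked.incrementAndGet();
            }
        });
        Thread leaseTimeout = new Thread(() -> {
            awaitQuietly(start);
            cancelled.addAndGet(endpoint.localClose());
        });
        client.start();
        leaseTimeout.start();
        start.countDown();  // release both threads together
        try {
            client.join();
            leaseTimeout.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {acked.get(), cancelled.get(), endpoint.deliveringCount.get()};
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] r = race(10_000);
        System.out.println("acked=" + r[0] + " cancelled=" + r[1] + " deliveringCount=" + r[2]);
    }
}
```

The split between acked and cancelled varies from run to run, but their sum always equals the number delivered and the final delivering count is 0. Note that the AtomicInteger alone would not prevent the double decrement; what matters is that the "is this delivery still outstanding?" check and the decrement happen under the same lock in both paths.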


> Messages stuck in being-delivered state in cluster
> --------------------------------------------------
>
>                 Key: JBMESSAGING-1456
>                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
>             Project: JBoss Messaging
>          Issue Type: Bug
>    Affects Versions: 1.4.0.SP3_CP03, 1.4.0.SP3.CP07
>            Reporter: Justin Bertram
>            Assignee: Howard Gao
>            Priority: Critical
>             Fix For: 1.4.0.SP3.CP08, 1.4.4.GA
>
>         Attachments: kill3_thread_dump.txt, thread_dump.txt
>
>
> Messages become "stuck" in being-delivered state when clients use a clustered XA connection factory in a cluster of at least 2 nodes.
> JBoss setup:
>   -2 nodes of JBoss EAP 4.3 CP02
>   -commented out "ClusterPullConnectionFactory" in messaging-service.xml to prevent message redistribution and eliminate the "message suckers" as the potential culprit
>   -MySQL backend using the default mysql-persistence-service.xml (from <JBOSS_HOME>/docs/examples/jms)
> Client setup:
>   -both nodes have a client which is a separate process (i.e. not inside JBoss)
>   -clients are Spring based
>   -one client produces and consumes, the other client just consumes
>   -both clients use the ClusteredXAConnectionFactory from the default connection-factories-service.xml
>   -both clients publish to and consume from "queue/testDistributedQueue"
>   -clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and transacted sessions
> Symptoms of the issue:
>   -when running the clients I watch the JMX-Console for the "queue/testDistributedQueue"
>   -as the consumers pull messages off the queue I can see the MessageCount and DeliveringCount go to 0 every so often
>   -after a period of time (usually a few hours) the MessageCount and DeliveringCount never go back to 0
>   -I "kill" the clients and wait for the DeliveringCount to go to 0, but it never does
>   -after the clients are killed the ConsumerCount for the queue will drop, but never to 0 when messages are "stuck"
>   -a thread dump reveals at least one JBM server session that is apparently stuck (it never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for "queue/testDistributedQueue"
>   -a "killall -3 java" doesn't produce anything from the clients, so I know they're dead
>   -nothing is in any DLQ or expiry queue
>   -the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the DeliveringCount in the JMX-Console
>   -rebooting the node with the stuck messages frees the messages to be consumed (i.e. un-sticks them)
> Other notes:
>   -nothing else is happening on either node but running the client and running JBoss
>   -this only appears to happen when a clustered connection factory is used.  I tested using a normal connection factory and after 24 hours couldn't reproduce a stuck message.
