[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1456) Messages stuck in being-delivered state in cluster

Tuesday, 26 May 2009

    [ https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.ji... ]

Howard Gao commented on JBMESSAGING-1456:
-----------------------------------------

Hi Tim,

That's an issue in JBR (JBoss Remoting). As I understand it, the client ping object
(ConnectionValidator) is shared among the remoting clients, and the ConnectionValidator is
associated with a LeasePinger for the server. The problem occurs when a new Client is
created and the ConnectionValidator fails to shut down the correct LeasePinger. I quote
Ron's comments here, as I may not understand it in full detail:

-----------Ron's comment---------

1. Client_A.connect() and Client_B.connect() register Client_A and Client_B with
MicroRemoteClientInvoker_1's LeasePinger_1 and with the associated server side Lease.

2. Client_A.addConnectionListener() creates (or reuses) ConnectionValidator_1.

3. ConnectionValidator_1 sees a connection failure and notifies Client_A's
listener(s), causing a JBM failover.  ConnectionValidator_1 also shuts down
MicroRemoteClientInvoker_1's LeasePinger_1, which causes the server side Leases for
Client_A and Client_B to notify their listeners about a broken connection.

4. Client_B.addConnectionListener() reuses ConnectionValidator_1.

5. Client_C.connect() registers Client_C with a *new* LeasePinger_2 in
MicroRemoteClientInvoker_1.

6. ConnectionValidator_1 detects a broken connection and notifies Client_B's
listener(s) (causing a JBM failover).  ConnectionValidator_1 also shuts down
MicroRemoteClientInvoker_1's *new* LeasePinger_2, which causes all of the server
side Leases associated with LeasePinger_2 to notify their listeners about a broken
connection.  BUT Client_B WAS ASSOCIATED WITH LeasePinger_1's LEASE - IT'S NOT IN
LeasePinger_2's LEASE.  SO THERE IS NO NOTIFICATION ABOUT Client_B ON THE SERVER
SIDE.

Or something like that.
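
For readers who want to trace the quoted sequence, here is a minimal toy model in
Java. It is only a sketch of the scenario as Ron describes it, not JBoss Remoting
code: ToyInvoker, ToyLeasePinger, ToyConnectionValidator and LeasePingerDemo are
hypothetical names, and the assumption that the shared validator always stops
whichever pinger the invoker currently holds is taken from the description above,
not from the JBR source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a lease pinger and the server side lease it feeds. */
final class ToyLeasePinger {
    final String name;
    final Set<String> clients = new LinkedHashSet<>();
    boolean stopped;

    ToyLeasePinger(String name) {
        this.name = name;
    }

    /** Stops pinging; returns the clients the server side lease reports as broken. */
    List<String> stop() {
        stopped = true;
        return new ArrayList<>(clients);
    }
}

/** Hypothetical stand-in for the client invoker that owns the current pinger. */
final class ToyInvoker {
    private ToyLeasePinger current;
    private int created;

    /** Registers a client, creating a fresh pinger if none exists or the old one was stopped. */
    void connect(String clientId) {
        if (current == null || current.stopped) {
            current = new ToyLeasePinger("LeasePinger_" + (++created));
        }
        current.clients.add(clientId);
    }

    ToyLeasePinger currentPinger() {
        return current;
    }
}

/** Hypothetical shared validator: one instance serves every client of the invoker. */
final class ToyConnectionValidator {
    private final ToyInvoker invoker;

    ToyConnectionValidator(ToyInvoker invoker) {
        this.invoker = invoker;
    }

    /**
     * Assumed flaw: the failed client is ignored and the invoker's *current*
     * pinger is stopped, even if that client was registered with an older one.
     */
    List<String> onConnectionFailure(String failedClient) {
        return invoker.currentPinger().stop();
    }
}

final class LeasePingerDemo {
    /** Replays the quoted sequence; returns the server side notifications per failure. */
    static List<List<String>> run() {
        ToyInvoker invoker = new ToyInvoker();
        invoker.connect("Client_A");
        invoker.connect("Client_B"); // both land in LeasePinger_1
        ToyConnectionValidator validator = new ToyConnectionValidator(invoker);

        List<String> first = validator.onConnectionFailure("Client_A");
        invoker.connect("Client_C"); // LeasePinger_1 is stopped, so LeasePinger_2 is created
        List<String> second = validator.onConnectionFailure("Client_B");
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        List<List<String>> notified = run();
        System.out.println("first failure notifies:  " + notified.get(0));
        System.out.println("second failure notifies: " + notified.get(1));
    }
}
```

In this toy the second failure is raised for Client_B, yet the only client reported
on the server side is Client_C, because Client_B was never registered with
LeasePinger_2. That matches the missing notification described in the last step.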

...
 Messages stuck in being-delivered state in cluster
 --------------------------------------------------

                 Key: JBMESSAGING-1456
                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
             Project: JBoss Messaging
          Issue Type: Bug
    Affects Versions: 1.4.0.SP3_CP03, 1.4.0.SP3.CP07
            Reporter: Justin Bertram
            Assignee: Howard Gao
            Priority: Blocker
             Fix For: 1.4.0.SP3.CP08, 1.4.4.GA

         Attachments: DeliveringCount.png, kill3_thread_dump.txt, logs-and-config.zip,
MessageStucked.png, RemoveAllMessagesException.png, test-1456-jars.zip, thread_dump.txt

 Messages become "stuck" in being-delivered state when clients use a clustered
XA connection factory in a cluster of at least 2 nodes.
 JBoss setup:
   -2 nodes of JBoss EAP 4.3 CP02
   -commented out "ClusterPullConnectionFactory" in messaging-service.xml to
prevent message redistribution and eliminate the "message suckers" as the
potential culprit
   -MySQL backend using the default mysql-persistence-service.xml (from
<JBOSS_HOME>/docs/examples/jms)
 Client setup:
   -both nodes have a client which is a separate process (i.e. not inside JBoss)
   -clients are Spring based
   -one client produces and consumes, the other client just consumes
   -both clients use the ClusteredXAConnectionFactory from the default
connection-factories-service.xml
   -both clients publish to and consume from "queue/testDistributedQueue"
   -clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and
transacted sessions
 Symptoms of the issue:
   -when running the clients I watch the JMX-Console for the
"queue/testDistributedQueue"
   -as the consumers pull messages off the queue I can see the MessageCount and
DeliveringCount go to 0 every so often
   -after a period of time (usually a few hours) the MessageCount and DeliveringCount
never go back to 0
   -I "kill" the clients and wait for the DeliveringCount to go to 0, but it
never does
   -after the clients are killed the ConsumerCount for the queue will drop, but never to 0
when messages are "stuck"
   -a thread dump reveals at least one JBM server session that is apparently stuck (it
never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for
"queue/testDistributedQueue"
   -a "killall -3 java" doesn't produce anything from the clients so I know
they're dead
   -nothing is in any DLQ or expiry queue
   -the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the
DeliveringCount in the JMX-Console
   -rebooting the node with the stuck messages frees the messages to be consumed (i.e.
un-sticks them)
 Other notes:
   -nothing else is happening on either node but running the client and running JBoss
   -this only appears to happen when a clustered connection factory is used.  I tested
using a normal connection factory and after 24 hours couldn't reproduce a stuck
message. 
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
