[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1456) Messages stuck in being-delivered state in cluster

Tuesday, 3 March 2009

    [ https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.ji... ]

Ron Sigal commented on JBMESSAGING-1456:
----------------------------------------

Re: "Is it guaranteed that the server side
ConnectionListener.handleConnectionException() will be called after the lease
timeout?"

Well, yeah, sort of.  I'll elaborate.

The point is that a single instance of org.jboss.remoting.MicroRemoteClientInvoker can
service multiple instances of org.jboss.remoting.Client, if the Clients are all intended
to be connected to the same server, or, more precisely, if they all use the same
InvokerLocator and the same configuration map.  If Client A is the first Client (in a
given JVM) to connect to a given server, and leasing is enabled for that Client, then
Client A will get an instance of MicroRemoteClientInvoker, and that
MicroRemoteClientInvoker will create and start an org.jboss.remoting.LeasePinger.  If
Client B, with the same InvokerLocator and configuration map, then comes along, it
will get the same MicroRemoteClientInvoker, and it will be added to the Lease pinged by the
MicroRemoteClientInvoker's LeasePinger.  Then, if Client.disconnect() is called for
Client A, the LeasePinger will remove Client A from its list, but it won't stop
pinging until the last Client it is servicing disconnects.
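To make the sharing concrete, here is a minimal Java sketch under stated assumptions. It does not use the real JBoss Remoting classes: InvokerSharingSketch, InvokerModel and PingerModel are hypothetical stand-ins for the roles of MicroRemoteClientInvoker and LeasePinger described above. It assumes invokers are keyed on the locator string plus a copy of the configuration map, and that the pinger stops as soon as its list of Clients becomes empty.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, NOT the JBoss Remoting API: the classes only mirror the roles of
// Client, MicroRemoteClientInvoker and LeasePinger as described in the comment above.
class InvokerSharingSketch {

    /** Stand-in for a LeasePinger: keeps pinging while any Client is registered. */
    static final class PingerModel {
        private final Set<String> clientIds = new LinkedHashSet<>();
        private boolean running = true; // started when the invoker is created

        void addClient(String clientId) {
            clientIds.add(clientId);
        }

        /** Drops one Client; pinging stops only once the last Client has gone. */
        void removeClient(String clientId) {
            clientIds.remove(clientId);
            if (clientIds.isEmpty()) {
                running = false;
            }
        }

        boolean isRunning() {
            return running;
        }
    }

    /** Stand-in for a MicroRemoteClientInvoker that owns a single pinger. */
    static final class InvokerModel {
        final PingerModel pinger = new PingerModel();
    }

    // Assumed sharing rule: same locator AND equal configuration map => same invoker.
    private final Map<List<Object>, InvokerModel> invokers = new HashMap<>();

    InvokerModel connect(String clientId, String locator, Map<String, String> config) {
        List<Object> key = List.of(locator, Map.copyOf(config));
        InvokerModel invoker = invokers.computeIfAbsent(key, k -> new InvokerModel());
        invoker.pinger.addClient(clientId);
        return invoker;
    }

    void disconnect(String clientId, InvokerModel invoker) {
        invoker.pinger.removeClient(clientId);
        if (!invoker.pinger.isRunning()) {
            // Forget the stopped invoker so a later Client gets a fresh one.
            invokers.values().remove(invoker);
        }
    }
}
```

Under this rule, disconnecting Client A while Client B is still connected leaves the shared pinger running; only B's disconnect stops it. A Client with a different configuration map would get its own invoker and pinger.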

So, if there are two JBM connections originating in the same JVM and connected to the same
JBM server, they will be sharing a MicroRemoteClientInvoker and a LeasePinger.  Normally,
Client.disconnect() will result in the LeasePinger sending a message to the server telling it
to remove that Client from the Lease.  When the Lease is told, it will notify all
listeners that the Client has disconnected.  But if disconnectTimeout is set to 0, that
message won't get sent.  What happens next depends on the *real* state of the
connection.  If the Client shut down because it was notified about a connection failure,
but the failure was transient, then the LeasePinger might be able to continue pinging,
which will keep the Lease alive, and there will be no notifications on the server side. 
On the other hand, if the network really is down, then no pings will get through.  When
the Lease times out, it will notify any listeners about a lost connection for *each*
of the Clients on the lease, including the one that disconnected with disconnectTimeout ==
0.
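The two server-side outcomes can be modelled the same way. Again, this is a hypothetical sketch, not the Remoting API: LeaseTimeoutSketch, its Listener interface and the event names "disconnected" and "connectionLost" are invented for illustration, and time is passed in explicitly instead of using a real timer. A removal message (the normal disconnect path) reports one Client as disconnected; a lease expiry reports a lost connection for every Client still on the lease, which is how a Client that disconnected with disconnectTimeout == 0 can still produce a notification.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the server-side Lease described above, NOT the JBoss Remoting API.
class LeaseTimeoutSketch {

    /** Receives "disconnected" (explicit removal) or "connectionLost" (lease expiry). */
    interface Listener {
        void onEvent(String event, String clientId);
    }

    private final Set<String> clientIds = new LinkedHashSet<>();
    private final List<Listener> listeners = new ArrayList<>();
    private final long leasePeriodMillis;
    private long lastPingMillis;
    private boolean expired;

    LeaseTimeoutSketch(long leasePeriodMillis, long nowMillis) {
        this.leasePeriodMillis = leasePeriodMillis;
        this.lastPingMillis = nowMillis;
    }

    void addClient(String clientId) {
        clientIds.add(clientId);
    }

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** One ping from the shared pinger renews the lease for every Client on it. */
    void ping(long nowMillis) {
        lastPingMillis = nowMillis;
    }

    /** The message sent on Client.disconnect(); never called when disconnectTimeout == 0. */
    void removeClient(String clientId) {
        if (clientIds.remove(clientId)) {
            for (Listener l : listeners) {
                l.onEvent("disconnected", clientId);
            }
        }
    }

    /** On expiry, a lost connection is reported for EACH Client still on the lease. */
    void checkExpiry(long nowMillis) {
        if (expired || nowMillis - lastPingMillis <= leasePeriodMillis) {
            return;
        }
        expired = true;
        for (String clientId : clientIds) {
            for (Listener l : listeners) {
                l.onEvent("connectionLost", clientId);
            }
        }
        clientIds.clear();
    }
}
```

Replaying the two scenarios: if pings keep arriving after Client A's silent disconnect, checkExpiry() reports nothing; once pings stop for longer than the lease period, both A and B are reported as lost connections.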

...
 Messages stuck in being-delivered state in cluster
 --------------------------------------------------

                 Key: JBMESSAGING-1456
                 URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
             Project: JBoss Messaging
          Issue Type: Bug
    Affects Versions: 1.4.0.SP3_CP03
            Reporter: Justin Bertram
            Assignee: Howard Gao
            Priority: Critical
             Fix For: 1.4.0.SP3.CP08, 1.4.3.GA

         Attachments: kill3_thread_dump.txt, thread_dump.txt

 Messages become "stuck" in being-delivered state when clients use a clustered
XA connection factory in a cluster of at least 2 nodes.
 JBoss setup:
   -2 nodes of JBoss EAP 4.3 CP02
   -commented out "ClusterPullConnectionFactory" in messaging-service.xml to
prevent message redistribution and eliminate the "message suckers" as the
potential culprit
   -MySQL backend using the default mysql-persistence-service.xml (from
<JBOSS_HOME>/docs/examples/jms)
 Client setup:
   -both nodes have a client which is a separate process (i.e. not inside JBoss)
   -clients are Spring based
   -one client produces and consumes, the other client just consumes
   -both clients use the ClusteredXAConnectionFactory from the default
connection-factories-service.xml
   -both clients publish to and consume from "queue/testDistributedQueue"
   -clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and
transacted sessions
 Symptoms of the issue:
   -when running the clients I watch the JMX-Console for the
"queue/testDistributedQueue"
   -as the consumers pull messages off the queue I can see the MessageCount and
DeliveringCount go to 0 every so often
   -after a period of time (usually a few hours) the MessageCount and DeliveringCount
never go back to 0
   -I "kill" the clients and wait for the DeliveringCount to go to 0, but it
never does
   -after the clients are killed the ConsumerCount for the queue will drop, but never to 0
when messages are "stuck"
   -a thread dump reveals at least one JBM server session that is apparently stuck (it
never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for
"queue/testDistributedQueue"
   -a "killall -3 java" doesn't produce anything from the clients, so I know
they're dead
   -nothing is in any DLQ or expiry queue
   -the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the
DeliveringCount in the JMX-Console
   -rebooting the node with the stuck messages frees the messages to be consumed (i.e.
un-sticks them)
 Other notes:
   -nothing else is happening on either node but running the client and running JBoss
   -this only appears to happen when a clustered connection factory is used.  I tested
using a normal connection factory and after 24 hours couldn't reproduce a stuck
message. 
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://jira.jboss.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
