[
https://jira.jboss.org/jira/browse/JBMESSAGING-1456?page=com.atlassian.ji...
]
Howard Gao commented on JBMESSAGING-1456:
-----------------------------------------
Hi Tim,
That's an issue in JBR (JBoss Remoting). As I understand it, the client ping object
(ConnectionValidator) is shared among the remoting clients, and the ConnectionValidator is
associated with a LeasePinger for the server. The problem occurs when a new Client is
created: the ConnectionValidator then fails to shut down the correct LeasePinger. I quote
Ron's comments here because I may not understand all of the details:
-----------Ron's comment---------
1. Client_A.connect() and Client_B.connect() register Client_A and Client_B with
MicroRemoteClientInvoker_1's LeasePinger_1 and with the associated server side Lease.
2. Client_A.addConnectionListener() creates (or reuses) ConnectionValidator_1.
3. ConnectionValidator_1 sees a connection failure and notifies Client_A's
listener(s), causing a JBM failover. ConnectionValidator_1 also shuts down
MicroRemoteClientInvoker_1's LeasePinger_1, which causes the server side Leases for
Client_A and Client_B to notify their listeners about a broken connection.
4. Client_B.addConnectionListener() reuses ConnectionValidator_1.
5. Client_C.connect() registers Client_C with a *new* LeasePinger_2 in
MicroRemoteClientInvoker_1.
6. ConnectionValidator_1 detects a broken connection and notifies Client_B's
listener(s) (causing a JBM failover). ConnectionValidator_1 also shuts down
MicroRemoteClientInvoker_1's *new* LeasePinger_2, which causes all of the server
side Leases associated with LeasePinger_2 to notify their listeners about a broken
connection. BUT Client_B WAS ASSOCIATED WITH LeasePinger_1's LEASE - IT'S NOT IN
LeasePinger_2's LEASE. SO THERE IS NO NOTIFICATION ABOUT Client_B ON THE SERVER
SIDE.
Or something like that.
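To make the sequence in Ron's comment easier to follow, here is a minimal Java sketch of
the bookkeeping it describes. The classes below (LeasePinger, Invoker, LeasePingerMixup)
are hypothetical stand-ins and not the JBoss Remoting API. The sketch assumes that the
invoker lazily creates a new LeasePinger on connect() once the previous one has been shut
down, and that the shared validator always shuts down whichever pinger is current.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: a pinger that remembers which clients registered with it.
class LeasePinger {
    final String name;
    final Set<String> clients = new LinkedHashSet<String>();

    LeasePinger(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for MicroRemoteClientInvoker_1 (assumed behaviour, see lead-in).
class Invoker {
    private int created = 0;
    private LeasePinger current; // null after the current pinger has been shut down

    // Register a client with the current pinger, creating a new pinger if none exists.
    void connect(String client) {
        if (current == null) {
            created++;
            current = new LeasePinger("LeasePinger_" + created);
        }
        current.clients.add(client);
    }

    // What the shared validator does on failure: shut down the *current* pinger.
    // Returns the clients whose server side Leases would notify their listeners.
    Set<String> shutdownCurrentPinger() {
        Set<String> notified = new LinkedHashSet<String>();
        if (current != null) {
            notified.addAll(current.clients);
            current = null;
        }
        return notified;
    }
}

class LeasePingerMixup {
    // Replays the sequence from the comment; one entry per validator failure.
    static List<Set<String>> simulate() {
        Invoker invoker = new Invoker();
        List<Set<String>> serverNotifications = new ArrayList<Set<String>>();

        // Client_A and Client_B register with LeasePinger_1.
        invoker.connect("Client_A");
        invoker.connect("Client_B");

        // First failure (Client_A's listener fires): LeasePinger_1 is shut down.
        serverNotifications.add(invoker.shutdownCurrentPinger());

        // Client_B adds its listener and reuses the validator: no pinger change.
        // Client_C connects and ends up in a new LeasePinger_2.
        invoker.connect("Client_C");

        // Second failure (Client_B's listener fires): LeasePinger_2 is shut down,
        // but Client_B was never registered with it.
        serverNotifications.add(invoker.shutdownCurrentPinger());
        return serverNotifications;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [[Client_A, Client_B], [Client_C]]
    }
}
```

In this model the second shutdown reports only Client_C, which matches the point made in
capitals above: the client side fails over for Client_B, but nothing on the server side is
told about Client_B at that moment.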
Messages stuck in being-delivered state in cluster
--------------------------------------------------
Key: JBMESSAGING-1456
URL: https://jira.jboss.org/jira/browse/JBMESSAGING-1456
Project: JBoss Messaging
Issue Type: Bug
Affects Versions: 1.4.0.SP3_CP03, 1.4.0.SP3.CP07
Reporter: Justin Bertram
Assignee: Howard Gao
Priority: Blocker
Fix For: 1.4.0.SP3.CP08, 1.4.4.GA
Attachments: DeliveringCount.png, kill3_thread_dump.txt, logs-and-config.zip,
MessageStucked.png, RemoveAllMessagesException.png, test-1456-jars.zip, thread_dump.txt
Messages become "stuck" in being-delivered state when clients use a clustered
XA connection factory in a cluster of at least 2 nodes.
JBoss setup:
-2 nodes of JBoss EAP 4.3 CP02
-commented out "ClusterPullConnectionFactory" in messaging-service.xml to
prevent message redistribution and eliminate the "message suckers" as the
potential culprit
-MySQL backend using the default mysql-persistence-service.xml (from
<JBOSS_HOME>/docs/examples/jms)
Client setup:
-both nodes have a client which is a separate process (i.e. not inside JBoss)
-clients are Spring based
-one client produces and consumes, the other client just consumes
-both clients use the ClusteredXAConnectionFactory from the default
connection-factories-service.xml
-both clients publish to and consume from "queue/testDistributedQueue"
-clients are configured to send persistent messages, use AUTO_ACKNOWLEDGE, and
transacted sessions
Symptoms of the issue:
-when running the clients I watch the JMX-Console for the
"queue/testDistributedQueue"
-as the consumers pull messages off the queue I can see the MessageCount and
DeliveringCount go to 0 every so often
-after a period of time (usually a few hours) the MessageCount and DeliveringCount
never go back to 0
-I "kill" the clients and wait for the DeliveringCount to go to 0, but it
never does
-after the clients are killed the ConsumerCount for the queue will drop, but never to 0
when messages are "stuck"
-a thread dump reveals at least one JBM server session that is apparently stuck (it
never goes away) - ostensibly this is the consumer that is showing in the JMX-Console for
"queue/testDistributedQueue"
-a "killall -3 java" doesn't produce anything from the clients so I know
they're dead
-nothing is in any DLQ or expiry queue
-the database contains as many rows in the JBM_MSG and JBM_MSG_REF tables as the
DeliveringCount in the JMX-Console
-rebooting the node with the stuck messages frees the messages to be consumed (i.e.
un-sticks them)
Other notes:
-nothing else is happening on either node but running the client and running JBoss
-this only appears to happen when a clustered connection factory is used. I tested
using a normal connection factory and after 24 hours couldn't reproduce a stuck
message.