[
https://jira.jboss.org/browse/JBMESSAGING-1822?page=com.atlassian.jira.pl...
]
Yong Hao Gao commented on JBMESSAGING-1822:
-------------------------------------------
As XA is not a preferred approach to this issue, I have come up with an idea that can
hopefully solve it. I give it here for discussion.
1. Introduce a new state in the JBM_MSG_REF table's STATE column. The new state
'S' marks that the message is in a special "to be sucked" state.
2. Change the sucking process as follows (a sketch of this flow appears after step c):
a) When a message M is ready to be sucked, it is handed to the remote consumer
(ServerConsumerEndpoint) for delivery. On accepting M, this remote consumer updates
M's state to 'S'. It then goes on to actually deliver M to the MessageSucker
on the other node.
b) When the MessageSucker receives M, it first acknowledges it and then sends it to the
local queue.
c) When the acknowledgement arrives at the source node, the session simply forgets the
message (without any DB operations).
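A rough Java sketch of this flow, for discussion only. The class and helper names
(deliverToSuckerOnOtherNode, acknowledgeToSourceNode, addToLocalQueue) are hypothetical
placeholders, not existing JBM APIs; only the JBM_MSG_REF table, its STATE column and the
'S' value come from the proposal above, and the MESSAGE_ID key column is assumed from the
1.4 schema.

import java.sql.Connection;
import java.sql.PreparedStatement;

abstract class SuckedDeliverySketch {

    // a) source node: the remote consumer marks M as 'S' before handing it over
    void handToRemoteConsumer(Connection db, long messageId) throws Exception {
        PreparedStatement ps = db.prepareStatement(
                "UPDATE JBM_MSG_REF SET STATE = 'S' WHERE MESSAGE_ID = ?");
        ps.setLong(1, messageId);
        ps.executeUpdate();
        ps.close();
        deliverToSuckerOnOtherNode(messageId);   // hypothetical transport call
    }

    // b) target node: the sucker acks first, then routes M to the local queue
    void onSuckedMessage(long messageId) throws Exception {
        acknowledgeToSourceNode(messageId);      // hypothetical ack back to the source
        addToLocalQueue(messageId);              // hypothetical local enqueue
    }

    // c) source node: on the ack the session just forgets M in memory, no DB work
    void onAcknowledgement(long messageId) {
        inMemoryDeliveries.remove(messageId);
    }

    final java.util.Set<Long> inMemoryDeliveries = new java.util.HashSet<Long>();

    abstract void deliverToSuckerOnOtherNode(long messageId) throws Exception;
    abstract void acknowledgeToSourceNode(long messageId) throws Exception;
    abstract void addToLocalQueue(long messageId) throws Exception;
}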
Failure handling
The sucking process will have to handle the following failures:
1. Source node crash. This will leave M in either the 'C' (normal) state or the
'S' state. If M is in the normal state, it will be delivered as normal when the node
is restarted. If M is already in the 'S' state, it won't be delivered as a normal
message when the node is restarted, no matter whether M has been delivered to the
sucker or not. At the target node, if the sucker has received the message and updated it
successfully, M will already be in the local queue. If the sucker has received the
message but failed to update it, M is left in the 'S' state.
When the source node comes up again, the sucker will reconnect to the source node and
register a remote consumer. On starting up, the remote consumer will first check in
the DB whether there is a message in the 'S' state; if so, it picks up this message
and delivers it to the sucker right away (a sketch of this check follows).
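A hedged sketch of that startup check, again with hypothetical names: only JBM_MSG_REF,
STATE and the 'S' value come from the proposal, CHANNEL_ID and MESSAGE_ID are assumed from
the 1.4 schema, and deliverToSucker is a placeholder for handing the reference to the
sucker.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class SuckedStateRecoverySketch {

    // Run when a remote consumer (re)registers: find any reference left in 'S'
    // and hand it straight to the sucker, bypassing normal queue delivery.
    void recoverOnConsumerStart(Connection db, long channelId) throws Exception {
        PreparedStatement ps = db.prepareStatement(
            "SELECT MESSAGE_ID FROM JBM_MSG_REF WHERE CHANNEL_ID = ? AND STATE = 'S'");
        ps.setLong(1, channelId);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            deliverToSucker(rs.getLong(1));   // hypothetical delivery to the sucker
        }
        rs.close();
        ps.close();
    }

    void deliverToSucker(long messageId) { /* placeholder */ }
}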
2. Target node crash. This will leave M in one of several situations:
1st - M has been put to the remote consumer but has not yet been updated to the 'S'
state. In this case M will eventually be cancelled back to the queue for re-delivery.
2nd - M has been put to the remote consumer and updated to the 'S' state, but not yet
delivered. When the remote consumer is eventually closed, it will acknowledge M to the
Session so that the Session forgets it. That prevents the session from redelivering M
while it is in the 'S' state. When the target node comes up and a new remote consumer
is registered, the message will be picked up and delivered (the first thing such a
remote consumer does when started).
3rd - M has been put to the remote consumer, updated to the 'S' state and delivered to
the sucker on the target node, but the acknowledgement failed. This is equivalent to
the 2nd case, i.e. when the target node is back the message M will be redelivered.
4th - M has been delivered to the target node and acked, but its state failed to be
updated (it is still 'S'). Same as the 3rd case, i.e. M will be picked up when the
target node is back and a remote consumer is registered.
3. Both nodes crash. We just rely on the state of M. When either node starts up again, it
ignores messages marked with the 'S' state (see the reload sketch below). Once a sucker
is created and registers a remote consumer, its first task is to look up any message in
the 'S' state and deliver it.
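A small sketch of the "ignore 'S' on restart" rule, under the same assumptions about the
JBM_MSG_REF columns; addToInMemoryQueue is a placeholder for the normal reload path.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class QueueReloadSketch {

    // When a node reloads a queue from the database, references in 'S' are skipped;
    // they are only handled by the consumer-registration recovery shown earlier.
    void reloadQueue(Connection db, long channelId) throws Exception {
        PreparedStatement ps = db.prepareStatement(
            "SELECT MESSAGE_ID, STATE FROM JBM_MSG_REF WHERE CHANNEL_ID = ?");
        ps.setLong(1, channelId);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            if ("S".equals(rs.getString("STATE"))) {
                continue;   // leave 'S' references for the sucker's recovery path
            }
            addToInMemoryQueue(rs.getLong("MESSAGE_ID"));   // placeholder
        }
        rs.close();
        ps.close();
    }

    void addToInMemoryQueue(long messageId) { /* placeholder */ }
}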
MessageSucker failures cause the delivery of the failed message to
stall
------------------------------------------------------------------------
Key: JBMESSAGING-1822
URL:
https://jira.jboss.org/browse/JBMESSAGING-1822
Project: JBoss Messaging
Issue Type: Bug
Components: Messaging Core
Affects Versions: 1.4.6.GA
Reporter: david.boeren
Assignee: Yong Hao Gao
Fix For: Unscheduled
Attachments: helloworld.zip
The MessageSucker is responsible for migrating messages between different members of a
cluster; it is a consumer on the remote queue from which it receives messages destined for
the queue on the local cluster member.
The onMessage routine, at its most basic, does the following (a hedged sketch follows
this list):
- bookkeeping for the incoming message, including expiry
- acknowledge the incoming message
- attempt to deliver to the local queue
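A simplified, hypothetical sketch of that ordering, using plain JMS types; it is not the
actual MessageSucker code, but it shows why an exception during local delivery leaves an
already-acknowledged message stranded in the database.

import javax.jms.Message;
import javax.jms.MessageListener;

class MessageSuckerOrderingSketch implements MessageListener {

    public void onMessage(Message msg) {
        try {
            checkExpiryAndBookkeeping(msg);   // bookkeeping for the incoming message
            msg.acknowledge();                // the incoming message is acked first ...
            deliverToLocalQueue(msg);         // ... then local delivery is attempted;
                                              // if this throws, the message is never
                                              // redelivered although its row stays in the DB
        } catch (Exception e) {
            // the remote copy has already been acknowledged, so nothing is cancelled back
            e.printStackTrace();
        }
    }

    private void checkExpiryAndBookkeeping(Message msg) { /* placeholder */ }
    private void deliverToLocalQueue(Message msg) throws Exception { /* placeholder */ }
}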
When the delivery fails, the result is the *appearance* of lost messages. Those messages
which are processed during the failure are not redelivered, but they still exist in the
database.
The only way I have found to trigger the redelivery of those messages is to redeploy the
queue containing the messages and/or restart that app server. Obviously neither approach
is acceptable.
In order to trigger the error I created a SOA cluster which *only* shared the JMS
database, and nothing else. I modified the helloworld quickstart to display a counter of
messages consumed, clustered the *esb* queue, and then used Byteman to trigger the faults.
The Byteman rule is as follows; the quickstart will be attached.
RULE throw every fifth send
INTERFACE ProducerDelegate
METHOD send
AT ENTRY
IF callerEquals("MessageSucker.onMessage", true) &&
(incrementCounter("throwException") % 5 == 0)
DO THROW new IllegalStateException("Deliberate exception")
ENDRULE
This results in an exception being thrown for every fifth message. Once the delivery has
quiesced, examine the JBM_MSG and JBM_MSG_REF tables to see the messages which have not
been delivered.
The clusters are ports-default and ports-01; the client seeds the gateway by sending 300
messages to the default.
Adding up the counter from each server *plus* the message count from JBM_MSG results in
300 (or multiples thereof for more executions).
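A hedged sketch of that bookkeeping, assuming direct JDBC access to the shared JMS
database; only the JBM_MSG table name comes from the report, the rest is illustrative.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

class StuckMessageCountSketch {

    // Rows still sitting in JBM_MSG after delivery has quiesced
    long countRemainingMessages(Connection db) throws Exception {
        Statement st = db.createStatement();
        ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM JBM_MSG");
        rs.next();
        long remaining = rs.getLong(1);
        rs.close();
        st.close();
        return remaining;
    }

    // counterDefault + counterPorts01 + remaining rows should equal the 300 sent
    boolean addsUp(long counterDefault, long counterPorts01, long remainingInDb, long sent) {
        return counterDefault + counterPorts01 + remainingInDb == sent;
    }
}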