[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1822) MessageSucker failures cause the delivery of the failed message to stall

Tue Oct 26 21:40:54 EDT 2010

    [ https://jira.jboss.org/browse/JBMESSAGING-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559555#action_12559555 ] 

Yong Hao Gao commented on JBMESSAGING-1822:
-------------------------------------------

Hi Clebert,

Don't we already have an optimistic lock in place here? Doesn't the update also check the previous value of the field?

-- We do optimize by collapsing a normal delivery/send into a single DB update of the message, from the source channel to the target channel. But that cannot solve the issue we are facing, as the JIRA describes.
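
To make the optimistic update concrete, here is a minimal sketch of a move that succeeds only while the message is still owned by the expected source channel. It is not the JBoss Messaging code: the class `OptimisticMoveSketch`, its methods, and the in-memory map standing in for the message reference table are all assumptions made for illustration. The conditional `replace` plays the role of an UPDATE whose WHERE clause also checks the previous channel value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class OptimisticMoveSketch {
    // Hypothetical stand-in for the reference table: message id -> owning channel id.
    private final ConcurrentMap<Long, Long> channelByMessage = new ConcurrentHashMap<>();

    /** Records that a message currently belongs to the given channel. */
    public void store(long messageId, long channelId) {
        channelByMessage.put(messageId, channelId);
    }

    /**
     * Moves the message to the target channel only if it is still owned by
     * expectedSource. Returns false when another party changed the owner first
     * (or the message is unknown), which is the optimistic-lock failure case.
     */
    public boolean move(long messageId, long expectedSource, long target) {
        return channelByMessage.replace(messageId, expectedSource, target);
    }

    /** Returns the current owning channel, or null if the message is unknown. */
    public Long owner(long messageId) {
        return channelByMessage.get(messageId);
    }

    public static void main(String[] args) {
        OptimisticMoveSketch table = new OptimisticMoveSketch();
        table.store(1L, 10L);
        System.out.println(table.move(1L, 10L, 20L)); // first move wins
        System.out.println(table.move(1L, 10L, 30L)); // stale source value is rejected
        System.out.println(table.owner(1L));
    }
}
```

A check like this only guards the row against a conflicting update. It says nothing about whether the source node still remembers a message whose move failed, which is the problem this JIRA describes.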

In case of a failure during the update, the sucker could just ignore the message.

-- When this issue happens, the message is ignored by the sucker but is also forgotten by the source node. Unless the source is restarted, the message appears 'stalled' in the queue.

Having the sucker do an extra update would create a big performance hit IMO, as you would need to commit the row as soon as you set the "S" state.

-- Agreed. But compared with XA, this should be less expensive.

> MessageSucker failures cause the delivery of the failed message to stall
> ------------------------------------------------------------------------
>
>                 Key: JBMESSAGING-1822
>                 URL: https://jira.jboss.org/browse/JBMESSAGING-1822
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: Messaging Core
>    Affects Versions: 1.4.6.GA
>            Reporter: david.boeren
>            Assignee: Yong Hao Gao
>             Fix For: Unscheduled
>
>         Attachments: helloworld.zip
>
>
> The MessageSucker is responsible for migrating messages between different members of a cluster. It is a consumer of the remote queue, from which it receives messages destined for the queue on the local cluster member.
> The onMessage routine, at its most basic, does the following 
> - bookkeeping for the incoming message, including expiry 
> - acknowledge the incoming message 
> - attempt to deliver to the local queue 
> When the delivery fails, the result is the *appearance* of lost messages. Those messages which are processed during the failure are not redelivered, but they still exist in the database. 
> The only way I have found to trigger the redelivery of those messages is to redeploy the queue containing the messages and/or restart that app server. Obviously neither approach is acceptable. 
> In order to trigger the error I created a SOA cluster which *only* shared the JMS database, and no other. I modified the helloworld quickstart to display a counter of messages consumed, clustered the *esb* queue, and then used byteman to trigger the faults. 
> The byteman rule is as follows; the quickstart will be attached.
> RULE throw every fifth send 
> INTERFACE ProducerDelegate 
> METHOD send 
> AT ENTRY 
> IF callerEquals("MessageSucker.onMessage", true) && (incrementCounter("throwException") % 5 == 0) 
> DO THROW new IllegalStateException("Deliberate exception") 
> ENDRULE 
> This results in an exception being thrown for every fifth message. Once the delivery has quiesced, examine the JBM_MSG and JBM_MSG_REF tables to see the messages which have not been delivered. 
> The clusters are ports-default and ports-01, the client seeds the gateway by sending 300 messages to the default. 
> Adding up the counter from each server *plus* the message count from JBM_MSG results in 300 (or multiples thereof for more executions).
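
The failure mode in the description can be sketched as a small simulation of the acknowledge-then-deliver ordering. This is an illustration only, not the MessageSucker implementation: the class `SuckerStallSketch`, the `run` method, and its parameters are hypothetical, and it assumes every message passes through the sucker, which is not true of the real test (some messages are consumed on the node that received them). The counts are therefore illustrative; what carries over is that delivered plus stalled equals the number sent.

```java
public class SuckerStallSketch {
    /** Outcome of one simulated run. */
    public static final class Result {
        public final int delivered;
        public final int stalled;

        Result(int delivered, int stalled) {
            this.delivered = delivered;
            this.stalled = stalled;
        }
    }

    /**
     * Simulates messageCount messages flowing through an onMessage-like routine
     * in which every failEvery-th send throws, mirroring the injected fault.
     */
    public static Result run(int messageCount, int failEvery) {
        int sendCalls = 0;
        int delivered = 0;
        int stalled = 0;
        for (int id = 1; id <= messageCount; id++) {
            // Step 1: bookkeeping for the incoming message (expiry etc.), omitted here.
            // Step 2: acknowledge the incoming message; the source stops tracking it.
            // Step 3: attempt to deliver to the local queue.
            try {
                sendCalls++;
                if (sendCalls % failEvery == 0) {
                    throw new IllegalStateException("Deliberate exception");
                }
                delivered++;
            } catch (IllegalStateException e) {
                // Already acknowledged, so nothing redelivers it; in the real
                // system the row would still be present in the database.
                stalled++;
            }
        }
        return new Result(delivered, stalled);
    }

    public static void main(String[] args) {
        Result r = run(300, 5);
        // prints "delivered=240 stalled=60 total=300"
        System.out.println("delivered=" + r.delivered + " stalled=" + r.stalled
                + " total=" + (r.delivered + r.stalled));
    }
}
```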
