[jboss-jira] [JBoss JIRA] Updated: (JBMESSAGING-1881) Duplicated messages in the event of failover node changed

Fri Jun 17 05:47:23 EDT 2011

     [ https://issues.jboss.org/browse/JBMESSAGING-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yong Hao Gao updated JBMESSAGING-1881:
--------------------------------------

        Fix Version/s: 1.4.0.SP3.CP13
                       1.4.8.SP1
    Affects Version/s: 1.4.8.GA
                       1.4.0.SP3.CP12
          Component/s: JMS Clustering
                       Messaging Core Distributed Support


> Duplicated messages in the event of failover node changed 
> ----------------------------------------------------------
>
>                 Key: JBMESSAGING-1881
>                 URL: https://issues.jboss.org/browse/JBMESSAGING-1881
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: JMS Clustering, Messaging Core Distributed Support
>    Affects Versions: 1.4.0.SP3.CP12, 1.4.8.GA
>            Reporter: Yong Hao Gao
>            Assignee: Yong Hao Gao
>            Priority: Critical
>             Fix For: 1.4.0.SP3.CP13, 1.4.8.SP1
>
>
> A message M can be duplicated under the following conditions in a clustered environment:
> M is sent to node A, which has a consumer connected to it. Before node A delivers M to the consumer, it must replicate M to its failover node B. To do so, it sends a replicate request to node B and puts M in a list. When the response arrives (the call is asynchronous), it takes M from the list and performs the delivery.
> If node B crashes between the replication and the delivery, node A is notified and tries to deliver the messages that have been replicated but not yet delivered. This path also takes messages from the list mentioned above.
> Due to a coding flaw, these two concurrent accesses to the list may result in a duplicated message. The first delivery goes through the following code:
> (in ServerSessionEndpoint.replicateDeliveryResponseReceived(long deliveryID))
> [code 1]
> ...
> while (true)
> {
>    DeliveryRecord dr = (DeliveryRecord)toDeliver.peek();
>    ...
>    if (performDelivery)
>    {
>       toDeliver.take();
>       performDelivery(dr.del.getReference(), dr.deliveryID, dr.getConsumer());
>       ...
> ...
> and the second delivery goes through this piece of code:
> (in ServerSessionEndpoint.deliverAnyWaitingDeliveries(String queueName))
> [code 2]
> DeliveryRecord dr = (DeliveryRecord)toDeliver.poll(0);
> synchronized (dr)
> {
>    performDelivery(dr.del.getReference(), dr.deliveryID, dr.getConsumer());
> ...
> toDeliver is a LinkedQueue and is safe for concurrent access.
> However, the peek() method in [code 1] doesn't remove the element from the queue, which gives the poll(0) in [code 2] a chance to get the element, remove it, and deliver it.
> When [code 1] later calls take(), it won't get that element, but by then it doesn't matter, because the following performDelivery() call uses the peeked record.
> The fix should be simple: check the return value of take() in [code 1] and, if it is null, don't deliver.
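
The interleaving described above can be reproduced deterministically in a small sketch. This is not the JBoss Messaging code: the class DeliveryRaceSketch, the Record type, and the method names are hypothetical, java.util.concurrent.ConcurrentLinkedQueue stands in for the LinkedQueue used by ServerSessionEndpoint, and the failover path is injected as a callback between the peek and the removal instead of running on a real second thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the delivery logic; not the real ServerSessionEndpoint.
class DeliveryRaceSketch {
    // Hypothetical stand-in for DeliveryRecord.
    static final class Record {
        final long deliveryID;
        Record(long deliveryID) { this.deliveryID = deliveryID; }
    }

    final Queue<Record> toDeliver = new ConcurrentLinkedQueue<>();
    final List<Long> delivered = new ArrayList<>();

    // Failover path (like [code 2]): removes each record before delivering it.
    void deliverAnyWaiting() {
        Record dr;
        while ((dr = toDeliver.poll()) != null) {
            delivered.add(dr.deliveryID);
        }
    }

    // Response path with the flaw (like [code 1]): delivers the peeked record
    // even if another caller removed it in the meantime.
    void responseReceivedFlawed(Runnable interleaved) {
        Record dr = toDeliver.peek();
        if (dr == null) return;
        interleaved.run();            // simulates the failover path running here
        toDeliver.poll();             // result ignored: this is the bug
        delivered.add(dr.deliveryID);
    }

    // Response path with the proposed fix: deliver only what was actually removed.
    void responseReceivedFixed(Runnable interleaved) {
        if (toDeliver.peek() == null) return;
        interleaved.run();            // same interleaving as above
        Record removed = toDeliver.poll();
        if (removed != null) {
            delivered.add(removed.deliveryID);
        }
    }

    // Runs one scenario with a single queued record and returns the delivered IDs.
    static List<Long> run(boolean fixed) {
        DeliveryRaceSketch s = new DeliveryRaceSketch();
        s.toDeliver.add(new Record(1L));
        if (fixed) {
            s.responseReceivedFixed(s::deliverAnyWaiting);
        } else {
            s.responseReceivedFlawed(s::deliverAnyWaiting);
        }
        return s.delivered;
    }

    public static void main(String[] args) {
        System.out.println("flawed: " + run(false)); // prints "flawed: [1, 1]"
        System.out.println("fixed:  " + run(true));  // prints "fixed:  [1]"
    }
}
```

The sketch uses the non-blocking poll() for the removal step. Whether take() on the original LinkedQueue returns null or blocks when the queue is empty should be checked against that class before applying the fix as worded; a non-blocking removal such as poll(0) may be the safer choice.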
