[jboss-jira] [JBoss JIRA] Updated: (JBMESSAGING-1881) Duplicated messages in the event of failover node changed
Yong Hao Gao (JIRA)
jira-events at lists.jboss.org
Fri Jun 17 05:47:23 EDT 2011
[ https://issues.jboss.org/browse/JBMESSAGING-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yong Hao Gao updated JBMESSAGING-1881:
--------------------------------------
Fix Version/s: 1.4.0.SP3.CP13, 1.4.8.SP1
Affects Version/s: 1.4.8.GA, 1.4.0.SP3.CP12
Component/s: JMS Clustering, Messaging Core Distributed Support
> Duplicated messages in the event of failover node changed
> ----------------------------------------------------------
>
> Key: JBMESSAGING-1881
> URL: https://issues.jboss.org/browse/JBMESSAGING-1881
> Project: JBoss Messaging
> Issue Type: Bug
> Components: JMS Clustering, Messaging Core Distributed Support
> Affects Versions: 1.4.0.SP3.CP12, 1.4.8.GA
> Reporter: Yong Hao Gao
> Assignee: Yong Hao Gao
> Priority: Critical
> Fix For: 1.4.0.SP3.CP13, 1.4.8.SP1
>
>
> A message M can be duplicated when the following sequence occurs in a clustered environment:
> M is sent to node A, which has a consumer connected to it. Before node A can deliver M to the consumer, it must replicate M to its failover node B. To do so, it sends a replicate request to node B and then puts M in a list. When the response arrives (the call is asynchronous), it takes M from the list and performs the delivery.
> If node B crashes between the replication and the delivery, node A is notified and tries to deliver the messages that were replicated but not yet delivered. This path also takes messages from the list mentioned above.
> Due to a coding flaw, these two concurrent accesses to the list may result in a duplicated message. The first delivery goes through the following piece of code:
> (in ServerSessionEndpoint.replicateDeliveryResponseReceived(long deliveryID))
> [code 1]
> ...
> while (true)
> {
>    DeliveryRecord dr = (DeliveryRecord)toDeliver.peek();
>    ...
>    if (performDelivery)
>    {
>       toDeliver.take();
>       performDelivery(dr.del.getReference(), dr.deliveryID, dr.getConsumer());
>       ...
> ...
> and the second delivery goes through this piece of code:
> (in ServerSessionEndpoint.deliverAnyWaitingDeliveries(String queueName))
> [code 2]
> DeliveryRecord dr = (DeliveryRecord)toDeliver.poll(0);
> synchronized (dr)
> {
>    performDelivery(dr.del.getReference(), dr.deliveryID, dr.getConsumer());
>    ...
> toDeliver is a LinkedQueue, which is safe for concurrent access.
> However, the peek() in [code 1] does not remove the element from the queue. This gives the poll(0) in [code 2] a chance to remove that same element and deliver it.
> When [code 1] later calls take(), it no longer gets that element, but by then it makes no difference, because the following performDelivery() call uses the peeked record rather than the value returned by take().
> The fix should be simple: in [code 1], check the return value of take() and, if it is null, do not deliver.
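
The interleaving described above can be replayed deterministically in a small stand-alone sketch. This is not the JBoss Messaging code: the class and method names (PeekThenTakeSketch, responsePathFlawed, responsePathFixed, failoverPath, replay) are hypothetical, strings stand in for DeliveryRecord, and java.util.concurrent.ConcurrentLinkedQueue with its non-blocking poll() is assumed as a stand-in for the LinkedQueue calls in [code 1] and [code 2].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PeekThenTakeSketch {
    // Stand-in for toDeliver (assumption: not the LinkedQueue used by the real code).
    static final Queue<String> toDeliver = new ConcurrentLinkedQueue<>();
    // Records every delivery so that a duplicate becomes visible.
    static final List<String> delivered = new ArrayList<>();

    // Pattern of [code 1]: remove from the queue, but deliver the peeked record.
    static void responsePathFlawed(String peeked) {
        toDeliver.poll();          // result ignored, as with take() in [code 1]
        delivered.add(peeked);
    }

    // Suggested fix: deliver only what the removal actually returned.
    static void responsePathFixed(String peeked) {
        String taken = toDeliver.poll();
        if (taken != null) {
            delivered.add(taken);
        }
    }

    // Pattern of [code 2]: remove the record, then deliver it.
    static void failoverPath() {
        String dr = toDeliver.poll();
        if (dr != null) {
            delivered.add(dr);
        }
    }

    // Replays the problematic order: peek, failover removal, then response-path delivery.
    static List<String> replay(boolean fixed) {
        toDeliver.clear();
        delivered.clear();
        toDeliver.add("M");
        String peeked = toDeliver.peek();  // response path sees M but leaves it queued
        failoverPath();                    // failover path removes and delivers M
        if (fixed) {
            responsePathFixed(peeked);
        } else {
            responsePathFlawed(peeked);
        }
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        System.out.println("flawed: " + replay(false)); // flawed: [M, M]
        System.out.println("fixed:  " + replay(true));  // fixed:  [M]
    }
}
```

In the fixed variant the response path delivers the record it removed rather than the one it peeked, so a record taken by the other path in between is not delivered a second time.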
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira