[jboss-jira] [JBoss JIRA] Created: (JBMESSAGING-1881) Duplicated messages in the event of failover node changed
Yong Hao Gao (JIRA)
jira-events at lists.jboss.org
Fri Jun 17 05:45:24 EDT 2011
Duplicated messages in the event of failover node changed
----------------------------------------------------------
Key: JBMESSAGING-1881
URL: https://issues.jboss.org/browse/JBMESSAGING-1881
Project: JBoss Messaging
Issue Type: Bug
Reporter: Yong Hao Gao
Assignee: Yong Hao Gao
Priority: Critical
A message M can be duplicated in a clustered environment under the following conditions:
M is sent to node A, which has a consumer connected to it, so node A delivers M to that consumer. Before delivering, node A must replicate M to its failover node B. It does so by sending a replicate request to node B and then putting M in a list. When the response arrives (the call is asynchronous), it takes M from the list and performs the delivery.
If node B crashes between the replication and the delivery, node A is notified and tries to deliver the messages that were replicated but not yet delivered. This path also takes messages from the list mentioned above.
Due to a coding flaw, these two concurrent accesses to the list may result in a duplicated message. The first delivery goes through the following piece of code:
(in ServerSessionEndpoint.replicateDeliveryResponseReceived(long deliveryID))
[code 1]
...
while (true)
{
   DeliveryRecord dr = (DeliveryRecord)toDeliver.peek();
   ...
   if (performDelivery)
   {
      toDeliver.take();
      performDelivery(dr.del.getReference(), dr.deliveryID, dr.getConsumer());
      ...
...
and the second delivery goes through this piece of code:
(in ServerSessionEndpoint.deliverAnyWaitingDeliveries(String queueName))
[code 2]
DeliveryRecord dr = (DeliveryRecord)toDeliver.poll(0);
synchronized (dr)
{
   performDelivery(dr.del.getReference(), dr.deliveryID, dr.getConsumer());
   ...
toDeliver is a LinkedQueue, which is safe for concurrent access.
However, the peek() call in [code 1] does not remove the element from the queue. This gives the poll(0) in [code 2] a chance to get the same element, remove it from the queue, and deliver it.
When [code 1] later calls take(), it will not get that element, but this makes no difference, because the next line uses the peeked record to perform the delivery. M is therefore delivered twice.
The fix should be simple: check the return value of take() in [code 1] and, if it is null, do not deliver.
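The interleaving described above can be replayed deterministically in a few lines. This is only an illustrative sketch, not JBoss Messaging code: the class name PeekThenTakeRace and the method replay() are hypothetical, messages are plain strings, and java.util.concurrent.ConcurrentLinkedQueue is assumed as a stand-in for the LinkedQueue behind toDeliver, with its non-blocking poll() playing the role of both take() and poll(0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the interleaving in the report; not JBoss Messaging code.
class PeekThenTakeRace {

    // Replays the problematic ordering step by step and returns what was "delivered".
    static List<String> replay() {
        // Assumption: ConcurrentLinkedQueue stands in for the LinkedQueue used by toDeliver.
        Queue<String> toDeliver = new ConcurrentLinkedQueue<>();
        List<String> delivered = new ArrayList<>();
        toDeliver.add("M");

        // [code 1] peeks: M is looked at but stays in the queue.
        String peeked = toDeliver.peek();

        // [code 2] runs in between: it removes M and delivers it.
        String polled = toDeliver.poll();
        if (polled != null) {
            delivered.add(polled);
        }

        // [code 1] resumes: the removal finds nothing, but its result is ignored...
        toDeliver.poll();
        // ...and the peeked reference is delivered anyway.
        delivered.add(peeked);

        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [M, M]
    }
}
```

The queue itself behaves correctly at every step; the duplicate comes from delivering a reference obtained by peek() rather than the element actually removed.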
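A minimal sketch of the idea behind this fix, under stated assumptions: the names CheckedRemovalSketch, deliverIfClaimed and race are hypothetical, messages are plain strings, a counter stands in for performDelivery(...), and the JDK's ConcurrentLinkedQueue with its non-blocking poll() is assumed in place of the LinkedQueue used by toDeliver. The checks that [code 1] performs on the peeked record before removal are left out. Each path delivers only the element it actually removed, so two competing paths cannot both deliver M.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "check the result of the removal"; not JBoss Messaging code.
class CheckedRemovalSketch {

    // Delivers only what this caller actually removed from the queue.
    static boolean deliverIfClaimed(Queue<String> toDeliver, AtomicInteger deliveries) {
        String claimed = toDeliver.poll(); // null if another path already removed it
        if (claimed == null) {
            return false; // nothing claimed, so nothing is delivered
        }
        deliveries.incrementAndGet(); // stands in for performDelivery(...)
        return true;
    }

    // Lets several paths compete for a single message and returns the delivery count.
    static int race(int paths) {
        Queue<String> toDeliver = new ConcurrentLinkedQueue<>();
        toDeliver.add("M");
        AtomicInteger deliveries = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[paths];
        for (int i = 0; i < paths; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all paths at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                deliverIfClaimed(toDeliver, deliveries);
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return deliveries.get();
    }

    public static void main(String[] args) {
        System.out.println(race(2)); // prints 1
    }
}
```

Delivering the removed element instead of the peeked one keeps the decision and the removal tied to the same atomic queue operation, which is what the null check in the proposed fix relies on.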