[jboss-jira] [JBoss JIRA] Updated: (JBMESSAGING-1842) Live node dropping out of cluster can cause duplicate message delivery

Saturday, 15 January 2011

     [ https://issues.jboss.org/browse/JBMESSAGING-1842?page=com.atlassian.jira.... ]

Justin Bertram updated JBMESSAGING-1842:
----------------------------------------

    Description: 
When a live node is kicked out of the cluster (for whatever reason), its JBoss Messaging
ServerPeer remains active, which means the node can still send messages to clients.
However, when the node is kicked out, another node in the cluster performs fail-over for
it and takes ownership of its messages in the database.  The "dead" node knows nothing
about this, believes it still owns those messages, and therefore delivers them to clients.
After delivery it tries to remove each message from the database and can't (because it no
longer owns that message).  When this happens the "dead" node issues a WARN like this:

  WARN  [JDBCPersistenceManager] Failed to remove row for: Reference[23318958991900672]:RELIABLE

Of course, the node which performed the fail-over, and which now actually owns the
message, may also deliver it to a client, so the same message can be delivered twice.
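
The sequence described above can be sketched as a small in-memory model.  This is only an
illustration under stated assumptions, not JBoss Messaging code: the names MessageStore,
Node, failOver and deliverAll are hypothetical, the "database" is a plain map from message
id to owning node id, and the warning text only imitates the WARN shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the scenario; none of these names come from JBoss Messaging. */
final class DuplicateDeliveryDemo {

    /** Stands in for the shared database: message id -> id of the owning node. */
    static final class MessageStore {
        private final Map<Long, Integer> ownerByMessage = new HashMap<>();

        void insert(long messageId, int ownerNode) {
            ownerByMessage.put(messageId, ownerNode);
        }

        /** Fail-over: every message owned by failedNode is reassigned to survivor. */
        void failOver(int failedNode, int survivor) {
            ownerByMessage.replaceAll((id, owner) -> owner == failedNode ? survivor : owner);
        }

        /** Removes the row only if the given node still owns it; true if a row was removed. */
        boolean remove(long messageId, int node) {
            return ownerByMessage.remove(messageId, node);
        }
    }

    /** A cluster node with its own in-memory view of the messages it believes it owns. */
    static final class Node {
        final int id;
        final MessageStore store;
        final List<Long> inMemory = new ArrayList<>();
        final List<String> warnings = new ArrayList<>();

        Node(int id, MessageStore store) {
            this.id = id;
            this.store = store;
        }

        /** Delivers every message held in memory, then tries to delete each row. */
        List<Long> deliverAll() {
            List<Long> delivered = new ArrayList<>(inMemory);
            for (long messageId : delivered) {
                if (!store.remove(messageId, id)) {
                    // Delivery has already happened; only the cleanup fails.
                    warnings.add("Failed to remove row for: Reference[" + messageId + "]");
                }
            }
            inMemory.clear();
            return delivered;
        }
    }

    /** Returns {total deliveries, warnings on the "dead" node, warnings on the survivor}. */
    static int[] simulate() {
        MessageStore store = new MessageStore();
        Node dead = new Node(1, store);
        Node survivor = new Node(2, store);

        long messageId = 23318958991900672L;
        store.insert(messageId, dead.id);
        dead.inMemory.add(messageId);

        // Node 1 is kicked out of the cluster but keeps running;
        // node 2 performs fail-over and loads the message it now owns.
        store.failOver(dead.id, survivor.id);
        survivor.inMemory.add(messageId);

        int deliveries = dead.deliverAll().size() + survivor.deliverAll().size();
        return new int[] {deliveries, dead.warnings.size(), survivor.warnings.size()};
    }

    public static void main(String[] args) {
        int[] result = simulate();
        System.out.println("deliveries=" + result[0]
                + " deadWarnings=" + result[1]
                + " survivorWarnings=" + result[2]);
    }
}
```

In this model the single message is delivered once by each node, and only the "dead" node
records a warning, because its delete no longer matches a row it owns.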

  was:
When a live node is kicked out of the cluster (for whatever reason) its JBoss Messaging
ServerPeer remains active which means the node is still available to receive messages and,
if necessary, persist them to the database.  However, the CHANNEL_ID values used for the
persisted messages will be invalid since the fail-over procedure would have removed the
"dead" node's channels from the JBM_POSTOFFICE and new CHANNEL_ID values
will be used when the "dead" node is finally restarted.  This essentially
orphans any persistent messages sent to the "dead" node after it has been kicked
out of the cluster.  Recovering the orphaned messages would require manual intervention
with the database, but even then it might not be possible to match the old CHANNEL_ID
values with the new ones.

The simplest way to solve this problem is to add a constraint to the database that forces
the inserted message's CHANNEL_ID to correspond to a CHANNEL_ID from JBM_POSTOFFICE. 
Of course, such a constraint will hurt performance a bit, but it could be optional -
customers could trade robustness for speed.

The node would still need to be restarted in order to rejoin the cluster appropriately,
but the constraint would avoid orphaned messages.

...
 Live node dropping out of cluster can cause duplicate message delivery
 ----------------------------------------------------------------------

                 Key: JBMESSAGING-1842
                 URL: https://issues.jboss.org/browse/JBMESSAGING-1842
             Project: JBoss Messaging
          Issue Type: Bug
          Components: JMS Clustering
    Affects Versions: 1.4.0.SP3.CP10
            Reporter: Justin Bertram
            Assignee: Yong Hao Gao

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
