[
https://issues.jboss.org/browse/JBMESSAGING-1842?page=com.atlassian.jira....
]
Justin Bertram updated JBMESSAGING-1842:
----------------------------------------
Description:
When a live node is kicked out of the cluster (for whatever reason), its JBoss Messaging
ServerPeer remains active, which means the node can still deliver messages to
clients. However, once the node is kicked out, another node in the cluster
performs fail-over for it and takes ownership of its messages in the
database. The "dead" node knows nothing about this, believes it still owns
those messages, and therefore delivers them to clients. After delivery it
tries to remove the message from the database and cannot, because it no longer
owns that message. When this happens the "dead" node issues a
WARN like this:
WARN [JDBCPersistenceManager] Failed to remove row for: Reference[23318958991900672]:RELIABLE
Of course, the node that performed the fail-over and now actually owns the message may
also deliver it to a client, so the same message can be delivered twice.
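The sequence described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not JBoss Messaging code: MessageStore, takeOver, deliver and simulate are hypothetical names, the "database" is a plain map from message id to owning node id, and the only value taken from the report is the reference id in the WARN.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateDeliveryDemo {

    /** Hypothetical stand-in for the shared database: message id -> owning node id. */
    static final class MessageStore {
        private final Map<Long, Integer> ownerByMessage = new HashMap<>();

        void insert(long messageId, int nodeId) {
            ownerByMessage.put(messageId, nodeId);
        }

        /** Fail-over: the surviving node takes ownership of the failed node's rows. */
        void takeOver(int failedNodeId, int newNodeId) {
            ownerByMessage.replaceAll((id, owner) -> owner == failedNodeId ? newNodeId : owner);
        }

        /** Removes the row only if the caller owns it; returns true if a row was removed. */
        boolean remove(long messageId, int nodeId) {
            return ownerByMessage.remove(messageId, nodeId);
        }
    }

    /** A node delivers everything in its in-memory view, then tries to delete each row. */
    static void deliver(int nodeId, List<Long> inMemory, MessageStore store, List<String> events) {
        for (long id : inMemory) {
            events.add("node " + nodeId + " delivered " + id);
            if (!store.remove(id, nodeId)) {
                events.add("node " + nodeId + " WARN failed to remove row for " + id);
            }
        }
    }

    public static List<String> simulate() {
        MessageStore store = new MessageStore();
        long messageId = 23318958991900672L; // reference id from the WARN in the report
        store.insert(messageId, 1);

        // Node 1 holds the message in memory and is then kicked out of the cluster.
        List<Long> node1View = Collections.singletonList(messageId);

        // Node 2 performs fail-over and now owns the row; node 1 is not told.
        store.takeOver(1, 2);
        List<Long> node2View = Collections.singletonList(messageId);

        List<String> events = new ArrayList<>();
        deliver(1, node1View, store, events); // stale owner delivers, delete fails
        deliver(2, node2View, store, events); // real owner delivers the same message
        return events;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

In this sketch both nodes record a delivery of the same message id, and only the stale node logs the failed removal, which mirrors the symptom reported above.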
was:
When a live node is kicked out of the cluster (for whatever reason), its JBoss Messaging
ServerPeer remains active, which means the node is still available to receive messages and,
if necessary, persist them to the database. However, the CHANNEL_ID values used for the
persisted messages will be invalid since the fail-over procedure would have removed the
"dead" node's channels from the JBM_POSTOFFICE and new CHANNEL_ID values
will be used when the "dead" node is finally restarted. This essentially
orphans any persistent messages sent to the "dead" node after it has been kicked
out of the cluster. Recovering the orphaned messages would require manual intervention
with the database, but even then it might not be possible to match the old CHANNEL_ID
values with the new ones.
The simplest way to solve this problem is to add a constraint to the database that forces
the inserted message's CHANNEL_ID to correspond to a CHANNEL_ID in JBM_POSTOFFICE.
Of course, such a constraint will hurt performance a bit, but it could be optional:
customers who disable it would trade robustness for speed.
The node would still need to be restarted in order to rejoin the cluster appropriately,
but the constraint would avoid orphaned messages.
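The effect of the proposed constraint can be sketched in memory as follows. This is a hedged illustration only: ChannelConstraintDemo and its methods are hypothetical, the set stands in for the CHANNEL_ID values in JBM_POSTOFFICE, and a real deployment would express the check as a database constraint rather than Java code.

```java
import java.util.HashSet;
import java.util.Set;

public class ChannelConstraintDemo {
    // Hypothetical stand-in for the CHANNEL_ID values currently in JBM_POSTOFFICE.
    private final Set<Long> postOfficeChannelIds = new HashSet<>();
    // When false, inserts are never checked (the "speed over robustness" option).
    private final boolean enforceConstraint;
    private int storedMessages = 0;

    public ChannelConstraintDemo(boolean enforceConstraint) {
        this.enforceConstraint = enforceConstraint;
    }

    public void bindChannel(long channelId) {
        postOfficeChannelIds.add(channelId);
    }

    /** Models fail-over removing the "dead" node's channel. */
    public void unbindChannel(long channelId) {
        postOfficeChannelIds.remove(channelId);
    }

    /** Returns false when the constraint rejects the insert, true when the row is stored. */
    public boolean insertMessage(long channelId) {
        if (enforceConstraint && !postOfficeChannelIds.contains(channelId)) {
            return false; // rejected instead of being stored as an orphan
        }
        storedMessages++;
        return true;
    }

    public int storedMessages() {
        return storedMessages;
    }

    public static void main(String[] args) {
        ChannelConstraintDemo db = new ChannelConstraintDemo(true);
        db.bindChannel(7L);
        System.out.println("before fail-over: " + db.insertMessage(7L));
        db.unbindChannel(7L);
        System.out.println("after fail-over: " + db.insertMessage(7L));
    }
}
```

With the check enabled, an insert that uses a channel removed by fail-over is refused rather than stored as an orphan; with it disabled, the insert succeeds and the orphan remains possible.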
Live node dropping out of cluster can cause duplicate message delivery
----------------------------------------------------------------------
Key: JBMESSAGING-1842
URL: https://issues.jboss.org/browse/JBMESSAGING-1842
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP10
Reporter: Justin Bertram
Assignee: Yong Hao Gao
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira