[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1842) Live node dropping out of cluster can cause duplicate message delivery

Monday, 17 January 2011

    [
https://issues.jboss.org/browse/JBMESSAGING-1842?page=com.atlassian.jira....
] 

Yong Hao Gao commented on JBMESSAGING-1842:
-------------------------------------------

Just a little update:

I'd like to add a State field to the table. I want to define four states for a node:
standalone, clustered, quarantined, and dead.

When JGroups notifies us that nodes have left, we just 'quarantine' those nodes and notify
a 'checker' thread (this thread may also be in charge of writing the timestamp) to monitor
the quarantined nodes.

The rule is: a quarantined node is monitored by its current failover node (buddy). If
the quarantined node is really dead, the buddy starts the failover process. If the buddy
is itself quarantined before the quarantined node dies, both nodes are monitored by
the newly calculated failover node for the latter.
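
A minimal sketch of the monitoring rule above, in Java. Every name here (NodeState,
QuarantineSketch, assignMonitors) is hypothetical and not taken from the JBoss Messaging
code base. It also assumes that a node's failover node is simply the next node in a ring
that is still clustered, which may not match the real failover mapper.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; not JBoss Messaging's actual classes.
enum NodeState { STANDALONE, CLUSTERED, QUARANTINED, DEAD }

final class QuarantineSketch {
    /**
     * For each quarantined node, walk the ring to the next node that is still
     * CLUSTERED; that node becomes its monitor. Assumes the buddy of a node is
     * the next clustered node in ring order.
     * Returns quarantined node id -> monitor id (null if no clustered node remains).
     */
    static Map<Integer, Integer> assignMonitors(List<Integer> ring,
                                                Map<Integer, NodeState> states) {
        Map<Integer, Integer> monitors = new LinkedHashMap<>();
        int n = ring.size();
        for (int i = 0; i < n; i++) {
            int node = ring.get(i);
            if (states.get(node) != NodeState.QUARANTINED) {
                continue; // only quarantined nodes need a monitor
            }
            Integer monitor = null;
            // Skip over buddies that are themselves quarantined or dead.
            for (int step = 1; step < n; step++) {
                int candidate = ring.get((i + step) % n);
                if (states.get(candidate) == NodeState.CLUSTERED) {
                    monitor = candidate;
                    break;
                }
            }
            monitors.put(node, monitor);
        }
        return monitors;
    }
}
```

With a ring of nodes 1, 2, 3, 4 where 1 and 2 are quarantined, both end up monitored by
node 3, which is the newly calculated failover node for node 2.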

The 'quarantine' process includes recalculating the failover map and arranging
monitoring, stopping or pausing sucking to/from the quarantined nodes, and maybe other
inter-node communications.
The 'quarantined' node also has a chance to learn that it has been isolated from
the cluster, so it will also stop or pause sucking to/from other nodes in the cluster,
while still being allowed to work as a standalone server.

The state field may also help when a node rejoins the cluster.

I haven't thought about the impact on client-side failover/load balancing; I feel
the impact shouldn't be big.

This is just a rough idea and I've just started coding to see how it goes. Any
comments are welcome.

...
 Live node dropping out of cluster can cause duplicate message
delivery
 ----------------------------------------------------------------------

                 Key: JBMESSAGING-1842
                 URL: https://issues.jboss.org/browse/JBMESSAGING-1842
             Project: JBoss Messaging
          Issue Type: Bug
          Components: JMS Clustering
    Affects Versions: 1.4.0.SP3.CP10
            Reporter: Justin Bertram
            Assignee: Yong Hao Gao

 When a live node is kicked out of the cluster (for whatever reason) its JBoss Messaging
ServerPeer remains active which means the node is still available to send messages to
clients.  However, when the node is kicked out of the cluster another node in the cluster
performs fail-over for that node and takes ownership of that node's messages in the
database.  The "dead" node may know nothing about this and might believe it
still owns those messages and therefore will deliver those messages to clients.  After
delivery it tries to remove the message from the database and can't (because it
doesn't actually own that message anymore).  When this happens the "dead"
node issues a WARN like this:
   WARN  [JDBCPersistenceManager] Failed to remove row for: Reference[23318958991900672]:RELIABLE
 Of course, the node which performed the fail-over and actually owns the message now may
also deliver the message to a client. 
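
The sequence described above can be modelled with a small in-memory sketch in Java. This
is an illustration only: OwnershipSketch and its methods are hypothetical, and the map
stands in for the shared database table rather than reproducing JDBCPersistenceManager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of message ownership; not JBoss Messaging code.
final class OwnershipSketch {
    // message id -> owning node id, standing in for the shared database table
    private final Map<Long, Integer> ownerByMessage = new HashMap<>();

    void store(long messageId, int nodeId) {
        ownerByMessage.put(messageId, nodeId);
    }

    // Fail-over: the buddy takes ownership of every row held by the failed node.
    void failOver(int failedNode, int buddy) {
        for (Map.Entry<Long, Integer> e : ownerByMessage.entrySet()) {
            if (e.getValue() == failedNode) {
                e.setValue(buddy);
            }
        }
    }

    // Models a delete restricted to the caller's own rows.
    // Returns false when no row matched, i.e. the caller no longer owns the message.
    boolean remove(long messageId, int nodeId) {
        return ownerByMessage.remove(messageId, nodeId);
    }
}
```

If node 1 stores a message and node 2 then fails over for it, a later remove by node 1
matches no row (the case that produces the WARN), while node 2 can still remove and
deliver the same message.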
-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
