[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1842) Live node dropping out of cluster can cause duplicate message delivery

Thu Jan 20 00:06:49 EST 2011

    [ https://issues.jboss.org/browse/JBMESSAGING-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576685#comment-12576685 ] 

Yong Hao Gao commented on JBMESSAGING-1842:
-------------------------------------------

--- The Test Plan ---

I - Verify the old failover model still works (keepOldFailoverModel = true)

    All existing tests should pass. All examples should work.

    The JBM_CLUSTER table may be created, but no actions (timestamp updates or state changes) should be performed against it.

II - New failover model (keepOldFailoverModel = false)

0. Verify all examples

1. Timestamp update

• Timestamp refresh interval: each node's timestamp should be updated after each interval.
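
A test for this item could read a node's timestamp twice and compare the readings against the configured refresh interval. The following is only a sketch in Java: the class and method names are hypothetical and not part of JBoss Messaging, and the tolerance parameter is an assumption added to absorb scheduling and database latency.

```java
/** Hypothetical test helper; not part of JBoss Messaging. */
final class TimestampCheck {
    private TimestampCheck() {}

    /** True if the second reading of a node's timestamp is later than the first. */
    static boolean hasAdvanced(long firstReadMillis, long secondReadMillis) {
        return secondReadMillis > firstReadMillis;
    }

    /**
     * True if the last update is older than one refresh interval plus a
     * tolerance (the tolerance is an assumption, not a documented setting).
     */
    static boolean isStale(long lastUpdateMillis, long nowMillis,
                           long refreshIntervalMillis, long toleranceMillis) {
        return nowMillis - lastUpdateMillis > refreshIntervalMillis + toleranceMillis;
    }
}
```

In a real test the two readings would come from the node's row in JBM_CLUSTER, taken at least one refresh interval apart.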

2. Normal Failover

• A node shutdown or crash should cause failover (if failoverOnNodeLeave is true)
• Multiple nodes shutting down and crashing

3. Node leaving the cluster while still alive (simulated with a JGroups utility)

• One node leaves the cluster: failover should not happen. The node continues to serve as a standalone server, and the cluster quarantines it as expected.
• If the node later shuts down or crashes: failover should happen (if failoverOnNodeLeave is true)
• If the node later rejoins the cluster: its state should change from quarantined back to clustered.

Repeat the above 3 steps with multiple nodes leaving, shutting down/crashing and rejoining.
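
The expected transitions in sections 2 and 3 can be written down as a small state model that the tests compare against. This is a sketch under assumptions: the enum names and the `next` method are hypothetical, and the STOPPED state (a node that went down while failoverOnNodeLeave is false) is not described in the plan and is added here only to make the model total.

```java
/** Hypothetical model of the expected node states; not JBoss Messaging code. */
final class NodeStateModel {
    enum State { CLUSTERED, QUARANTINED, FAILED_OVER, STOPPED }
    enum Event { LEFT_CLUSTER_ALIVE, REJOINED, SHUTDOWN_OR_CRASH }

    private NodeStateModel() {}

    /** Returns the state expected after the event; events that do not apply leave the state unchanged. */
    static State next(State current, Event event, boolean failoverOnNodeLeave) {
        switch (event) {
            case LEFT_CLUSTER_ALIVE:
                // A live node that drops out is quarantined; no failover happens.
                return current == State.CLUSTERED ? State.QUARANTINED : current;
            case REJOINED:
                // A quarantined node that rejoins goes back to clustered.
                return current == State.QUARANTINED ? State.CLUSTERED : current;
            case SHUTDOWN_OR_CRASH:
                // Failover is expected only when failoverOnNodeLeave is true.
                if (current == State.CLUSTERED || current == State.QUARANTINED) {
                    return failoverOnNodeLeave ? State.FAILED_OVER : State.STOPPED;
                }
                return current;
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }
}
```

For example, `next(QUARANTINED, SHUTDOWN_OR_CRASH, true)` yields FAILED_OVER, matching the second bullet of section 3.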

All message producers and consumers in the above tests should work properly. The following should be checked:

• If a cluster has some nodes (>1) in the quarantined state, is client-side failover limited to the normal nodes, or can a client fail over to or from a quarantined node? What happens if the quarantined node rejoins the cluster, or if it crashes or shuts down?

• Does client-side failover work consistently in a dynamic situation where nodes repeatedly leave, rejoin, shut down and crash?

> Live node dropping out of cluster can cause duplicate message delivery
> ----------------------------------------------------------------------
>
>                 Key: JBMESSAGING-1842
>                 URL: https://issues.jboss.org/browse/JBMESSAGING-1842
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: JMS Clustering
>    Affects Versions: 1.4.0.SP3.CP10
>            Reporter: Justin Bertram
>            Assignee: Yong Hao Gao
>
> When a live node is kicked out of the cluster (for whatever reason) its JBoss Messaging ServerPeer remains active which means the node is still available to send messages to clients.  However, when the node is kicked out of the cluster another node in the cluster performs fail-over for that node and takes ownership of that node's messages in the database.  The "dead" node may know nothing about this and might believe it still owns those messages and therefore will deliver those messages to clients.  After delivery it tries to remove the message from the database and can't (because it doesn't actually own that message anymore).  When this happens the "dead" node issues a WARN like this:
>   WARN  [JDBCPersistenceManager] Failed to remove row for: Reference[23318958991900672]:RELIABLE
> Of course, the node which performed the fail-over and actually owns the message now may also deliver the message to a client.
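
The sequence described in the issue can be replayed with a toy model. This is a minimal sketch, not JBoss Messaging code: `Store` is a hypothetical in-memory stand-in for the shared database, node ids 1 and 2 are illustrative, and the printed line only imitates the WARN above. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy replay of the duplicate-delivery scenario; all names are hypothetical. */
final class DuplicateDeliveryDemo {
    /** Stand-in for the shared database: message id -> owning node id. */
    static final class Store {
        private final Map<Long, Integer> ownerByMessage = new HashMap<>();

        void put(long messageId, int nodeId) {
            ownerByMessage.put(messageId, nodeId);
        }

        /** Failover: every message owned by failedNode is reassigned to newOwner. */
        void failOver(int failedNode, int newOwner) {
            ownerByMessage.replaceAll((id, owner) -> owner == failedNode ? newOwner : owner);
        }

        /** Removes the row only if nodeId still owns it; returns false otherwise. */
        boolean remove(long messageId, int nodeId) {
            return ownerByMessage.remove(messageId, nodeId);
        }
    }

    private DuplicateDeliveryDemo() {}

    /** Returns the nodes that delivered the single message, in order. */
    static List<String> run() {
        Store store = new Store();
        List<String> deliveries = new ArrayList<>();
        long messageId = 1L;

        store.put(messageId, 1);      // node 1 owns the message
        store.failOver(1, 2);         // node 1 is kicked out but stays alive; node 2 takes over

        deliveries.add("node 1");     // node 1 still believes it owns the message
        if (!store.remove(messageId, 1)) {
            System.out.println("WARN Failed to remove row for message " + messageId);
        }

        deliveries.add("node 2");     // the real owner delivers the same message
        store.remove(messageId, 2);
        return deliveries;
    }
}
```

Calling `DuplicateDeliveryDemo.run()` returns two deliveries for one message, which is the duplicate the issue reports.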
