[
https://issues.jboss.org/browse/JBMESSAGING-1842?page=com.atlassian.jira....
]
Yong Hao Gao commented on JBMESSAGING-1842:
-------------------------------------------
--- The Test Plan ---
I - Verify that the old failover model still works (keepOldFailoverModel = true)
All existing tests should pass and all examples should work.
The JBM_CLUSTER table may be created, but no actions (timestamp updates or
state changes) should be performed against it.
II - New failover model (keepOldFailoverModel = false)
0. Verify all examples
1. Timestamp update
• Timestamp refresh interval: the timestamp should be updated after each interval.
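One way the timestamp check could be automated is sketched below. It assumes the test samples the node's timestamp from JBM_CLUSTER several times and collects the distinct values in milliseconds; the class name, method name and tolerance parameter are all hypothetical and are not part of JBoss Messaging.

```java
import java.util.List;

/** Hypothetical test helper; not part of JBoss Messaging. */
final class TimestampRefreshCheck {
    private TimestampRefreshCheck() {}

    /**
     * Returns true if every gap between consecutive observed timestamps is
     * positive and no larger than intervalMs + toleranceMs.
     * At least two samples are required to judge a refresh.
     */
    static boolean refreshedEveryInterval(List<Long> observedMs, long intervalMs, long toleranceMs) {
        if (observedMs.size() < 2) {
            return false; // nothing to compare against
        }
        for (int i = 1; i < observedMs.size(); i++) {
            long gap = observedMs.get(i) - observedMs.get(i - 1);
            if (gap <= 0 || gap > intervalMs + toleranceMs) {
                return false; // timestamp went backwards, stalled, or missed a refresh
            }
        }
        return true;
    }
}
```

The tolerance is there because a sampling test cannot expect updates to land exactly on the interval boundary; how much slack is acceptable is a judgment call for the tester.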
2. Normal Failover
• A node shutdown or crash should cause failover to happen (if failoverOnNodeLeave is true)
• Multiple nodes shut down or crash
3. Node leaving the cluster while still alive (using a JGroups util to simulate)
• One node leaves the cluster: failover should not happen. The node serves as a standalone
server, and the cluster quarantines it as expected.
• If the node later shuts down or crashes: failover should happen (if failoverOnNodeLeave is
true).
• If the node later rejoins the cluster: its state should change from quarantined back to
clustered.
Repeat the above 3 steps with multiple nodes leaving, shutting down or crashing, and rejoining.
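The expected outcomes in section 3 can be written down as a small state table that a test could compare observed behaviour against. This is only a sketch of the plan as worded above: the state and event names are hypothetical, and combinations the plan does not mention (for example a second leave by an already quarantined node) are assumed to leave the state unchanged.

```java
/** Hypothetical model of the node states exercised by the test plan; not JBoss Messaging code. */
final class QuarantineModel {
    enum State { CLUSTERED, QUARANTINED, DOWN }
    enum Event { LEAVE_ALIVE, SHUTDOWN_OR_CRASH, REJOIN }

    private QuarantineModel() {}

    /** Expected state of a node after the given event. */
    static State next(State state, Event event) {
        switch (event) {
            case LEAVE_ALIVE:
                // A live node that drops out of the cluster is quarantined.
                return state == State.CLUSTERED ? State.QUARANTINED : state;
            case REJOIN:
                // A quarantined node that rejoins goes back to clustered.
                return state == State.QUARANTINED ? State.CLUSTERED : state;
            case SHUTDOWN_OR_CRASH:
                return State.DOWN;
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }

    /** Failover is expected only when a live node shuts down or crashes and the flag is set. */
    static boolean failoverExpected(State state, Event event, boolean failoverOnNodeLeave) {
        return failoverOnNodeLeave
                && event == Event.SHUTDOWN_OR_CRASH
                && state != State.DOWN;
    }
}
```

For example, a clustered node receiving LEAVE_ALIVE moves to QUARANTINED with no failover expected, and a later SHUTDOWN_OR_CRASH is expected to trigger failover only when failoverOnNodeLeave is true.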
All message producers and consumers in the above tests should work properly. The following
should be checked:
• If a cluster has some nodes (>1) in quarantined state, is client-side failover
limited to the normal nodes, or may a client fail over to or from a quarantined node?
What happens if the quarantined node rejoins the cluster, or if it crashes or shuts down?
• Does client-side failover work consistently in a dynamic situation where nodes
constantly leave, rejoin, shut down and crash?
Live node dropping out of cluster can cause duplicate message delivery
----------------------------------------------------------------------
Key: JBMESSAGING-1842
URL: https://issues.jboss.org/browse/JBMESSAGING-1842
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.0.SP3.CP10
Reporter: Justin Bertram
Assignee: Yong Hao Gao
When a live node is kicked out of the cluster (for whatever reason) its JBoss Messaging
ServerPeer remains active which means the node is still available to send messages to
clients. However, when the node is kicked out of the cluster another node in the cluster
performs fail-over for that node and takes ownership of that node's messages in the
database. The "dead" node may know nothing about this and might believe it
still owns those messages and therefore will deliver those messages to clients. After
delivery it tries to remove the message from the database and can't (because it
doesn't actually own that message anymore). When this happens the "dead"
node issues a WARN like this:
WARN [JDBCPersistenceManager] Failed to remove row for: Reference[23318958991900672]:RELIABLE
Of course, the node which performed the fail-over and actually owns the message now may
also deliver the message to a client.
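The sequence described above can be reproduced with a toy in-memory model. This is only an illustration of the race, not JBoss Messaging code: the shared database is replaced by a map from message id to owning node id, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the race described above; not JBoss Messaging code. */
final class DuplicateDeliveryModel {
    /** Stand-in for the shared database: message id -> owning node id. */
    private final Map<Long, Integer> ownerByMessage = new HashMap<>();
    /** Every delivery made to a client, recorded as "node<id>:<messageId>". */
    final List<String> deliveries = new ArrayList<>();

    void store(long messageId, int ownerNode) {
        ownerByMessage.put(messageId, ownerNode);
    }

    /** Failover: the surviving node takes ownership of all rows of the node it fails over. */
    void failOver(int fromNode, int toNode) {
        for (Map.Entry<Long, Integer> e : ownerByMessage.entrySet()) {
            if (e.getValue() == fromNode) {
                e.setValue(toNode);
            }
        }
    }

    /**
     * Delivers the message without checking ownership first (the node trusts its
     * in-memory state), then removes the row only if this node still owns it.
     * Returns false when the row could not be removed, which is the WARN case.
     */
    boolean deliverAndRemove(int node, long messageId) {
        deliveries.add("node" + node + ":" + messageId);
        Integer owner = ownerByMessage.get(messageId);
        if (owner != null && owner == node) {
            ownerByMessage.remove(messageId);
            return true;
        }
        return false;
    }
}
```

Storing a message for node 1, calling failOver(1, 2), and then calling deliverAndRemove for node 1 and for node 2 records two deliveries of the same message: the call for node 1 returns false (the row now belongs to node 2), and the call for node 2 returns true.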