[https://issues.jboss.org/browse/ISPN-932?page=com.atlassian.jira.plugin.s...]
Manik Surtani commented on ISPN-932:
------------------------------------
I've implemented a join_abort phase, which removes a node from the topology if the join
process ends abnormally. @Shane, it would be good if you could test with this and see if it
helps.
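A minimal sketch of the join_abort idea, assuming a simplified model where the coordinator keeps a set of member names. TopologySketch, JoinerSketch, join, joinAbort and joinCluster are hypothetical names for illustration only, not Infinispan's actual API. The joiner is added first, and if state transfer throws, the join is undone before the failure propagates.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical coordinator-side view of the cluster topology (not Infinispan's API).
class TopologySketch {
    private final Set<String> members = new LinkedHashSet<>();

    // The joiner becomes a member before state transfer runs.
    synchronized void join(String node) { members.add(node); }

    // Hypothetical join_abort phase: undo the join when it ended abnormally.
    synchronized void joinAbort(String node) { members.remove(node); }

    // Defensive copy so callers cannot mutate the membership.
    synchronized Set<String> members() { return new LinkedHashSet<>(members); }
}

class JoinerSketch {
    // Runs the join; if state transfer throws, ask the coordinator to abort the join.
    static void joinCluster(TopologySketch topology, String self, Runnable stateTransfer) {
        topology.join(self);
        try {
            stateTransfer.run();
        } catch (RuntimeException e) {
            topology.joinAbort(self); // remove this node from the topology again
            throw e;                  // still surface the failure so the node can enter FAILED
        }
    }
}
```

Rethrowing after the abort keeps the two concerns separate: the topology is cleaned up, and the local lifecycle still sees the exception.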
Failed nodes remain in the topology.
------------------------------------
Key: ISPN-932
URL: https://issues.jboss.org/browse/ISPN-932
Project: Infinispan
Issue Type: Bug
Components: Distributed Cache
Reporter: Shane Johnson
Assignee: Manik Surtani
Fix For: 4.2.1.FINAL
A node will remain in the cluster topology even if it never enters the RUNNING state.
The relevant call chain when a cache starts is:
1. CacheDelegate.start
2. ComponentRegistry.start
3. AbstractComponentRegistry.start
4. AbstractComponentRegistry.internalStart
5. AbstractComponentRegistry.handleLifecycleTransitionFailure
AbstractComponentRegistry.internalStart executes the @Start methods of the components. If
one of these methods throws an exception, handleLifecycleTransitionFailure is invoked and
the node enters the FAILED state.
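A simplified sketch of that lifecycle path, assuming a registry that invokes annotated start methods by reflection and moves to FAILED on the first exception. OnStart, LifecycleStatus, RegistrySketch and the two sample components are hypothetical stand-ins loosely modelled on the call chain above; they are not Infinispan's real @Start annotation or ComponentRegistry.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical stand-in for a component start annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OnStart {}

enum LifecycleStatus { INSTANTIATED, INITIALIZING, RUNNING, FAILED }

class RegistrySketch {
    private final List<Object> components;
    private LifecycleStatus status = LifecycleStatus.INSTANTIATED;

    RegistrySketch(List<Object> components) { this.components = components; }

    LifecycleStatus status() { return status; }

    // Loosely mirrors start -> internalStart -> handleLifecycleTransitionFailure.
    void start() {
        status = LifecycleStatus.INITIALIZING;
        try {
            internalStart();
            status = LifecycleStatus.RUNNING;
        } catch (Exception e) {
            handleLifecycleTransitionFailure(e);
        }
    }

    // Invokes every @OnStart method of every component, in registration order.
    private void internalStart() throws Exception {
        for (Object component : components) {
            for (Method m : component.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(OnStart.class)) {
                    m.setAccessible(true);
                    try {
                        m.invoke(component);
                    } catch (InvocationTargetException ite) {
                        // Unwrap so the caller sees the component's own exception.
                        Throwable cause = ite.getCause();
                        throw cause instanceof Exception ? (Exception) cause : ite;
                    }
                }
            }
        }
    }

    // Only records the transition; a fuller version might also log e.
    private void handleLifecycleTransitionFailure(Exception e) {
        status = LifecycleStatus.FAILED;
    }
}

// Sample components for illustration.
class GoodComponent {
    boolean started;
    @OnStart void begin() { started = true; }
}

class RehashComponent {
    @OnStart void begin() { throw new IllegalStateException("rehash failed"); }
}
```

Note that nothing here touches the cluster topology, which is exactly the gap the issue describes: the local status becomes FAILED while the membership elsewhere is left as it was.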
The problem is that, in distributed mode, the node is added to the cluster topology before
rehashing takes place. If an exception is thrown during the rehash, the join still
completes successfully, so the node stays in the topology. The rehash-on-join steps are:
1. Broadcast new consistent hash.
2. Get state.
3. Invalidate state. (This is in a finally block; it runs even if get state fails.)
4. Complete join. (This is in a finally block; it runs even if get state or invalidation fails.)
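A minimal model of why the join completes regardless, assuming the four steps are arranged as nested try/finally blocks as the list describes. RehashOnJoinSketch, performRehash and the trace strings are hypothetical; they only record which steps execute when "get state" throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the rehash-on-join sequence; names are illustrative only.
class RehashOnJoinSketch {
    final List<String> trace = new ArrayList<>();
    private final boolean failGetState;

    RehashOnJoinSketch(boolean failGetState) { this.failGetState = failGetState; }

    void performRehash() {
        trace.add("broadcast consistent hash");    // step 1: node is now in the topology
        try {
            try {
                trace.add("get state");             // step 2
                if (failGetState) {
                    throw new IllegalStateException("state transfer failed");
                }
            } finally {
                trace.add("invalidate state");      // step 3: runs even if step 2 threw
            }
        } finally {
            trace.add("complete join");             // step 4: runs even if step 2 or 3 threw
        }
    }
}
```

With failGetState set to true, performRehash still throws, but the trace contains all four steps, including "complete join". The exception reaches the caller only after the join has already been committed.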
There needs to be a way to remove a node from the topology when it enters the FAILED state.
Alternatively, the node could be added to the topology only once it enters the RUNNING state.
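A sketch combining both suggestions, assuming membership is driven by lifecycle notifications: a node joins only on RUNNING and is evicted on FAILED. NodeStatus, LifecycleAwareTopology and onStatusChange are hypothetical names. How this interacts with the rehash, which in the current design runs while the joiner is already in the topology, is not addressed here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

enum NodeStatus { INITIALIZING, RUNNING, FAILED }

// Hypothetical topology whose membership follows node lifecycle transitions.
class LifecycleAwareTopology {
    private final Set<String> members = new LinkedHashSet<>();

    synchronized void onStatusChange(String node, NodeStatus status) {
        switch (status) {
            case RUNNING:
                members.add(node);     // join only once the node is fully started
                break;
            case FAILED:
                members.remove(node);  // evict a node that failed at any point
                break;
            default:
                break;                 // still starting: not a member yet
        }
    }

    // Defensive copy so callers cannot mutate the membership.
    synchronized Set<String> members() { return new LinkedHashSet<>(members); }
}
```

A node that goes from INITIALIZING straight to FAILED never appears in members(), which is the behaviour the issue asks for.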
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira