[infinispan-issues] [JBoss JIRA] Created: (ISPN-932) Failed nodes remain in the topology.

Mon Feb 14 11:59:13 EST 2011

Failed nodes remain in the topology.
------------------------------------

                 Key: ISPN-932
                 URL: https://issues.jboss.org/browse/ISPN-932
             Project: Infinispan
          Issue Type: Bug
          Components: Distributed Cache
            Reporter: Shane Johnson
            Assignee: Manik Surtani

A node will remain in the cluster topology even if it never enters the RUNNING state.

1. CacheDelegate.start
2. ComponentRegistry.start
3. AbstractComponentRegistry.start
4. AbstractComponentRegistry.internalStart
5. AbstractComponentRegistry.handleLifecycleTransitionFailure

The last of these start methods, internalStart, executes the @Start methods of the components. If one of those methods throws an exception, the failure is passed to handleLifecycleTransitionFailure and the node enters the FAILED state.
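The start sequence above can be sketched as follows. This is a simplified illustration, not Infinispan's code. The `Start` annotation, `Status` enum, `SketchRegistry` class and the two sample components are hypothetical stand-ins. Only the method names `start`, `internalStart` and `handleLifecycleTransitionFailure` come from the report. The sketch assumes @Start methods are found by reflection and that a throwing method moves the registry into FAILED.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical stand-in for the @Start annotation placed on component methods.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Start {}

// Hypothetical, simplified set of lifecycle states.
enum Status { INSTANTIATED, INITIALIZING, RUNNING, FAILED }

// Hypothetical registry mirroring start -> internalStart -> handleLifecycleTransitionFailure.
class SketchRegistry {
    private final List<?> components;
    private Status status = Status.INSTANTIATED;

    SketchRegistry(List<?> components) {
        this.components = components;
    }

    Status status() {
        return status;
    }

    void start() {
        status = Status.INITIALIZING;
        try {
            internalStart();
            status = Status.RUNNING; // reached only if every @Start method succeeded
        } catch (Exception e) {
            handleLifecycleTransitionFailure(e);
        }
    }

    // Invokes every @Start method on every component; any exception aborts the start.
    private void internalStart() throws Exception {
        for (Object component : components) {
            for (Method m : component.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(Start.class)) {
                    m.setAccessible(true);
                    m.invoke(component); // a thrown exception arrives wrapped in InvocationTargetException
                }
            }
        }
    }

    // Records the FAILED state and rethrows the underlying cause to the caller.
    private void handleLifecycleTransitionFailure(Exception e) {
        status = Status.FAILED;
        Throwable cause = (e instanceof InvocationTargetException) ? e.getCause() : e;
        throw new IllegalStateException("Component failed to start", cause);
    }
}

// Hypothetical component whose @Start method succeeds.
class HealthyComponent {
    @Start
    void begin() { }
}

// Hypothetical component whose @Start method fails, e.g. during a rehash.
class FailingRehashComponent {
    @Start
    void begin() {
        throw new RuntimeException("rehash failed");
    }
}
```

A registry built only from `HealthyComponent` ends in RUNNING; adding `FailingRehashComponent` makes `start()` throw and leaves the registry in FAILED. Nothing in this path touches cluster membership, which is the gap the report describes.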

The problem is that, in distributed mode, the node is added to the cluster topology before rehashing takes place. If an exception is thrown during the rehash, the join still completes, so the node stays in the topology even though it has failed.

1. Broadcast new consistent hash.
2. Get state.
3. Invalidate state. (This is in a finally block, so it runs even if getting state fails.)
4. Complete join. (This is in a finally block, so it runs even if getting state or invalidation fails.)
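The step ordering above, with both cleanup steps in finally blocks, can be modelled as follows to show why a failed node stays in the topology. The class and method names (`JoinSketch`, `broadcastConsistentHash`, `getState`, `invalidateState`, `completeJoin`) are hypothetical and only mirror the four listed steps. The topology is reduced to a set of node names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the join sequence described in the report.
class JoinSketch {
    final Set<String> topology = new LinkedHashSet<>();
    final List<String> steps = new ArrayList<>();

    void join(String node, boolean stateTransferFails) {
        broadcastConsistentHash(node); // step 1: node is now part of the topology
        try {
            try {
                getState(node, stateTransferFails); // step 2: may throw
            } finally {
                invalidateState(node); // step 3: runs even if getState threw
            }
        } finally {
            completeJoin(node); // step 4: runs even if step 2 or 3 threw
        }
    }

    private void broadcastConsistentHash(String node) {
        topology.add(node);
        steps.add("broadcast");
    }

    private void getState(String node, boolean fail) {
        steps.add("getState");
        if (fail) {
            throw new IllegalStateException("state transfer to " + node + " failed");
        }
    }

    private void invalidateState(String node) {
        steps.add("invalidate");
    }

    private void completeJoin(String node) {
        steps.add("completeJoin"); // nothing here removes the node again
    }
}
```

Calling `join("nodeC", true)` throws, yet `topology` still contains `nodeC` and `steps` records all four steps. The exception propagates only after the join has completed, which matches the behaviour described in the report.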

There needs to be a way to remove a node from the topology if it enters the FAILED state. Alternatively, the node could be added to the topology only once it enters the RUNNING state.
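One possible shape for the first suggestion is to undo the topology change whenever the join does not finish cleanly. The sketch is hypothetical: `RollbackJoinSketch` and its methods are illustrative names, the topology is again a set of node names, and the join is reduced to the state-transfer step. The rollback sits in a finally block guarded by a success flag. The original exception therefore still reaches the caller, which could then move the node to FAILED.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical join that removes the node from the topology if the join fails.
class RollbackJoinSketch {
    final Set<String> topology = new LinkedHashSet<>();

    void join(String node, boolean stateTransferFails) {
        topology.add(node); // broadcast new consistent hash: node enters the topology
        boolean joined = false;
        try {
            getState(node, stateTransferFails); // may throw
            joined = true;
        } finally {
            if (!joined) {
                topology.remove(node); // roll back so a failed node does not linger
            }
        }
    }

    private void getState(String node, boolean fail) {
        if (fail) {
            throw new IllegalStateException("state transfer to " + node + " failed");
        }
    }
}
```

A successful join leaves the node in `topology`. A failed join still throws, but the node is removed first. The second suggestion, adding the node only once it reaches RUNNING, would make the rollback unnecessary. It is not sketched here.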

--
This message is automatically generated by JIRA.