[infinispan-issues] [JBoss JIRA] Commented: (ISPN-1239) Graceful shutdown should be supported

Wed Jul 20 04:10:24 EDT 2011

    [ https://issues.jboss.org/browse/ISPN-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615346#comment-12615346 ] 

Dan Berindei commented on ISPN-1239:
------------------------------------

The workaround caused some failures in ConcurrentOverlappingLeaveTest:

1. We were clearing the data container before stopping the cache

With the workaround, I was clearing all the caches before stopping them, so the interval between the clear and the stop was longer, and a cache with an empty data container could have time to participate in a rehash.

I removed the data container clearing stage because it isn't necessary: we only need to clear a cache between test methods if we don't stop it completely.

2. The time interval between two nodes leaving got a lot smaller, so it was more likely that the second leaver would start to push something for the first leaver's rehash but never get to send it.

Let's say the initial cluster members are {A, B, C, D}, numOwners = 3, and D and C leave in quick succession.
With initial owners(k) = {B, C, D}, when D leaves, B expects C to push the key to A.
But if C dies before pushing it, B doesn't push the key to A on the following rehash.

The solution is to remember the last CH for which rehashing completed successfully and base every rehash on the last successful CH.
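
To illustrate the scenario above, here is a minimal sketch with a toy consistent hash. It is not Infinispan's actual CH or rehash code: the class name, the `owners` and `pushTargets` helpers, and the sorted-ring layout with an anchor of "B" for key k are all assumptions made up for this example. It only shows why computing the push targets against the interrupted CH {A, B, C} loses A, while computing them against the last successful CH {A, B, C, D} does not.

```java
import java.util.*;

class LastSuccessfulChSketch {
    // Toy CH (hypothetical): owners are the first numOwners members on a
    // sorted ring, starting at the member named by 'anchor'.
    static List<String> owners(SortedSet<String> members, String anchor, int numOwners) {
        List<String> ring = new ArrayList<>(members.tailSet(anchor));
        ring.addAll(members.headSet(anchor));
        return ring.subList(0, Math.min(numOwners, ring.size()));
    }

    // Nodes that must receive the key: owners in the current CH that were
    // not owners in the CH the rehash is based on.
    static Set<String> pushTargets(SortedSet<String> base, SortedSet<String> current,
                                   String anchor, int numOwners) {
        Set<String> targets = new LinkedHashSet<>(owners(current, anchor, numOwners));
        targets.removeAll(owners(base, anchor, numOwners));
        return targets;
    }

    public static void main(String[] args) {
        SortedSet<String> lastSuccessful = new TreeSet<>(Arrays.asList("A", "B", "C", "D"));
        SortedSet<String> interrupted = new TreeSet<>(Arrays.asList("A", "B", "C")); // D left, rehash never completed
        SortedSet<String> current = new TreeSet<>(Arrays.asList("A", "B"));          // C left as well

        // Based on the interrupted CH, A already looks like an owner, so nobody pushes to it.
        System.out.println(pushTargets(interrupted, current, "B", 3));    // prints []
        // Based on the last successful CH, A is a new owner and B has to push the key.
        System.out.println(pushTargets(lastSuccessful, current, "B", 3)); // prints [A]
    }
}
```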

3. When a rehash was interrupted by another view coming in, we would allow waiting transactions to do some work before starting the next rehash.

This could lead to a deadlock if the transaction needed to replicate synchronously to another node that is waiting for us to finish the rehash.

Instead, the rehash task should leave the transactions blocked if it was interrupted by another view, because it knows there is another rehash pending.
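
The rule can be sketched roughly as follows. This is only an illustration under assumptions, not the actual rehash task: `RehashGateSketch` and its methods are hypothetical names, and view ids are assumed to be increasing integers. The point is that the task unblocks transactions only when no newer view arrived while it was running.

```java
class RehashGateSketch {
    private boolean transactionsBlocked;
    private int latestViewId;

    // Called whenever a new cluster view is installed.
    synchronized void viewReceived(int viewId) {
        latestViewId = Math.max(latestViewId, viewId);
    }

    // Called when a rehash task starts; transactions wait from here on.
    synchronized void rehashStarted() {
        transactionsBlocked = true;
    }

    // Called when the rehash task for rehashViewId ends.
    // Returns true if transactions were unblocked.
    synchronized boolean rehashFinished(int rehashViewId) {
        if (latestViewId > rehashViewId) {
            // Interrupted by a newer view: another rehash is pending,
            // so leave the transactions blocked.
            return false;
        }
        transactionsBlocked = false;
        notifyAll(); // wake up any waiting transactions
        return true;
    }

    synchronized boolean isTransactionsBlocked() {
        return transactionsBlocked;
    }
}
```

With this gate, a task that started for view 1 and ends after view 2 has arrived returns without unblocking anything, and only the task for view 2 releases the waiting transactions.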

> Graceful shutdown should be supported
> -------------------------------------
>
>                 Key: ISPN-1239
>                 URL: https://issues.jboss.org/browse/ISPN-1239
>             Project: Infinispan
>          Issue Type: Feature Request
>          Components: Distributed Cache
>    Affects Versions: 5.0.0.FINAL
>            Reporter: Manik Surtani
>            Assignee: Dan Berindei
>            Priority: Critical
>              Labels: clean_shutdown, rehashing
>             Fix For: 5.1.0.BETA1, 5.1.0.FINAL
>
>
> Currently, killing any node will result in a rehash.  A mechanism for clean shutdown should also be supported, so that a rehash is *not* triggered.  Useful when the entire cluster is being intentionally brought down.
> Need to think about how we do this; perhaps a LEAVE message that will prevent nodes triggering a rehash when a subsequent view change is detected.  This could be done programmatically via a {{clean}} parameter to {{stop()}}, but we should explore alternatives here.
