[infinispan-issues] [JBoss JIRA] Commented: (ISPN-1239) Graceful shutdown should be supported

Wednesday, 20 July 2011

    [ https://issues.jboss.org/browse/ISPN-1239?page=com.atlassian.jira.plugin.... ]

Dan Berindei commented on ISPN-1239:
------------------------------------

The workaround caused some failures in ConcurrentOverlappingLeaveTest:

1. We were clearing the data container before stopping the cache

With the workaround I was clearing all the caches before stopping them so the time
interval between the clear and the stop was bigger and a cache with an empty data
container could have time to participate in a rehash.

I removed the data container clearing stage as it's not really necessary - we only
need to clear the cache between test methods, if we don't stop it completely.

2. The time interval between two nodes leaving got a lot smaller, so it was more
likely that the second leaver would start pushing state for the first leaver's
rehash but die before it could send it.

Let's say the initial cluster members are {A, B, C, D}, numOwners = 3, and D and C
leave in quick succession.
With initial owners(k) = {B, C, D}, when D leaves, B expects C to push the key to A.
But if C dies before pushing it, on the following rehash B doesn't push the key to A.

The solution is to remember the last CH for which rehashing completed successfully and
base every rehash on the last successful CH.
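
The {A, B, C, D} scenario above can be reproduced with a toy model. This is only a
sketch, not Infinispan's consistent hash: the ring order, the key position, and the
names owners() and pushTargets() are assumptions made for illustration. It shows that
diffing against the interrupted CH finds no new owner, while diffing against the last
successful CH finds A.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LastStableHashSketch {
    // Hypothetical fixed ring order; a real consistent hash uses hashed positions.
    static final List<String> RING = List.of("A", "B", "C", "D");

    // Walk the ring from the key's position and collect up to numOwners live members.
    static List<String> owners(Set<String> members, int keyPos, int numOwners) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < RING.size() && result.size() < numOwners; i++) {
            String node = RING.get((keyPos + i) % RING.size());
            if (members.contains(node)) {
                result.add(node);
            }
        }
        return result;
    }

    // Nodes that own the key in newView but did not own it in baseView,
    // i.e. the nodes somebody has to push the key to.
    static Set<String> pushTargets(Set<String> baseView, Set<String> newView,
                                   int keyPos, int numOwners) {
        Set<String> targets = new LinkedHashSet<>(owners(newView, keyPos, numOwners));
        targets.removeAll(owners(baseView, keyPos, numOwners));
        return targets;
    }

    public static void main(String[] args) {
        Set<String> stable = Set.of("A", "B", "C", "D"); // last CH with a completed rehash
        Set<String> afterD = Set.of("A", "B", "C");      // rehash for this view never completed
        Set<String> afterC = Set.of("A", "B");
        int keyPos = 1;                                   // key k lands on B: owners(k) = {B, C, D}

        // Based on the interrupted CH: A already counts as an owner, so nothing is pushed.
        System.out.println(pushTargets(afterD, afterC, keyPos, 3)); // []
        // Based on the last successful CH: A is a new owner and receives the key.
        System.out.println(pushTargets(stable, afterC, keyPos, 3)); // [A]
    }
}
```

In this model the interrupted view is never used as a baseline, so a push that was
lost when C died is recomputed on the next rehash.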

3. When a rehash was interrupted by another view coming in, we would allow waiting
transactions to do some work before starting the next rehash.

This could lead to a deadlock if the transaction needed to replicate synchronously to
another node that is waiting for us to finish the rehash.

Instead the rehash task should leave the transactions blocked if it was interrupted by
another view, because it knows there is another rehash pending.
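
A minimal sketch of that rule, assuming a single flag that gates transactions and a
counter holding the latest view id. The class and method names are hypothetical and
do not correspond to Infinispan's rehash code.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RehashGateSketch {
    private final AtomicInteger latestViewId = new AtomicInteger();
    private volatile boolean transactionsBlocked;

    // Called whenever a new cluster view is installed.
    void viewReceived(int viewId) {
        latestViewId.set(viewId);
    }

    // Runs the rehash for viewId and returns true if it completed for that view.
    boolean runRehash(int viewId, Runnable stateTransfer) {
        transactionsBlocked = true;
        stateTransfer.run();
        if (latestViewId.get() != viewId) {
            // A newer view arrived while we were rehashing, so another rehash is
            // pending. Leave transactions blocked instead of letting them run
            // in the gap between the two rehashes.
            return false;
        }
        transactionsBlocked = false;
        return true;
    }

    boolean transactionsBlocked() {
        return transactionsBlocked;
    }
}
```

With this shape, transactions are only released by the rehash that finishes on the
latest view, so none of them can start a synchronous replication to a node that is
still waiting for this node's rehash.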

...
 Graceful shutdown should be supported
 -------------------------------------

                 Key: ISPN-1239
                 URL: https://issues.jboss.org/browse/ISPN-1239
             Project: Infinispan
          Issue Type: Feature Request
          Components: Distributed Cache
    Affects Versions: 5.0.0.FINAL
            Reporter: Manik Surtani
            Assignee: Dan Berindei
            Priority: Critical
              Labels: clean_shutdown, rehashing
             Fix For: 5.1.0.BETA1, 5.1.0.FINAL

 Currently, killing any node will result in a rehash.  A mechanism for clean shutdown
should also be supported, so that a rehash is *not* triggered.  Useful when the entire
cluster is being intentionally brought down.
 Need to think about how we do this; perhaps a LEAVE message that will prevent nodes
triggering a rehash when a subsequent view change is detected.  This could be done
programmatically via a {{clean}} parameter to {{stop()}}, but we should explore
alternatives here. 
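
One way the LEAVE idea might look, purely as a sketch of the proposal: each node
records which members announced a clean leave and skips the rehash when a view change
removes only those members. Every name here is hypothetical; nothing in this issue
says this is how it is or will be implemented.

```java
import java.util.HashSet;
import java.util.Set;

public class CleanLeaveSketch {
    // Members that sent a (hypothetical) LEAVE message before stopping.
    private final Set<String> announcedLeavers = new HashSet<>();

    void onLeaveMessage(String node) {
        announcedLeavers.add(node);
    }

    // Decide whether a view change from oldMembers to newMembers needs a rehash.
    boolean shouldRehash(Set<String> oldMembers, Set<String> newMembers) {
        Set<String> departed = new HashSet<>(oldMembers);
        departed.removeAll(newMembers);
        boolean someoneJoined = !oldMembers.containsAll(newMembers);
        boolean unannouncedLeave = !announcedLeavers.containsAll(departed);
        // Forget announcements for members that are now gone.
        announcedLeavers.removeAll(departed);
        return someoneJoined || unannouncedLeave;
    }
}
```

In this sketch a crash (no LEAVE received) still triggers a rehash, while a node that
announced its departure does not.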
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
