[infinispan-dev] ISPN-1586 and preloading in clustered caches

Wed Sep 26 07:44:38 EDT 2012

On Tue, Sep 25, 2012 at 10:53 AM, Manik Surtani <manik at jboss.org> wrote:

>
> On 24 Sep 2012, at 16:22, Dan Berindei <dan.berindei at gmail.com> wrote:
>
> Hi guys
>
> During the final push for NBST I found a bug with preloading (entries that
> didn't belong on a joiner weren't removed after the initial state
> transfer). I decided to fix it and
> https://issues.jboss.org/browse/ISPN-1586 at the same time, since it was
> a longstanding bug and I had a reasonable idea of what to do. However, I
> missed some implications and I need to fix them - there is at least one
> Query test failing because of my change (SharedCacheLoaderQueryIndexTest).
>
> In 5.1, preloading worked like this:
> 1. Start the CacheLoaderManager, which preloads everything from the cache
> store into memory.
> 2. Start the StateTransferManager, retrieving data from the other cache
> members and overwriting already-preloaded values.
> 3. When the initial state transfer ends, entries not owned by the local
> node are deleted.
>
> The main issue with this, raised in ISPN-1586, is that entries that were
> deleted on the other cache members are "revived" on the joiner when it
> reads the data from the cache store. There is another performance issue,
> because we load a lot of data that we then discard, but that's less
> important.
>
> With the ISPN-1586 fix, preloading should work like this:
> 1. Start the StateTransferManager, receive initial CH.
> 2. If the local node is not the first to start up, state fetching (either
> in-memory or persistent) is enabled, and the cache store is non-shared,
> clear the cache store.
> 3. Start the CacheLoaderManager, which preloads the cache store into memory
> - but only if the local node is the first one to have started the cache OR
> if state fetching is disabled.
> 4. Run the initial state transfer, retrieving data from the other cache
> members (if any, and if fetching state is enabled).
>
> This solves ISPN-1586, but it does mean that data from non-shared cache
> stores will be lost on all the nodes except the first that starts up. So if
> the last node to shut down is not the first node to start back up, the
> cluster will lose data.
>
> These are the alternatives I'm considering:
> a) Finish the ISPN-1586 fix and clearly document that non-shared cache
> stores don't guarantee persistence after cluster restart (unless the last
> cache to stop is the first to start back up and shutdown was spaced out to
> allow state transfer to move everything to the last node).
> b) Revert my ISPN-1586 fix and allow "zombie" cache entries on the joiners
> (leaving ISPN-1586 open).
>
>
> Maybe another approach could be:
>
> 1. Start the STM, retrieve initial CH
> 2. If the local node… (as above) … is non-shared, *don't clear it*, but
> mark the node so preloading is *deferred*.
> 3. Start the CLM … skip the preload if we marked it as deferred in step 2.
> 4. Run initial state transfer.  This will write newer versions of entries
> to the cache store if needed.
> 5. Now, if preloading has been deferred in step 2, start a preload, if
> we're configured to do any preloading.
>
> This should give us consistency.
>
>
Nope, this doesn't solve ISPN-1586: if the already-running members have
deleted a key, the deferred preload on the joiner can still load that key
from its cache store. In fact, the preload doesn't even matter here: just
the fact that the key is still in the cache store means that the node can
still return a non-null value for a deleted key.
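
To make the failure mode concrete, here is a toy sketch in plain Java. It is
not Infinispan code: the maps merely stand in for the running members' state
and the joiner's non-shared cache store, and every name in it is made up for
illustration. It assumes that state transfer only carries live entries (no
tombstones) and that a read falls back to the cache store on a memory miss.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain maps stand in for the cluster state and the joiner's
// non-shared cache store. None of these names are Infinispan APIs.
final class ZombieKeyDemo {

    // Simulates a joiner start-up and returns what a read of "k" sees afterwards.
    static String readAfterJoin(boolean clearStoreBeforeStateTransfer) {
        // State held by the already-running members: "k" was removed while
        // the joiner was down, so only "other" is left.
        Map<String, String> clusterState = new HashMap<>();
        clusterState.put("other", "v2");

        // The joiner's local store still has the stale entry for "k".
        Map<String, String> joinerStore = new HashMap<>();
        joinerStore.put("k", "stale");

        if (clearStoreBeforeStateTransfer) {
            joinerStore.clear();
        }

        // Initial state transfer: only live entries arrive, and they are also
        // written to the store. Nothing tells the joiner that "k" was removed.
        Map<String, String> joinerMemory = new HashMap<>(clusterState);
        joinerStore.putAll(clusterState);

        // Read path (assumed): memory first, then fall back to the cache store.
        String value = joinerMemory.get("k");
        return value != null ? value : joinerStore.get("k");
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + readAfterJoin(false)); // stale
        System.out.println("with clear:    " + readAfterJoin(true));  // null
    }
}
```

Note that no preload runs anywhere in the sketch: the stale value comes back
purely through the store fallback on a read.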

This is why I added the clear step in my algorithm: to avoid resurrecting
removed keys without receiving any tombstones through state transfer.
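
For reference, the conditions in steps 2 and 3 of my algorithm can be
written as two predicates. This is only a sketch of the decision logic as
described in this thread, not the actual implementation; the class, method
and parameter names are hypothetical, and the preloadConfigured flag is my
assumption that preloading has to be enabled in the configuration at all.

```java
// Sketch of the start-up decisions from steps 2 and 3. All names are
// hypothetical; this is not the real StateTransferManager/CacheLoaderManager code.
final class PreloadStartupPlan {

    // Step 2: clear the local store only when stale entries could otherwise
    // survive, i.e. we are joining existing members, we will fetch state
    // from them, and the store is private to this node.
    static boolean shouldClearStore(boolean firstMember,
                                    boolean fetchStateEnabled,
                                    boolean sharedStore) {
        return !firstMember && fetchStateEnabled && !sharedStore;
    }

    // Step 3: preload from the store only when nobody else can give us the
    // data: we are the first member, or state fetching is disabled.
    // preloadConfigured is an assumed extra flag, not part of the steps above.
    static boolean shouldPreload(boolean preloadConfigured,
                                 boolean firstMember,
                                 boolean fetchStateEnabled) {
        return preloadConfigured && (firstMember || !fetchStateEnabled);
    }

    public static void main(String[] args) {
        // A joiner with a non-shared store and state fetching enabled.
        System.out.println("clear store: " + shouldClearStore(false, true, false));
        System.out.println("preload:     " + shouldPreload(true, false, true));
    }
}
```

The joiner case in main is exactly the one that loses data after a full
cluster restart: the store is cleared and nothing is preloaded, so anything
that only existed in that store is gone.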

>
> I think there may be a third option:
> c) Make preload a JMX operation and allow the user to run a cluster-wide
> preload once all the nodes in the cluster have started up. But this looks a
> little complicated, and it would require either versioning or prohibiting
> external cache writes until the cluster-wide preload is done to ensure
> consistency.
>
>
> I'm not sure how having this as a JMX option helps.  Having versioning,
> etc. solves the problem even with an automatic preload.
>
>
Agreed, just having this as an option in JMX doesn't fix anything. But
having it as a manual operation would allow us to assume (and document it
this way) that the admin only exposes the cluster to the clients after
preloading is done - so we'd have no concurrent changes to worry about.

> What do you guys think? Sanne, I'm particularly interested how you think
> option a) would fit with the query module.
>
> Cheers
> Dan
>
>
>