[infinispan-dev] Shared vs Non-Shared CacheStores

Fri Jul 17 07:48:44 EDT 2015

+1000

On 16/07/15 15:32, Sanne Grinovero wrote:
> I would like to propose a clear-cut separation between our shared and
> non-shared CacheStores, in all respects, such as:
>   - Configuration options
>   - Integration contracts (Split the CacheStore SPI)
>   - Implementations
>   - Terminology, to avoid any further confusion around valid
> configurations and sensible architectures
>
> We have loads of examples of users who get in trouble by configuring
> one incorrectly, but also there are plenty of efficiency improvements
> we could take advantage of by clearly splitting the integration points
> and the implementations in two categories.
>
> Not least, it's a very common and dangerous pitfall to assume that
> Infinispan is able to restore a consistent state after having stopped
> a DIST cluster which passivated into non-shared CacheStore instances,
> or even REPL clusters when they don't all shut down at the exact same
> time (and "exact same time" is a strange concept, to say the least). We
> need to clarify the different options, tradeoffs and their consequences
> to users and to ourselves, as a clearly defined use case will avoid bugs
> and simplify implementations.
>
> # The purpose of each
> I think that people should use a non-shared (local?) CacheStore for
> the sole purpose of expanding the storage capacity of each single
> node, be it because you don't have enough memory at all, or
> because you prefer some extra safety margin, either because your
> estimates are complex or because we live in a real world where
> the hashing function might not be perfect in practice. I hope we all
> agree that Infinispan should be able to handle such situations with at
> worst a graceful performance degradation, rather than complaining by
> sending OOMs to the admin and putting the service on strike.
>
> A Shared CacheStore is useful for very different purposes; primarily
> to implement a Cache on some other service - for example your (single,
> shared) RDBMS, a slow (or expensive) webservice your organization has
> to call frequently, etc. It's also useful as a write-through cache
> on a similar service, maybe internal but not able to handle the high
> variation of load spikes which Infinispan can handle better.
> Finally, a great use case is to have a consistent backup of all your
> data-grid content, possibly in some "reference" form such as JPA
> mapped entities.
>
> # Benefits of a Non-Shared
> A non-shared CacheStore implementor should be able to take advantage
> of *its purpose*. Among the big benefits I see:
>   - Exclusive usage -> locking of a specific entry can be handled at
> datacontainer level, which can simplify quite a lot of internal code.
>   - Reliability -> since a clustered node needs to wipe its state at
> reboot (after a crash), it's much simpler to code any such CacheStore
> without any form of disk sync or persistence guarantees.
>   - Encoding format -> this can be controlled entirely by Infinispan,
> with no need to keep factors like rolling-upgrade-compatible encodings
> in mind. JBoss Marshalling would be good enough, or some
> implementations might not need to serialize at all.
>
> Our non-shared CacheStore implementation(s) could take advantage of
> lower-level, more complex code optimisations and interfaces, as users
> would rarely want to customize one of these, while the use case of
> mapping data to a shared service needs a more user-friendly SPI to
> keep it simple to plug in custom stores: custom data formats, custom
> connectors, and some help in implementing concurrency correctly.
> Proper Transaction integration for the CacheStore has been on our
> wishlist for some time too. I suspect that accepting that we have been
> mixing up two different things under the same name so far would make it
> simpler to implement further improvements such as transactions: the
> way to do such a thing is very different in each of these use cases,
> so it would help at least to implement it on a subset first, or maybe
> only on that subset if it turns out there's no need for such things in
> the context of the local-only-dedicated "swapfile".
>
> # Mixed types should be killed
> I'm aware that some of our current implementations _could_ work either
> as shared or as non-shared, for example the JDBC or JPACacheStore or the
> Remote CacheStore, but in most cases it doesn't make much sense. Why
> would you ever want to use the JPACacheStore if not to share data with
> a _shared_ database?
>
> We should take such options away, and by doing so focus on the use
> cases which actually matter and simplify the implementations and
> improve the configuration validations.
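
To make the "take such options away" idea concrete, here is a minimal
sketch of a configuration in which shared-ness follows from the store's
type instead of from a flag the user can get wrong. Every name in it
(LocalStore, SharedStore, PersistenceConfig) is hypothetical and made up
for illustration; none of them is existing Infinispan API.

```java
/** Hypothetical contract for a node-private overflow store (not Infinispan API). */
interface LocalStore {
    void write(String key, byte[] value);
    byte[] load(String key);
}

/** Hypothetical contract for a store backed by a service every node can see. */
interface SharedStore<V> {
    void write(String key, V value);
    V load(String key);
}

/** Hypothetical configuration object: there is no user-settable "shared" flag. */
final class PersistenceConfig {
    private final Object store;
    private final boolean shared;

    private PersistenceConfig(Object store, boolean shared) {
        this.store = store;
        this.shared = shared;
    }

    /** Only a LocalStore can be configured as local. */
    static PersistenceConfig local(LocalStore store) {
        return new PersistenceConfig(store, false);
    }

    /** Only a SharedStore can be configured as shared. */
    static PersistenceConfig shared(SharedStore<?> store) {
        return new PersistenceConfig(store, true);
    }

    boolean isShared() {
        return shared;
    }

    /**
     * Assumption taken from the proposal: content held only in node-local
     * stores must not be expected to come back consistent after the
     * cluster is stopped, so only shared stores report true here.
     */
    boolean survivesClusterRestart() {
        return shared;
    }

    Object store() {
        return store;
    }
}
```

With this shape a JPA-style store would implement only SharedStore, so
passing it to PersistenceConfig.local(...) fails at compile time rather
than at runtime on a user's cluster.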
>
> If ever a compelling storage technology is identified which we'd like to
> offer as an option for both shared and non-shared, I would still
> recommend making two different implementations, as there certainly are
> different requirements and assumptions when coding such a thing.
>
> Not least, I would very much like to see a default local CacheStore:
> picking one for local "emergency swapping" should be a no-brainer for
> users; we could set one up by default and not bother newcomers with
> complex choices.
>
> If we simplify the requirements of such a thing, it should be easy to
> write one on standard Java NIO2 APIs and get rid of the complexities of
> maintaining the native integration with things like LevelDB, not to
> mention the inefficiency of making such native calls from Java.
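
As a rough illustration of how small such a store could be, here is a
sketch built only on java.nio.file. It assumes String keys, one file per
entry, a flat directory owned exclusively by one node, and no durability:
the directory is wiped on startup and nothing is ever fsync'ed. The class
name LocalSwapStore and its methods are hypothetical, not existing
Infinispan code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical node-private overflow store: one file per key, no durability. */
final class LocalSwapStore {
    private final Path dir;

    LocalSwapStore(Path dir) throws IOException {
        this.dir = dir;
        Files.createDirectories(dir);
        clear(); // content from a previous run is never trusted, so wipe it
    }

    /** Maps a key to a file name that is safe on common file systems. */
    private Path fileFor(String key) {
        String name = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(key.getBytes(StandardCharsets.UTF_8));
        return dir.resolve("e-" + name); // prefix keeps the empty key valid
    }

    void write(String key, byte[] value) throws IOException {
        // Default options create or truncate the file; no sync to disk is forced.
        Files.write(fileFor(key), value);
    }

    /** Returns the stored bytes, or null when the key is absent. */
    byte[] load(String key) throws IOException {
        try {
            return Files.readAllBytes(fileFor(key));
        } catch (NoSuchFileException e) {
            return null;
        }
    }

    boolean delete(String key) throws IOException {
        return Files.deleteIfExists(fileFor(key));
    }

    /** Deletes every entry; assumes the directory contains only plain files. */
    void clear() throws IOException {
        List<Path> entries;
        try (Stream<Path> files = Files.list(dir)) {
            entries = files.collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
    }
}
```

Very long keys would exceed file name limits with this encoding, and a
real implementation would more likely use a few large append-only files
than one file per entry; the point is only that the relaxed requirements
leave very little code to write.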
>
> Then as a second step, we should attack the other use case: backups;
> from a *purpose driven perspective* I'd then see us revive the Cassandra
> integration; obviously as a shared-only option.
>
> Cheers,
> Sanne