[infinispan-dev] Shared vs Non-Shared CacheStores

Sanne Grinovero sanne at infinispan.org
Mon Jul 20 05:41:31 EDT 2015


+1 for incremental changes..

I'd see the first step as defining two different interfaces;
essentially we need to choose two good names.

Then we could have both interfaces still implement the same identical
methods, but go through each implementation and decide to "mark" it as
shared-only or never-shared.
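
As a rough sketch of that first step (all names here are hypothetical
placeholders, none of them exist in Infinispan, and sharing a common base
interface is only one possible way to keep the methods identical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; these are NOT existing Infinispan types.

/** The methods both kinds of store would still have in common. */
interface StoreOperations<K, V> {
    V load(K key);
    void write(K key, V value);
    boolean delete(K key);
}

/** Marker: only valid when every node talks to the same backend. */
interface SharedOnlyStore<K, V> extends StoreOperations<K, V> { }

/** Marker: private to a single node, never shared. */
interface NeverSharedStore<K, V> extends StoreOperations<K, V> { }

/** Toy in-memory stand-in for a node-local store, "marked" as never-shared. */
final class LocalSwapStore<K, V> implements NeverSharedStore<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    @Override public V load(K key) { return entries.get(key); }
    @Override public void write(K key, V value) { entries.put(key, value); }
    @Override public boolean delete(K key) { return entries.remove(key) != null; }
}

final class StoreKinds {
    private StoreKinds() { }

    /** Reports which side of the split an implementation was marked with. */
    static String kindOf(StoreOperations<?, ?> store) {
        if (store instanceof SharedOnlyStore) return "shared-only";
        if (store instanceof NeverSharedStore) return "never-shared";
        return "unclassified";
    }
}
```

With the methods kept identical, moving an existing implementation to one
side of the split is only a change of its "implements" clause.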

That would make it simpler to make concrete change proposals on each
of them and start taking some advantage from the split. I think you'll
need the two different interfaces to implement the validations you
mentioned.
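
For illustration only, such a validation could look roughly like the
following. The enum, the class and the boolean "shared" flag are
assumptions made up for this sketch, not the actual Infinispan
configuration API:

```java
// Hypothetical sketch; not the Infinispan configuration API.

/** Which side of the split a store implementation was marked with. */
enum StoreKind { SHARED_ONLY, NEVER_SHARED }

final class StoreConfigValidator {
    private StoreConfigValidator() { }

    /**
     * Throws if the kind declared by the store contradicts the "shared"
     * flag found in the user's configuration; returns normally otherwise.
     */
    static void validate(String storeName, StoreKind kind, boolean configuredAsShared) {
        if (kind == StoreKind.SHARED_ONLY && !configuredAsShared) {
            throw new IllegalStateException(
                storeName + " is shared-only but was configured as non-shared");
        }
        if (kind == StoreKind.NEVER_SHARED && configuredAsShared) {
            throw new IllegalStateException(
                storeName + " is never-shared but was configured as shared");
        }
    }
}
```

Failing at configuration time means the user sees the problem before any
data is written to the store.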

For Infinispan 8's goals, I'd be happy enough to keep the
"shared-only" interface quite similar to the current one, but mark the
never-shared one as a private or experimental SPI to allow ourselves
some more flexibility in performance oriented changes.

Thanks,
Sanne

On 20 July 2015 at 10:07, Tristan Tarrant <ttarrant at redhat.com> wrote:
> Sanne, well written.
> Before actually implementing any of the optimizations/changes you
> mention, I think the lowest-hanging fruit we should grab now is just to
> add checks to all of our cachestores to actually throw an exception when
> they are being enabled in unsupported configurations.
>
> I've created [1] to get us started
>
> Tristan
>
> [1] https://issues.jboss.org/browse/ISPN-5617
>
> On 16/07/2015 15:32, Sanne Grinovero wrote:
>> I would like to propose a clear cut separation between our shared and
>> non-shared CacheStores,
>> in all terms such as:
>>   - Configuration options
>>   - Integration contracts (Split the CacheStore SPI)
>>   - Implementations
>>   - Terminology, to avoid any further confusion around valid
>> configurations and sensible architectures
>>
>> We have loads of examples of users who get in trouble by configuring
>> one incorrectly, but also there are plenty of efficiency improvements
>> we could take advantage of by clearly splitting the integration points
>> and the implementations in two categories.
>>
>> Not least, it's a very common and dangerous pitfall to assume that
>> Infinispan is able to restore a consistent state after having stopped
>> a DIST cluster which passivated into non-shared CacheStore instances,
>> or even REPL clusters when they don't shut down all at the same exact
>> time (and "exact same time" is a strange concept at least..). We need
>> to clarify the different options, tradeoffs and their consequences..
>> to users and ourselves, as a clearly defined use case will avoid bugs
>> and simplify implementations.
>>
>> # The purpose of each
>> I think that people should use a non-shared (local?) CacheStore for
>> the sole purpose of expanding the storage capacity of each single
>> node.. be it because you don't have enough memory at all, or be it
>> because you prefer some extra safety margin because either your
>> estimates are complex, or maybe because we live in a real world where
>> the hashing function might not be perfect in practice. I hope we all
>> agree that Infinispan should be able to take such situations with at
>> worst a graceful performance degradation, rather than complain by
>> sending OOMs to the admin and setting the service on strike.
>>
>> A Shared CacheStore is useful for very different purposes; primarily
>> to implement a Cache on some other service - for example your (single,
>> shared) RDBMS, a slow (or expensive) webservice your organization has
>> to call frequently, etc.. Or it's useful even as a write-through cache
>> on a similar service, maybe internal but not able to handle the high
>> variation of load spikes which Infinispan can handle better.
>> Finally, a great use case is to have a consistent backup of all your
>> data-grid content, possibly in some "reference" form such as JPA
>> mapped entities.
>>
>> # Benefits of a Non-Shared
>> A non-shared CacheStore implementor should be able to take advantage
>> of *its purpose*, among the big ones I see:
>>   - Exclusive usage -> locking of a specific entry can be handled at
>> datacontainer level, can simplify quite some internal code.
>>   - Reliability -> since a clustered node needs to wipe its state at
>> reboot (after a crash), it's much simpler to code any such CacheStore
>> to avoid any form of disk sync or persistence guarantees.
>>   - Encoding format -> this can be controlled entirely by Infinispan,
>> with no need to keep factors like rolling upgrade compatible encodings
>> in mind. JBoss Marshalling would be good enough, or some
>> implementations might not need to serialize at all.
>>
>> Our non-shared CacheStore implementation(s) could take advantage of
>> lower level more complex code optimisations and interfaces, as users
>> would rarely want to customize one of these, while the use case of
>> mapping data to a shared service needs a more user friendly SPI so as
>> to keep it simple to plug in custom stores: custom data formats, custom
>> connectors, get some help in implementing concurrency correctly.
>> Proper Transaction integration for the CacheStore has been on our
>> wishlist for some time too. I suspect that accepting that we have been
>> mixing up two different things under the same name so far would make it
>> simpler to implement further improvements such as transactions: the
>> way to do such a thing is very different in each of these use cases,
>> so it would help at least to implement it on a subset first, or maybe
>> only on that subset, if it turns out there's no need for such things in
>> the context of the local-only-dedicated "swapfile".
>>
>> # Mixed types should be killed
>> I'm aware that some of our current implementations _could_ work both as
>> shared or non-shared, for example the JDBC or JPACacheStore or the
>> Remote Cachestore.. but in most cases it doesn't make much sense. Why
>> would you ever want to use the JPACacheStore if not to share data with
>> a _shared_ database?
>>
>> We should take such options away, and by doing so focus on the use
>> cases which actually matter and simplify the implementations and
>> improve the configuration validations.
>>
>> If ever a compelling storage technology is identified which we'd like to
>> offer as an option for both shared or non-shared, I would still
>> recommend to make two different implementations, as there certainly are
>> different requirements and assumptions when coding such a thing.
>>
>> Not least, I would very much like to see a default local CacheStore:
>> picking one for local "emergency swapping" should be a no-brainer for
>> users; we could set one up by default and not bother newcomers with
>> complex choices.
>>
>> If we simplify the requirement of such a thing, it should be easy to
>> write one on standard Java NIO2 APIs and get rid of the complexities of
>> maintaining the native integration with things like LevelDB, not least
>> the inefficiency of Java to make such native calls.
>>
>> Then as a second step, we should attack the other use case: backups;
>> from a *purpose driven perspective* I'd then see us revive the Cassandra
>> integration; obviously as a shared-only option.
>>
>> Cheers,
>> Sanne
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
> --
> Tristan Tarrant
> Infinispan Lead
> JBoss, a division of Red Hat

