[infinispan-dev] Shared vs Non-Shared CacheStores

Dan Berindei dan.berindei at gmail.com
Wed Aug 5 17:13:22 EDT 2015


On Fri, Jul 31, 2015 at 3:30 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
> On 20 July 2015 at 11:02, Dan Berindei <dan.berindei at gmail.com> wrote:
>> Sanne, I think changing the cache store API is actually the most
>> painful part, so we should only do it if we gain a concrete advantage
>> from doing it. From a compatibility point of view, implementing a new
>> interface vs implementing the same interface with completely different
>> methods is just as bad.
>
> Right, from that perspective it's quite a horrible proposal.
>
> But I think we can agree that only the "SharedCacheStore" deserves to
> be considered an SPI, right?
> That's the one people will normally customize to map stuff to other
> stores one might have.
>
> I think it's important that, after the Infinispan 8.0 API freeze, we can
> make any change to the non-shared SPI
> without affecting users who implement a custom shared cachestore.
>
> I highly doubt someone will implement a high-performance custom off-heap
> swap strategy, but if someone does, they should contribute it and will
> probably need to make integration-level changes.
>
> We probably won't have the time to implement a new super efficient
> local-only cachestore to replace the leveldb one, but I'd like to keep
> the possibility open to do that beyond 8.0, *especially* without
> breaking compatibility for other people.

We already have a new super efficient local-only cachestore :)

https://github.com/infinispan/infinispan/tree/master/persistence/soft-index


>
> Sanne
>
>
>>
>> On Mon, Jul 20, 2015 at 12:41 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>> +1 for incremental changes..
>>>
>>> I'd see the first step as defining two different interfaces;
>>> essentially we need to choose two good names.
>>>
>>> Then we could have both interfaces still implement the same identical
>>> methods, but go through each implementation and decide to "mark" it as
>>> shared-only or never-shared.
>>>
>>> That would make it simpler to make concrete change proposals on each
>>> of them and start taking some advantage of the split. I think you'll
>>> need the two different interfaces to implement the validations you
>>> mentioned.
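A minimal Java sketch of the first step described here: two interfaces that still declare identical methods, with each implementation "marked" by which one it implements. All names (StoreContract, SharedOnlyStore, NeverSharedStore, InMemoryLocalStore, StoreKinds) are hypothetical and are not Infinispan's actual persistence SPI.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical common contract: both store kinds still expose the same methods.
interface StoreContract<K, V> {
    Optional<V> load(K key);
    void write(K key, V value);
    boolean delete(K key);
}

// Hypothetical marker for stores that may only be configured as shared (e.g. a single RDBMS).
interface SharedOnlyStore<K, V> extends StoreContract<K, V> {}

// Hypothetical marker for stores that must never be shared (e.g. a per-node swap file).
interface NeverSharedStore<K, V> extends StoreContract<K, V> {}

// An implementation "marked" as never-shared; its methods are unchanged.
class InMemoryLocalStore<K, V> implements NeverSharedStore<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();

    @Override public Optional<V> load(K key) { return Optional.ofNullable(data.get(key)); }
    @Override public void write(K key, V value) { data.put(key, value); }
    @Override public boolean delete(K key) { return data.remove(key) != null; }
}

final class StoreKinds {
    private StoreKinds() {}

    // Configuration validation can use the marker to tell the two kinds apart.
    static boolean isSharedOnly(StoreContract<?, ?> store) {
        return store instanceof SharedOnlyStore;
    }

    static boolean isNeverShared(StoreContract<?, ?> store) {
        return store instanceof NeverSharedStore;
    }
}
```

Because the methods are identical, callers that only use the common contract are unaffected by which marker an implementation carries; the markers exist so that validation code can distinguish the two kinds.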
>>>
>>> For Infinispan 8's goals, I'd be happy enough to keep the
>>> "shared-only" interface quite similar to the current one, but mark the
>>> never-shared one as a private or experimental SPI to allow ourselves
>>> some more flexibility in performance oriented changes.
>>>
>>> Thanks,
>>> Sanne
>>>
>>> On 20 July 2015 at 10:07, Tristan Tarrant <ttarrant at redhat.com> wrote:
>>>> Sanne, well written.
>>>> Before actually implementing any of the optimizations/changes you
>>>> mention, I think the lowest-hanging fruit we should grab now is just to
>>>> add checks to all of our cachestores to actually throw an exception when
>>>> they are being enabled in unsupported configurations.
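The kind of check proposed here could look roughly like the following sketch. StoreSupport, StoreConfig and StoreConfigValidator are hypothetical names chosen for illustration, and IllegalStateException stands in for whatever configuration exception the real code would throw.

```java
import java.util.Objects;

// Hypothetical: what a store implementation declares it can support.
enum StoreSupport { SHARED_ONLY, NEVER_SHARED, EITHER }

// Hypothetical, minimal view of a configured store.
final class StoreConfig {
    final String storeName;
    final StoreSupport support;
    final boolean shared;

    StoreConfig(String storeName, StoreSupport support, boolean shared) {
        this.storeName = Objects.requireNonNull(storeName);
        this.support = Objects.requireNonNull(support);
        this.shared = shared;
    }
}

final class StoreConfigValidator {
    private StoreConfigValidator() {}

    // Fail fast at configuration time instead of misbehaving at runtime.
    static void validate(StoreConfig config) {
        if (config.shared && config.support == StoreSupport.NEVER_SHARED) {
            throw new IllegalStateException(
                    config.storeName + " is node-local and cannot be configured as shared");
        }
        if (!config.shared && config.support == StoreSupport.SHARED_ONLY) {
            throw new IllegalStateException(
                    config.storeName + " only makes sense as a shared store");
        }
        // EITHER: both settings are accepted (a "mixed type" store).
    }
}
```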
>>>>
>>>> I've created [1] to get us started
>>>>
>>>> Tristan
>>>>
>>>> [1] https://issues.jboss.org/browse/ISPN-5617
>>>>
>>>> On 16/07/2015 15:32, Sanne Grinovero wrote:
>>>>> I would like to propose a clear-cut separation between our shared and
>>>>> non-shared CacheStores,
>>>>> in all respects, such as:
>>>>>   - Configuration options
>>>>>   - Integration contracts (Split the CacheStore SPI)
>>>>>   - Implementations
>>>>>   - Terminology, to avoid any further confusion around valid
>>>>> configurations and sensible architectures
>>>>>
>>>>> We have loads of examples of users who get in trouble by configuring
>>>>> one incorrectly, but also there are plenty of efficiency improvements
>>>>> we could take advantage of by clearly splitting the integration points
>>>>> and the implementations in two categories.
>>>>>
>>>>> Not least, it's a very common and dangerous pitfall to assume that
>>>>> Infinispan is able to restore a consistent state after having stopped
>>>>> a DIST cluster which passivated into non-shared CacheStore instances,
>>>>> or even REPL clusters when the nodes don't all shut down at exactly the
>>>>> same time (and "exactly the same time" is a strange concept, to say the
>>>>> least). We need to clarify the different options, tradeoffs and their consequences,
>>>>> to users and ourselves, as a clearly defined use case will avoid bugs
>>>>> and simplify implementations.
>>>>>
>>>>> # The purpose of each
>>>>> I think that people should use a non-shared (local?) CacheStore for
>>>>> the sole purpose of expanding the storage capacity of each single
>>>>> node, be it because you don't have enough memory at all, or because
>>>>> you prefer some extra safety margin, either because your estimates
>>>>> are complex or because we live in a real world where the hashing
>>>>> function might not be perfect in practice. I hope we all agree that
>>>>> Infinispan should be able to handle such situations with, at worst, a
>>>>> graceful performance degradation, rather than throwing OOMs at the
>>>>> admin and putting the service on strike.
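To make the "expand capacity instead of failing with OOM" purpose concrete, here is a minimal sketch, assuming a bounded in-memory container that moves its least-recently-used entry into a node-local overflow store and pulls it back on access. OverflowingContainer is a hypothetical name, and a plain Map stands in for the non-shared CacheStore; this illustrates the idea, not how Infinispan's eviction and passivation are actually implemented.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical container: bounded memory, with overflow into a node-local store.
final class OverflowingContainer<K, V> {
    private final Map<K, V> overflow; // stand-in for a non-shared CacheStore
    private final LinkedHashMap<K, V> memory;

    OverflowingContainer(int maxInMemory, Map<K, V> overflow) {
        if (maxInMemory < 1) {
            throw new IllegalArgumentException("maxInMemory must be at least 1");
        }
        this.overflow = Objects.requireNonNull(overflow);
        // accessOrder = true: iteration starts at the least-recently-accessed entry.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    // Spill to the local store instead of growing until OOM.
                    OverflowingContainer.this.overflow.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    V get(K key) {
        V value = memory.get(key);
        if (value != null) {
            return value;
        }
        // Miss in memory: move the entry back from the local store, if present.
        value = overflow.remove(key);
        if (value != null) {
            memory.put(key, value); // may in turn spill another entry
        }
        return value;
    }

    void put(K key, V value) {
        Objects.requireNonNull(value);
        overflow.remove(key); // the in-memory copy becomes the only copy
        memory.put(key, value);
    }

    int inMemorySize() { return memory.size(); }
    int overflowSize() { return overflow.size(); }
}
```

The worst case here is extra I/O on the overflow store, which is the "graceful performance degradation" trade-off rather than an outage.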
>>>>>
>>>>> A Shared CacheStore is useful for very different purposes; primarily
>>>>> to implement a Cache on some other service, for example your (single,
>>>>> shared) RDBMS, a slow (or expensive) web service your organization has
>>>>> to call frequently, etc. It's also useful as a write-through cache
>>>>> on a similar service, maybe internal but unable to handle the large
>>>>> load spikes which Infinispan can handle better.
>>>>> Finally, a great use case is to have a consistent backup of all your
>>>>> data-grid content, possibly in some "reference" form such as JPA
>>>>> mapped entities.
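For the shared case, the typical pattern is read-through and write-through in front of the single external system. The sketch below assumes a hypothetical ExternalService interface standing in for the RDBMS or web service, plus a CountingBackend test double; none of these are Infinispan types.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical view of the single, shared external system (RDBMS, web service, ...).
interface ExternalService<K, V> {
    Optional<V> fetch(K key);
    void store(K key, V value);
}

// Read-through / write-through cache in front of the shared service.
final class ReadWriteThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final ExternalService<K, V> backend;

    ReadWriteThroughCache(ExternalService<K, V> backend) {
        this.backend = backend;
    }

    Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached); // hit: no round trip to the slow service
        }
        Optional<V> loaded = backend.fetch(key); // miss: read through
        loaded.ifPresent(v -> cache.put(key, v));
        return loaded;
    }

    void put(K key, V value) {
        backend.store(key, value); // write through first: the service keeps the reference copy
        cache.put(key, value);
    }
}

// Test double standing in for the shared service; counts fetches (single-threaded use only).
final class CountingBackend<K, V> implements ExternalService<K, V> {
    final Map<K, V> rows = new ConcurrentHashMap<>();
    int fetches = 0;

    @Override
    public Optional<V> fetch(K key) {
        fetches++;
        return Optional.ofNullable(rows.get(key));
    }

    @Override
    public void store(K key, V value) {
        rows.put(key, value);
    }
}
```

Two threads missing on the same key concurrently may both call the backend; a real implementation would have to decide whether that is acceptable.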
>>>>>
>>>>> # Benefits of a Non-Shared
>>>>> A non-shared CacheStore implementor should be able to take advantage
>>>>> of *its purpose*, among the big ones I see:
>>>>>   - Exclusive usage -> locking of a specific entry can be handled at
>>>>> data container level, which can simplify quite a lot of internal code.
>>>>>   - Reliability -> since a clustered node needs to wipe its state at
>>>>> reboot (after a crash), it's much simpler to code any such CacheStore
>>>>> to avoid any form of disk sync or persistence guarantees.
>>>>>   - Encoding format -> this can be controlled entirely by Infinispan,
>>>>> with no need to take factors like rolling-upgrade-compatible encodings
>>>>> into account. JBoss Marshalling would be good enough, or some
>>>>> implementations might not need to serialize at all.
>>>>>
>>>>> Our non-shared CacheStore implementation(s) could take advantage of
>>>>> lower-level, more complex code optimisations and interfaces, as users
>>>>> would rarely want to customize one of these, while the use case of
>>>>> mapping data to a shared service needs a more user-friendly SPI so as
>>>>> to keep it simple to plug in custom stores: custom data formats, custom
>>>>> connectors, and some help in implementing concurrency correctly.
>>>>> Proper Transaction integration for the CacheStore has been on our
>>>>> wishlist for some time too; I suspect that accepting that we have been
>>>>> mixing up two different things under the same name so far would make
>>>>> it simpler to implement further improvements such as transactions. The
>>>>> way to do such a thing is very different in each of these use cases,
>>>>> so it would help at least to implement it on one subset first, or maybe
>>>>> only on that subset if it turns out there's no need for such things in
>>>>> the context of the dedicated local-only "swapfile".
>>>>>
>>>>> # Mixed types should be killed
>>>>> I'm aware that some of our current implementations _could_ work either as
>>>>> shared or non-shared, for example the JDBC CacheStore, the JPACacheStore or
>>>>> the Remote CacheStore, but in most cases it doesn't make much sense. Why
>>>>> would you ever want to use the JPACacheStore if not to share data with
>>>>> a _shared_ database?
>>>>>
>>>>> We should take such options away, and by doing so focus on the use
>>>>> cases which actually matter and simplify the implementations and
>>>>> improve the configuration validations.
>>>>>
>>>>> If ever a compelling storage technology is identified which we'd like to
>>>>> offer as an option for both shared and non-shared, I would still
>>>>> recommend making two different implementations, as there certainly are
>>>>> different requirements and assumptions when coding such a thing.
>>>>>
>>>>> Not least, I would very much like to see a default local CacheStore:
>>>>> picking one for local "emergency swapping" should be a no-brainer for
>>>>> users; we could set one up by default and not bother newcomers with
>>>>> complex choices.
>>>>>
>>>>> If we simplify the requirements of such a thing, it should be easy to
>>>>> write one on the standard Java NIO2 APIs and get rid of the complexity of
>>>>> maintaining the native integration with things like LevelDB, not to
>>>>> mention the overhead of making such native calls from Java.
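A rough sketch of how small a node-local "swapfile" on plain NIO2 could be, under the assumptions from the benefits list: the file is truncated on open because a restarted node discards its state, there is no force()/fsync, and values arrive already encoded as bytes. AppendOnlySwapFile is a hypothetical class, keys are assumed to be Strings, and space from overwritten or removed entries is never reclaimed (a compaction pass would be needed).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical node-local swap file: append-only data file plus in-memory index.
final class AppendOnlySwapFile implements AutoCloseable {
    private final FileChannel channel;
    private final Map<String, long[]> index = new HashMap<>(); // key -> {offset, length}

    AppendOnlySwapFile(Path file) throws IOException {
        // TRUNCATE_EXISTING: a restarted node discards old state, so nothing is recovered.
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    // Used by one node only; synchronized just guards this object's own threads.
    synchronized void put(String key, byte[] value) throws IOException {
        long offset = channel.size();
        ByteBuffer buf = ByteBuffer.wrap(value);
        long pos = offset;
        while (buf.hasRemaining()) {
            pos += channel.write(buf, pos);
        }
        // No channel.force(): durability is pointless for data wiped at restart.
        index.put(key, new long[] {offset, value.length});
    }

    synchronized byte[] get(String key) throws IOException {
        long[] loc = index.get(key);
        if (loc == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.allocate((int) loc[1]);
        long pos = loc[0];
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n < 0) {
                throw new IOException("unexpected end of swap file");
            }
            pos += n;
        }
        return buf.array();
    }

    synchronized boolean remove(String key) {
        return index.remove(key) != null; // the bytes stay in the file until compaction
    }

    @Override
    public synchronized void close() throws IOException {
        channel.close();
    }
}
```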
>>>>>
>>>>> Then as a second step, we should attack the other use case: backups;
>>>>> from a *purpose driven perspective* I'd then see us revive the Cassandra
>>>>> integration; obviously as a shared-only option.
>>>>>
>>>>> Cheers,
>>>>> Sanne
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>
>>>>
>>>> --
>>>> Tristan Tarrant
>>>> Infinispan Lead
>>>> JBoss, a division of Red Hat

