[infinispan-dev] Shared vs Non-Shared CacheStores

Dan Berindei dan.berindei at gmail.com
Mon Jul 20 06:02:26 EDT 2015


Sanne, I think changing the cache store API is actually the most
painful part, so we should only do it if we gain a concrete advantage
from doing it. From a compatibility point of view, implementing a new
interface vs implementing the same interface with completely different
methods is just as bad.

On Mon, Jul 20, 2015 at 12:41 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
> +1 for incremental changes..
>
> I'd see the first step as defining two different interfaces;
> essentially we need to choose two good names.
>
> Then we could have both interfaces still implement the same identical
> methods, but go through each implementation and decide to "mark" it as
> shared-only or never-shared.
>
> That would make it simpler to make concrete change proposals on each
> of them and start taking some advantage from the split. I think you'll
> need the two different interfaces to implement the validations you
> mentioned.
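A minimal Java sketch of that first step. It assumes two marker interfaces that share one common set of methods, so existing stores only need to be "marked". All type and method names here (`StoreContract`, `SharedOnlyStore`, `NeverSharedStore`, `MapBackedLocalStore`, `StoreKinds`) are hypothetical and are not Infinispan's actual persistence SPI. An in-memory map stands in for real node-local storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common contract: for now both flavours expose the same methods.
interface StoreContract<K, V> {
    V load(K key);
    void write(K key, V value);
    boolean delete(K key);
}

// Hypothetical marker: the store must be backed by one service shared by all nodes.
interface SharedOnlyStore<K, V> extends StoreContract<K, V> {}

// Hypothetical marker: the store is private to a single node and never shared.
interface NeverSharedStore<K, V> extends StoreContract<K, V> {}

// Illustrative never-shared store; a HashMap stands in for node-local storage.
class MapBackedLocalStore<K, V> implements NeverSharedStore<K, V> {
    private final Map<K, V> data = new HashMap<>();

    public V load(K key) { return data.get(key); }
    public void write(K key, V value) { data.put(key, value); }
    public boolean delete(K key) { return data.remove(key) != null; }
}

final class StoreKinds {
    private StoreKinds() {}

    // Classifies a store purely by which marker interface it implements.
    static String describe(StoreContract<?, ?> store) {
        if (store instanceof SharedOnlyStore) return "shared-only";
        if (store instanceof NeverSharedStore) return "never-shared";
        return "unmarked";
    }
}
```

Because the classification comes from the type alone, a configuration check can tell the two kinds apart without changing any store method signatures. This fits the incremental approach: split the types first, then let the method sets diverge.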
>
> For Infinispan 8's goals, I'd be happy enough to keep the
> "shared-only" interface quite similar to the current one, but mark the
> never-shared one as a private or experimental SPI to allow ourselves
> some more flexibility in performance oriented changes.
>
> Thanks,
> Sanne
>
> On 20 July 2015 at 10:07, Tristan Tarrant <ttarrant at redhat.com> wrote:
>> Sanne, well written.
>> Before actually implementing any of the optimizations/changes you
>> mention, I think the lowest-hanging fruit we should grab now is just to
>> add checks to all of our cachestores to actually throw an exception when
>> they are being enabled in unsupported configurations.
>>
>> I've created [1] to get us started
>>
>> Tristan
>>
>> [1] https://issues.jboss.org/browse/ISPN-5617
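A sketch of the kind of check described above. It assumes that each store implementation declares whether it is shared-only or never-shared. The `Sharing` enum and `StoreConfigValidator` class are hypothetical illustrations. They are not taken from ISPN-5617 or from Infinispan's configuration API.

```java
// Hypothetical declaration of the deployment a store implementation supports.
enum Sharing { SHARED_ONLY, NEVER_SHARED }

final class StoreConfigValidator {
    private StoreConfigValidator() {}

    // Rejects a configuration whose 'shared' flag contradicts what the store supports.
    static void validate(String storeName, Sharing supported, boolean configuredShared) {
        if (supported == Sharing.SHARED_ONLY && !configuredShared) {
            throw new IllegalStateException(storeName + " must be configured as shared");
        }
        if (supported == Sharing.NEVER_SHARED && configuredShared) {
            throw new IllegalStateException(storeName + " must not be configured as shared");
        }
    }
}
```

For example, `validate("jpa-store", Sharing.SHARED_ONLY, false)` throws. Failing at configuration time, rather than after a restart, reports the mistake before any data has been written to the misconfigured store.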
>>
>> On 16/07/2015 15:32, Sanne Grinovero wrote:
>>> I would like to propose a clear-cut separation between our shared and
>>> non-shared CacheStores, in all respects, such as:
>>>   - Configuration options
>>>   - Integration contracts (Split the CacheStore SPI)
>>>   - Implementations
>>>   - Terminology, to avoid any further confusion around valid
>>> configurations and sensible architectures
>>>
>>> We have loads of examples of users who get in trouble by configuring
>>> one incorrectly, but also there are plenty of efficiency improvements
>>> we could take advantage of by clearly splitting the integration points
>>> and the implementations in two categories.
>>>
>>> Not least, it's a very common and dangerous pitfall to assume that
>>> Infinispan is able to restore a consistent state after having stopped
>>> a DIST cluster which passivated into non-shared CacheStore instances,
>>> or even REPL clusters when they don't shut down all at the exact same
>>> time (and "the exact same time" is a strange concept, to say the
>>> least). We need to clarify the different options, tradeoffs and their
>>> consequences, both to users and to ourselves, as a clearly defined use
>>> case will avoid bugs and simplify implementations.
>>>
>>> # The purpose of each
>>> I think that people should use a non-shared (local?) CacheStore for
>>> the sole purpose of expanding the storage capacity of each single
>>> node, be it because you don't have enough memory at all, or because
>>> you prefer some extra safety margin, either because your estimates
>>> are complex or because we live in a real world where the hashing
>>> function might not be perfect in practice. I hope we all agree that
>>> Infinispan should be able to handle such situations with at worst a
>>> graceful performance degradation, rather than throwing OOMs at the
>>> admin and putting the service on strike.
>>>
>>> A Shared CacheStore is useful for very different purposes, primarily
>>> to implement a Cache on top of some other service: for example your
>>> (single, shared) RDBMS, a slow (or expensive) webservice your
>>> organization has to call frequently, etc. It's also useful as a
>>> write-through cache in front of a similar service, perhaps an internal
>>> one that cannot absorb load spikes as well as Infinispan can.
>>> Finally, a great use case is to have a consistent backup of all your
>>> data-grid content, possibly in some "reference" form such as JPA
>>> mapped entities.
>>>
>>> # Benefits of a Non-Shared
>>> A non-shared CacheStore implementor should be able to take advantage
>>> of *its purpose*, among the big ones I see:
>>>   - Exclusive usage -> locking of a specific entry can be handled at
>>> datacontainer level, which can simplify quite a lot of internal code.
>>>   - Reliability -> since a clustered node needs to wipe its state at
>>> reboot (after a crash), any such CacheStore can be coded much more
>>> simply, without any form of disk sync or persistence guarantees.
>>>   - Encoding format -> this can be controlled entirely by Infinispan,
>>> with no need to keep factors like rolling-upgrade-compatible encodings
>>> in mind. JBoss Marshalling would be good enough, or some
>>> implementations might not need to serialize at all.
>>>
>>> Our non-shared CacheStore implementation(s) could take advantage of
>>> lower-level, more complex code optimisations and interfaces, as users
>>> would rarely want to customize one of these, while the use case of
>>> mapping data to a shared service needs a more user-friendly SPI, so as
>>> to keep it simple to plug in custom stores (custom data formats, custom
>>> connectors) and to get some help implementing concurrency correctly.
>>> Proper Transaction integration for the CacheStore has been on our
>>> wishlist for some time too. I suspect that accepting that we have so
>>> far been mixing up two different things under the same name would make
>>> it simpler to implement further improvements such as transactions: the
>>> way to do such a thing is very different in each of these use cases,
>>> so it would at least help to implement it on one of them first, or
>>> perhaps only on that one, if it turns out there's no need for such
>>> things in the context of the dedicated local-only "swapfile".
>>>
>>> # Mixed types should be killed
>>> I'm aware that some of our current implementations _could_ work both as
>>> shared or non-shared, for example the JDBC or JPACacheStore or the
>>> Remote CacheStore... but in most cases it doesn't make much sense. Why
>>> would you ever want to use the JPACacheStore if not to share data with
>>> a _shared_ database?
>>>
>>> We should take such options away, and by doing so focus on the use
>>> cases which actually matter and simplify the implementations and
>>> improve the configuration validations.
>>>
>>> If ever a compelling storage technology is identified which we'd like to
>>> offer as an option for both shared or non-shared, I would still
>>> recommend to make two different implementations, as there certainly are
>>> different requirements and assumptions when coding such a thing.
>>>
>>> Not least, I would very much like to see a default local CacheStore:
>>> picking one for local "emergency swapping" should be a no-brainer for
>>> users; we could set one up by default and not bother newcomers with
>>> complex choices.
>>>
>>> If we simplify the requirements for such a thing, it should be easy to
>>> write one on standard Java NIO2 APIs and get rid of the complexities of
>>> maintaining the native integration with things like LevelDB, not least
>>> the inefficiency of making such native calls from Java.
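Here is a minimal sketch of such a node-local "swap" store built on NIO2. It assumes the simplified requirements listed earlier:

- the store is used by a single node only;
- its contents are discarded at startup;
- no fsync or crash durability is attempted.

`LocalSwapStore` is a hypothetical name. Plain JDK serialization stands in for JBoss Marshalling. Hex-encoded keys serve as file names inside a dedicated directory. Very long keys would exceed file-name limits, which a real store would have to handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical node-local overflow store; not an Infinispan class.
class LocalSwapStore {
    private final Path dir;

    // Wipes whatever a previous run left in the (dedicated) directory: a
    // restarted node rejoins with empty state anyway, so old files are useless.
    LocalSwapStore(Path dir) {
        this.dir = dir;
        try {
            Files.createDirectories(dir);
            try (DirectoryStream<Path> leftovers = Files.newDirectoryStream(dir)) {
                for (Path file : leftovers) {
                    Files.delete(file);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lower-case hex keeps file names free of separators and safe on
    // case-insensitive file systems; the prefix avoids an empty name.
    private Path fileFor(String key) {
        StringBuilder name = new StringBuilder("k-");
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            name.append(String.format("%02x", b & 0xff));
        }
        return dir.resolve(name.toString());
    }

    void write(String key, Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            // Plain write without SYNC/DSYNC options: crash durability is not a goal.
            Files.write(fileFor(key), bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Object load(String key) {
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(Files.readAllBytes(file)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean delete(String key) {
        try {
            return Files.deleteIfExists(fileFor(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The store has no internal locking. This follows the "exclusive usage" assumption above: per-entry locking is expected to happen at the data container level.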
>>>
>>> Then as a second step, we should attack the other use case: backups;
>>> from a *purpose driven perspective* I'd then see us revive the Cassandra
>>> integration; obviously as a shared-only option.
>>>
>>> Cheers,
>>> Sanne
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>
>> --
>> Tristan Tarrant
>> Infinispan Lead
>> JBoss, a division of Red Hat

