[infinispan-dev] Shared vs Non-Shared CacheStores

Sanne Grinovero sanne at infinispan.org
Fri Aug 21 08:21:52 EDT 2015


+1

I like that plan; however, I don't have any problem with marker
interfaces either.

I remember the annotations used internally by Infinispan were (a long
time ago) the cause of a very slow startup, which was then fixed by
indexing the annotations at compile time; loading this index as a
resource at runtime has caused some unnecessary complexity in
some modular environments. No biggie, just saying annotations have
some tradeoffs too ;)


On 21 August 2015 at 10:10, Tristan Tarrant <ttarrant at redhat.com> wrote:
> I've been thinking more about this issue, after talking with Sanne, and
> here's my (possibly faulty) analysis:
>
> I don't think this is so dramatic or urgent that we need a solution
> (i.e. a distinct SPI for embedded cachestores) in place by 8.0. This is
> something that we can design and introduce as a private-only SPI during
> the 8.x series and migrate our stores to use it accordingly. Note that
> such an SPI would be more closely tied to the DataContainer, so it may not
> even have a relationship with the PersistenceManager.
>
> What I would like to see in the current SPI for 8.0, however, is an
> extensible way for cachestores to expose "capabilities" so that not only
> can we prevent potentially broken configurations, but we can also
> declare support for advanced functionality (shared, transactional,
> schema-aware, etc). I'm not fond of marker-only interfaces (see
> org.infinispan.persistence.spi.LocalOnlyCacheLoader), so I'd prefer an
> annotation-based approach.
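A minimal sketch of what an annotation-based capability declaration could look like. StoreCapabilities, the two example store classes and CapabilityValidator are hypothetical names invented for illustration; they are not Infinispan types. The annotation needs runtime retention so that a validator can read it reflectively at configuration time:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical capability annotation (not part of the Infinispan API).
// RUNTIME retention is required for Class.getAnnotation() to see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface StoreCapabilities {
    boolean shared() default false;
    boolean transactional() default false;
    boolean schemaAware() default false;
}

// Hypothetical store implementations declaring what they support.
@StoreCapabilities(shared = true, transactional = true)
class ExampleJdbcStore { }

@StoreCapabilities(shared = false)
class ExampleLocalSwapStore { }

final class CapabilityValidator {
    private CapabilityValidator() { }

    // Rejects a configuration that asks for "shared" on a store class that
    // does not declare that capability. Unannotated classes declare nothing.
    static void validate(Class<?> storeClass, boolean configuredShared) {
        StoreCapabilities caps = storeClass.getAnnotation(StoreCapabilities.class);
        boolean supportsShared = caps != null && caps.shared();
        if (configuredShared && !supportsShared) {
            throw new IllegalStateException(
                storeClass.getSimpleName() + " cannot be configured as shared");
        }
    }
}
```

New capabilities would become new annotation attributes with safe defaults, rather than one new marker interface per capability.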
>
> Tristan
>
> On 06/08/2015 10:39, Radim Vansa wrote:
>> I understand that shared cache stores will be implemented more often,
>> but I don't think non-shared stores should be considered a 'private
>> interface'. Still, separating them would give us the opportunity to
>> change the non-shared SPI more often, if needed, without breaking the
>> shared one.
>> However, hot-gluing in a cool new interface without a reference
>> implementation that supports transactions and solves the ton of issues
>> described in [1] is not a wise move, IMO. And there's no time to
>> implement this before 8.0.0.Final.
>>
>> Radim
>>
>> [1]
>> https://github.com/infinispan/infinispan/wiki/Consistency-guarantees-in-Infinispan
>>
>> On 08/05/2015 11:57 PM, Sanne Grinovero wrote:
>>> I don't doubt Radim's code :) but I'm pretty confident that even that
>>> implementation is limited by the constraints of the general-purpose
>>> API.
>>>
>>> For example, it seems Bela will soon allow more flexibility in JGroups
>>> regarding buffer representations. We need to commit to a stable API
>>> for end user integrations (shared cachestore implementors), but we
>>> also need to keep options open to soon play with other approaches.
>>>
>>> That's why I think this separation should be done before Infinispan
>>> 8.0.0.Final, even if I don't have a concrete proposal for what this
>>> other API should look like: I don't presume to be able to anticipate
>>> exactly which API will be best, but I think we can all see that we
>>> will want to change it. There should be a private internal contract
>>> which we can change even in micro versions without concerns about
>>> compatibility, so as to allow R&D progress in the most
>>> performance-sensitive areas without this being a problem for
>>> integrators and users.
>>>
>>> Better configuration validations are additional (strong) benefits:
>>> we've seen lots of misunderstandings about which CacheStores /
>>> configuration combinations are valid.
>>>
>>> Thanks,
>>> Sanne
>>>
>>> On 5 August 2015 at 22:13, Dan Berindei <dan.berindei at gmail.com> wrote:
>>>> On Fri, Jul 31, 2015 at 3:30 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>>> On 20 July 2015 at 11:02, Dan Berindei <dan.berindei at gmail.com> wrote:
>>>>>> Sanne, I think changing the cache store API is actually the most
>>>>>> painful part, so we should only do it if we gain a concrete advantage
>>>>>> from doing it. From a compatibility point of view, implementing a new
>>>>>> interface vs implementing the same interface with completely different
>>>>>> methods is just as bad.
>>>>> Right, from that perspective it's quite a horrible proposal.
>>>>>
>>>>> But I think we can agree that only the "SharedCacheStore" deserves to
>>>>> be considered an SPI, right?
>>>>> That's the one people will normally customize to map data to other
>>>>> stores they might have.
>>>>>
>>>>> I think it's important that, beyond the Infinispan 8.0 API freeze, we can
>>>>> make any change to the non-shared SPI without affecting users who
>>>>> implement a custom shared cachestore.
>>>>>
>>>>> I highly doubt someone will implement a high-performance custom off-heap
>>>>> swap strategy, but if someone does, they should contribute it and
>>>>> will probably need to make integration-level changes.
>>>>>
>>>>> We probably won't have the time to implement a new super efficient
>>>>> local-only cachestore to replace the leveldb one, but I'd like to keep
>>>>> the possibility open to do that beyond 8.0, *especially* without
>>>>> breaking compatibility for other people.
>>>> We already have a new super efficient local-only cachestore :)
>>>>
>>>> https://github.com/infinispan/infinispan/tree/master/persistence/soft-index
>>>>
>>>>
>>>>> Sanne
>>>>>
>>>>>
>>>>>> On Mon, Jul 20, 2015 at 12:41 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>>>>> +1 for incremental changes..
>>>>>>>
>>>>>>> I'd see the first step as defining two different interfaces;
>>>>>>> essentially we need to choose two good names.
>>>>>>>
>>>>>>> Then we could have both interfaces still implement the same identical
>>>>>>> methods, but go through each implementation and decide to "mark" it as
>>>>>>> shared-only or never-shared.
>>>>>>>
>>>>>>> That would make it simpler to make concrete change proposals on each
>>>>>>> of them and start taking some advantage from the split. I think you'll
>>>>>>> need the two different interfaces to implement the validations you
>>>>>>> mentioned.
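One possible shape for this first step, as a sketch only: BaseStore, SharedStore, LocalOnlyStore, InMemoryLocalStore and StoreKinds are placeholder names (the thread has not settled on any), and both sub-interfaces deliberately inherit identical methods, so initially the split only records which category each implementation belongs to:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common contract; both categories start with identical methods.
interface BaseStore<K, V> {
    V load(K key);
    void write(K key, V value);
    boolean delete(K key);
}

// Stable, public contract for integrators mapping data to a shared service.
interface SharedStore<K, V> extends BaseStore<K, V> { }

// Private/experimental contract for node-local storage; free to evolve.
interface LocalOnlyStore<K, V> extends BaseStore<K, V> { }

// Example implementation "marked" as never-shared by the interface it implements.
class InMemoryLocalStore<K, V> implements LocalOnlyStore<K, V> {
    private final Map<K, V> data = new HashMap<>();

    public V load(K key) { return data.get(key); }

    public void write(K key, V value) { data.put(key, value); }

    // Assumes null values are never stored, so a null return means "absent".
    public boolean delete(K key) { return data.remove(key) != null; }
}

final class StoreKinds {
    private StoreKinds() { }

    // With the split, "may this store be shared?" becomes a type check.
    static boolean canBeShared(BaseStore<?, ?> store) {
        return store instanceof SharedStore;
    }
}
```

Because the category is carried by the type, a validation such as "never configure a LocalOnlyStore as shared" reduces to an instanceof check, and the LocalOnlyStore contract could later diverge without touching SharedStore implementors.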
>>>>>>>
>>>>>>> For Infinispan 8's goals, I'd be happy enough to keep the
>>>>>>> "shared-only" interface quite similar to the current one, but mark the
>>>>>>> never-shared one as a private or experimental SPI to allow ourselves
>>>>>>> some more flexibility in performance oriented changes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sanne
>>>>>>>
>>>>>>> On 20 July 2015 at 10:07, Tristan Tarrant <ttarrant at redhat.com> wrote:
>>>>>>>> Sanne, well written.
>>>>>>>> Before actually implementing any of the optimizations/changes you
>>>>>>>> mention, I think the lowest-hanging fruit we should grab now is just to
>>>>>>>> add checks to all of our cachestores to actually throw an exception when
>>>>>>>> they are being enabled in unsupported configurations.
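A sketch of such a fail-fast check, written in the marker-interface style of org.infinispan.persistence.spi.LocalOnlyCacheLoader mentioned in this thread. LocalOnlyMarker, StoreConfig, StoreConfigChecks and the example store classes are hypothetical stand-ins, not the actual Infinispan configuration API:

```java
// Hypothetical marker interface, in the style of LocalOnlyCacheLoader.
interface LocalOnlyMarker { }

// Hypothetical store types, used only for illustration.
class ExampleLevelDbLikeStore implements LocalOnlyMarker { }

class ExampleRemoteStore { }

// Minimal stand-in for a store's configuration.
final class StoreConfig {
    final Object store;
    final boolean shared;

    StoreConfig(Object store, boolean shared) {
        this.store = store;
        this.shared = shared;
    }
}

final class StoreConfigChecks {
    private StoreConfigChecks() { }

    // Fails fast at startup instead of silently running an unsupported setup.
    static void check(StoreConfig cfg) {
        if (cfg.shared && cfg.store instanceof LocalOnlyMarker) {
            throw new IllegalArgumentException(
                cfg.store.getClass().getSimpleName() + " is local-only and cannot be shared");
        }
    }
}
```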
>>>>>>>>
>>>>>>>> I've created [1] to get us started
>>>>>>>>
>>>>>>>> Tristan
>>>>>>>>
>>>>>>>> [1] https://issues.jboss.org/browse/ISPN-5617
>>>>>>>>
>>>>>>>> On 16/07/2015 15:32, Sanne Grinovero wrote:
>>>>>>>>> I would like to propose a clear-cut separation between our shared and
>>>>>>>>> non-shared CacheStores,
>>>>>>>>> in all respects, such as:
>>>>>>>>>     - Configuration options
>>>>>>>>>     - Integration contracts (Split the CacheStore SPI)
>>>>>>>>>     - Implementations
>>>>>>>>>     - Terminology, to avoid any further confusion around valid
>>>>>>>>> configurations and sensible architectures
>>>>>>>>>
>>>>>>>>> We have loads of examples of users who get in trouble by configuring
>>>>>>>>> one incorrectly, but also there are plenty of efficiency improvements
>>>>>>>>> we could take advantage of by clearly splitting the integration points
>>>>>>>>> and the implementations in two categories.
>>>>>>>>>
>>>>>>>>> Not least, it's a very common and dangerous pitfall to assume that
>>>>>>>>> Infinispan is able to restore a consistent state after having stopped
>>>>>>>>> a DIST cluster which passivated into non-shared CacheStore instances,
>>>>>>>>> or even REPL clusters when they don't all shut down at exactly the same
>>>>>>>>> time (and "exactly the same time" is a strange concept, to say the
>>>>>>>>> least). We need to clarify the different options, tradeoffs and their
>>>>>>>>> consequences, to users and ourselves, as a clearly defined use case
>>>>>>>>> will avoid bugs and simplify implementations.
>>>>>>>>>
>>>>>>>>> # The purpose of each
>>>>>>>>> I think that people should use a non-shared (local?) CacheStore for
>>>>>>>>> the sole purpose of expanding the storage capacity of each single
>>>>>>>>> node, be it because you don't have enough memory at all, or because
>>>>>>>>> you prefer some extra safety margin, either because your estimates
>>>>>>>>> are complex or because we live in a real world where the hashing
>>>>>>>>> function might not be perfect in practice. I hope we all agree that
>>>>>>>>> Infinispan should be able to handle such situations with, at worst,
>>>>>>>>> a graceful performance degradation, rather than throwing OOMs at the
>>>>>>>>> admin and going on strike.
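To illustrate the overflow purpose described above, here is a minimal single-threaded sketch. OverflowingContainer is a hypothetical name, and a HashMap stands in for the node-local on-disk store: entries beyond a fixed in-memory bound are spilled to the local store instead of exhausting the heap, and are swapped back in on a miss:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a bounded in-memory container that spills least-recently-used
// entries to a node-local overflow store instead of failing with an OOM.
// Not thread-safe; null values are not supported.
final class OverflowingContainer<K, V> {
    // Stand-in for a local, non-shared on-disk store.
    private final Map<K, V> overflow = new HashMap<>();
    private final LinkedHashMap<K, V> memory;

    OverflowingContainer(int maxInMemory) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxInMemory) {
                    overflow.put(eldest.getKey(), eldest.getValue()); // "swap out"
                    return true;
                }
                return false;
            }
        };
    }

    void put(K key, V value) {
        overflow.remove(key); // drop any stale swapped-out copy
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = overflow.remove(key); // "swap in" on a miss
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    int inMemorySize() { return memory.size(); }

    int overflowSize() { return overflow.size(); }
}
```

Nothing in the overflow map ever needs to be visible to other nodes or to survive a restart, which is what makes this kind of store simpler than a shared one.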
>>>>>>>>>
>>>>>>>>> A Shared CacheStore is useful for very different purposes, primarily
>>>>>>>>> to implement a Cache on top of some other service: for example your
>>>>>>>>> (single, shared) RDBMS, a slow (or expensive) web service your
>>>>>>>>> organization has to call frequently, etc. It's also useful as a
>>>>>>>>> write-through cache on a similar service, maybe internal but not able
>>>>>>>>> to handle the high load spikes that Infinispan handles better.
>>>>>>>>> Finally, a great use case is to have a consistent backup of all your
>>>>>>>>> data-grid content, possibly in some "reference" form such as JPA
>>>>>>>>> mapped entities.
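A minimal sketch of the read-through / write-through pattern described above. BackingService and ReadWriteThroughCache are hypothetical names, and CountingService is an in-memory fake standing in for the slow shared RDBMS or web service:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract for the external, shared system of record.
interface BackingService<K, V> {
    V fetch(K key);
    void store(K key, V value);
}

// Read-through / write-through cache in front of a shared backing service.
final class ReadWriteThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BackingService<K, V> backend;

    ReadWriteThroughCache(BackingService<K, V> backend) {
        this.backend = backend;
    }

    // Read-through: the slow service is only called on a cache miss.
    // If fetch returns null, nothing is cached and null is returned.
    V get(K key) {
        return cache.computeIfAbsent(key, backend::fetch);
    }

    // Write-through: the shared service is updated first, so it stays the
    // consistent copy that every node (or a backup) can rely on.
    void put(K key, V value) {
        backend.store(key, value);
        cache.put(key, value);
    }
}

// In-memory fake for the slow service; counts how often it is queried.
final class CountingService implements BackingService<String, String> {
    final Map<String, String> db = new ConcurrentHashMap<>();
    int fetches = 0;

    public String fetch(String key) {
        fetches++;
        return db.get(key);
    }

    public void store(String key, String value) {
        db.put(key, value);
    }
}
```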
>>>>>>>>>
>>>>>>>>> # Benefits of a Non-Shared CacheStore
>>>>>>>>> A non-shared CacheStore implementor should be able to take advantage
>>>>>>>>> of *its purpose*; among the big advantages I see:
>>>>>>>>>     - Exclusive usage -> locking of a specific entry can be handled at
>>>>>>>>> the data container level, which can simplify quite a lot of internal code.
>>>>>>>>>     - Reliability -> since a clustered node needs to wipe its state at
>>>>>>>>> reboot (after a crash), it's much simpler to code such a CacheStore
>>>>>>>>> without any form of disk sync or persistence guarantees.
>>>>>>>>>     - Encoding format -> this can be controlled entirely by Infinispan,
>>>>>>>>> with no need to keep factors like rolling-upgrade-compatible encodings
>>>>>>>>> in mind. JBoss Marshalling would be good enough, or some
>>>>>>>>> implementations might not need to serialize at all.
>>>>>>>>>
>>>>>>>>> Our non-shared CacheStore implementation(s) could take advantage of
>>>>>>>>> lower-level, more complex code optimisations and interfaces, as users
>>>>>>>>> would rarely want to customize one of these, while the use case of
>>>>>>>>> mapping data to a shared service needs a more user-friendly SPI, so as
>>>>>>>>> to keep it simple to plug in custom stores: custom data formats, custom
>>>>>>>>> connectors, some help in implementing concurrency correctly.
>>>>>>>>> Proper Transaction integration for the CacheStore has been on our
>>>>>>>>> wishlist for some time too. I suspect that accepting that we have been
>>>>>>>>> mixing up two different things under the same name so far would make it
>>>>>>>>> simpler to implement further improvements such as transactions: the
>>>>>>>>> way to do such a thing is very different in each of these use cases,
>>>>>>>>> so it would help to implement it on one subset first, or perhaps only
>>>>>>>>> on that one if it turns out there's no need for such things in the
>>>>>>>>> context of the local-only, dedicated "swapfile".
>>>>>>>>>
>>>>>>>>> # Mixed types should be killed
>>>>>>>>> I'm aware that some of our current implementations _could_ work either as
>>>>>>>>> shared or non-shared, for example the JDBC or JPACacheStore or the
>>>>>>>>> Remote CacheStore, but in most cases it doesn't make much sense. Why
>>>>>>>>> would you ever want to use the JPACacheStore if not to share data with
>>>>>>>>> a _shared_ database?
>>>>>>>>>
>>>>>>>>> We should take such options away, and by doing so focus on the use
>>>>>>>>> cases which actually matter and simplify the implementations and
>>>>>>>>> improve the configuration validations.
>>>>>>>>>
>>>>>>>>> If ever a compelling storage technology is identified which we'd like to
>>>>>>>>> offer as an option for both shared and non-shared use, I would still
>>>>>>>>> recommend making two different implementations, as there certainly are
>>>>>>>>> different requirements and assumptions when coding such a thing.
>>>>>>>>>
>>>>>>>>> Not least, I would very much like to see a default local CacheStore:
>>>>>>>>> picking one for local "emergency swapping" should be a no-brainer for
>>>>>>>>> users; we could set one up by default and not bother newcomers with
>>>>>>>>> complex choices.
>>>>>>>>>
>>>>>>>>> If we simplify the requirements for such a thing, it should be easy to
>>>>>>>>> write one on standard Java NIO2 APIs and get rid of the complexities of
>>>>>>>>> maintaining the native integration with things like LevelDB, not least
>>>>>>>>> the overhead of making such native calls from Java.
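As a rough sketch of how small such a store could be on plain NIO2, assuming string keys that are valid file names and string values (Nio2SwapStore is a hypothetical name, and this is not a description of any existing Infinispan store): it wipes its directory on startup, since a restarted node must discard its old state anyway, and writes without SYNC/DSYNC open options because no crash durability is required:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch of a node-local "swap file" store: one file per key, wiped on startup.
final class Nio2SwapStore {
    private final Path dir;

    Nio2SwapStore(Path dir) throws IOException {
        this.dir = dir;
        wipe(); // previous content is never trusted after a restart
        Files.createDirectories(dir);
    }

    void write(String key, String value) throws IOException {
        // Default options are CREATE, TRUNCATE_EXISTING, WRITE: no SYNC/DSYNC.
        Files.write(dir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
    }

    String read(String key) throws IOException {
        try {
            return new String(Files.readAllBytes(dir.resolve(key)), StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            return null; // absent key
        }
    }

    boolean delete(String key) throws IOException {
        return Files.deleteIfExists(dir.resolve(key));
    }

    private void wipe() throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order deletes children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```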
>>>>>>>>>
>>>>>>>>> Then, as a second step, we should attack the other use case: backups.
>>>>>>>>> From a *purpose-driven perspective*, I'd then see us revive the Cassandra
>>>>>>>>> integration, obviously as a shared-only option.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Sanne
>>>>>>>>> _______________________________________________
>>>>>>>>> infinispan-dev mailing list
>>>>>>>>> infinispan-dev at lists.jboss.org
>>>>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Tristan Tarrant
>>>>>>>> Infinispan Lead
>>>>>>>> JBoss, a division of Red Hat
>>
>>
>

