[infinispan-dev] the new CacheLoader API

Fri Aug 9 11:40:02 EDT 2013

On 7 Aug 2013, at 14:57, Galder Zamarreño <galder at redhat.com> wrote:

> A few things to note that have not been mentioned:
> 
> - BulkCacheLoader.bulk* methods: if what you're doing is return an iterator, why not call that method iterator(), just like Collection defines iterator?
> 
> - BulkCacheLoader and Iterator classes: I find it a bit confusing that we return Infinispan specific iterators from bulk* methods but removeAll and others take a different Iterator… I don't have an answer for this… :|

This will change based on Sanne's suggestion of parallel processing.
However the reason I preferred an custom iterator over the one in java are:
- j.u.Teartor.remove method won't be supported so not nice having it there
- when iterating over the CacheEntries, with the j.u.Itearator I have to create an Map.Entry object for each entry I process. This object doesn't need to be created on an native iterator that has the key(), value() and next() methods.

> - We're moving away from the cache loader vs cache store interface separation? I kinda liked the way JSR-107 had defined these, keeping them separate and keeping them independent [1]. A read-only cache store would implement only cache loader. A read-write one would implement both interfaces.

Yeah people seem to prefer this for some reason :-)
Here's my though on this : I think CacheLoader + CacheWriter is a nice OOP design, but is rather theoretical. In all external users scenarios I know, the interaction with the store is read+write so most of the people thing about a store along this lines. Having a distinction between loads and stores seems unnatural and creates confusion. For the few that only need a loader they can simply leave the store() empty - as simple as that. 

> 
> More below...
> 
> [1] https://github.com/jsr107/jsr107spec/tree/v0.8/src/main/java/javax/cache/integration
> 
> On Aug 6, 2013, at 7:13 PM, William Burns <wburns at redhat.com> wrote:
> 
>> 
>> ----- Original Message -----
>>> From: "Manik Surtani" <msurtani at redhat.com>
>>> To: "Mircea Markus" <mmarkus at redhat.com>
>>> Cc: "Infinispan Core Devs" <infinispan-core-dev at infinispan.org>
>>> Sent: Monday, August 5, 2013 9:23:41 AM
>>> Subject: Re: the new CacheLoader API
>>> 
>>> 
>>> On 1 Aug 2013, at 11:50, Mircea Markus <mmarkus at redhat.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Based on the feedback I received and after doing some prototyping, here's
>>>> the new CacheLoader API I came up
>>>> with:https://github.com/mmarkus/infinispan/tree/t_ISPN-3290/CS_redesign/core/src/main/java/org/infinispan/loaders/spi
>>>> 
>>>> I know everybody's quite busy but can you please take some time and review
>>>> this? It's a very important API chance and this would help to getting it
>>>> off on the right foot.
>>>> 
>>>> It should cover all the gathered requirements:
>>>> 
>>>> - support for non-distributed transaction cache stores (1PC) and support
>>>> for XA capable cache store
>>>> [Decision: all the transaction support is delegated to Infinispan. The
>>>> cache store SPI is much simplified.]
>>> 
>>> +1
>>> 
>>>> 
>>>> - support iteration over all the keys/entries in the store
>>>> - needed for efficient Map/Reduce integration
>>>> - needed for efficient implementation of Cache.keySet(), Cache.entrySet(),
>>>> Cache.values() methods
> 
> ^ We all agree that these methods are problematic. I don't think we should work towards making them efficient. Ultimately we should phase them out...
> 
>>>> [look in BulkCacheLoader]
>>> 
>>> I think you meant for bulkLoadKeys() to take in a KeyFilter, not a
>>> Collection?
>> 
>> My guess is Mircea was going for an overloaded method for bulkLoadKeys where we want both one that takes a Collection and one that takes a KeyFilter?  If so it seems to me it would be simpler to just have a single method that takes KeyFilter only, but then have another class like KeyFilters that has various static factory methods that can take a Iterable or Iterator for example so we don't have too many methods on the loader itself.
> 
> +1
> 
>> Also removeAll should probably take a KeyFilter similarly.
> 
> +1
> 
>> I noticed there is an overloaded bulkLoad which takes no arguments, which I assume just streams/loads all entries from the cache store.  Do we need to have similarly overloaded methods for bulkLoadKeys?  If not than do we even need the no arg bulkLoad method and just have null/AllKeyFilter be passed in instead?  Also similarly would we need clear method then?  Just trying to think if we can make the interface have as few methods as needed.
> 
> +1, and +100 to keeping as few methods as needed.
> 
>> Some methods don't have types on them.  I am guessing we want to put Object type just to prevent random warnings for user and CacheStore implementations.
> 
> +1
> 
>> 
>>> 
>>>> 
>>>> - a simple read(k) + write(k,v) interface to be implemented by users that
>>>> just want to position ISPN as a cache between an app and a legacy system
>>>> and which don't need/want to be bothered with all the other complex
>>>> features
>>>> [look at BasicCacheLoader]
>> 
>> We should move to the immutable CacheStoreConfiguration class now instead of CacheStoreConfig, right?
> 
> +1
> 
>> 
>>> 
>>> Shouldn't remove() return a boolean?
>> 
>> It looks like we don't use the return value in ISPN code currently, but I would agree for future usage we should probably have it return a boolean.  Also to be more inline with Collection interface do we want a boolean to be returned from the removeAll method if something was removed?
> 
> It all depends what the guarantees should be at the end of the remove… if the method has no return, implementations are more free to do the remove asynchronously and not having to wait to see if anything was removed. The return forces some kind of result out of that removal, which could help Infinsipan makes decisions in the future...
> 
>> 
>>> 
>>>> 
>>>> - support for expiration notification (ISPN-3064)
>>>> [look at ExpiryCacheLoader.purgeExpired]
>> 
>> I am wondering if there is some way to have this be more of a streaming approach instead of returning a Set will all of the expired keys in memory.  What do you think if we passed in a callback of some type to the purgeExpired method instead.  This would make it the burden of the CacheLoader to properly invoke the method for each one, but it would allow for more efficient use of memory when the cache loader can implement a streaming approach to eviction.
> 
> That sounds like an good idea :). Also, why does ExpiryCacheLoader need to extend BasicCacheLoader? Couldn't it be treated like a mixin? IOW, have a cache loader that can optionally purgeExpired entries. 
> 
>> 
>>> 
>>> Returning the set of purged keys could be unnecessarily expensive... e.g., in
>>> a JDBC-backed store, expiration could be a simple DELETE WHERE... statement,
>>> whereas if the keys need to be returned, this becomes far more complex.
>>>> 
>>>> - support for size (efficient implementation of the cache.size() method)
>>>> [look at BulkCacheLoader.size()]
>>> 
>>> I'm not so sure if I buy the separate interfaces approach.  E.g., what
>>> happens to calling cache.size() when the configured cache loader is not a
>>> BulkCacheLoader?  YOu effectively change the contract of cache.size()?
>> 
>> Actually that is an interesting point. I would normally say it has to fallback to the bulkLoadKeys method but that is also on the BulkCacheLoader interface.  Actually without defining your Cache to implement BulkCacheLoader you can't even have preload enabled either.  It seems we maybe need another interface or move some of the methods down to the BasicCacheLoader
> 
> I also agree that this separation of bulk vs basic cache loader interface is a bit weird… , particularly since the difference between the two is not only configuration (whether preload is enabled or not), but also the fact that some cache operations currently require an iterable cache store (size, keys, values, entries and iterator!!).
> 
> To reiterate what I said above, we shouldn't fixate ourselves with size, keys, values and entries methods, since they're difficult to implement and potentially expensive, but iterator() is something we're always going to be needing in the Cache interface. So, iterator() should really be living in CacheLoader (or BasicCacheLoader) interface.
> 
> With all these in mind, I can see the point of separating any cache loader methods that support cache methods that we'll be phasing out (keys, values, size) to a separate inteface, i.e. BulkCacheLoader. This way, when we phase them out, we don't need to modify the base cache loader interface. By simply removing the BulkCacheLoader interface you've achieved that.
> 
> Cheers,
> 
>> 
>>> 
>>>> 
>>>> Cheers,
>>>> --
>>>> Mircea Markus
>>>> Infinispan lead (www.infinispan.org)
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> Manik Surtani
>>> 
>>> 
>>> 
>>> 
> 
> 
> --
> Galder Zamarreño
> galder at redhat.com
> twitter.com/galderz
> 
> Project Lead, Escalante
> http://escalante.io
> 
> Engineer, Infinispan
> http://infinispan.org
> 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)