[infinispan-dev] New API to iterate over current entries in cache

Tue May 6 08:16:19 EDT 2014

Sorry indentation below is broken because someone on this thread is
using HTML formatted emails.

On 5 May 2014 14:38, Dan Berindei <dan.berindei at gmail.com> wrote:
>
>
> On Thu, May 1, 2014 at 4:50 PM, William Burns <mudokonman at gmail.com> wrote:
>
> On Thu, May 1, 2014 at 8:59 AM, Sanne Grinovero <sanne at infinispan.org>
> wrote:
>
> On 30 April 2014 15:06, William Burns <mudokonman at gmail.com> wrote:
>
> Was wondering if anyone had any opinions on the API for this. These are a
> few options that Dan and I mulled over: Note the CloseableIterable inteface
> mentioned is just an interface that extends both Closeable and Iterable. 1.
> The API that is very similar to previously proposed in this list but
> slightly changed: Methods on AdvancedCache CloseableIterable<CacheEntry<K,
> V>> entryIterable(KeyValueFilter<? super K, ? super V> filter); <C>
> CloseableIterable<CacheEntry<K, C>> entryIterable(KeyValueFilter<? super K,
> ? super V> filter, Converter<? super K, ? super V, C> converter); Note the
> difference here is that it would return an Iterable instead of Iterator,
> which would allow for it being used in a for loop. Example usage would be
> (types omitted) for (CacheEntry entry : cache.entryIterable(someFilter,
> someConverter)) { // Do something }
>
> If it's important to close the Iterable, this example highlights a problem
> of the API. Ideally I think you might want to drop the need for the #close()
> method, but I'm guessing that's not an option so I'd avoid the Iterable API
> in that case.
>
> Good point, I totally forgot to cover the Closeable aspect in the first
> email. Unfortunately changing it to be Iterable does pose a slight issue. I
> was thinking we do something along the lines that Dan was thinking of by
> preventing the Iterable from producing more than 1 Iterable (maybe throw
> IllegalStateException). This way when we close the Iterable it would also
> close the underlying Iterator. try (EntryIterable<K, C> entries =
> advancedCache.entryIterable(someFilter, someConverter)) { for (Entry<K, C> e
> : entries) { ... } }
>
> You could think of an intermediary place-holder to still allow for natural
> iteration: try ( CacheEntryIteratorContext ctx =
> cache.entryIterable(someFilter, someConverter) ) { for (CacheEntry entry :
> ctx.asIterable()) { // Do something } } But I'm not liking the names I used
> above, as I would expect to be able to reuse the same iterator for multiple
> invocations of iterable(), and have each to restart the iteration from the
> beginning.
>
> Obviously from above this wouldn't be possible if we made those changes. Do
> you think this is reason to prevent those changes? Or do you think we should
> allow multiple iterators but closing the Iterable would also close down each
> of the Iterators? I am worried this might be a bit cumbersome/surprising,
> but documenting it might be sufficient.
>
>
> I don't think it would be surprising at all to invalidate all iterators on
> close, just as modifying java.util collections in any way invalidates all
> iterators.

That's not incorrect behaviour, but it's a surprising API, so I think
we should avoid suggesting that it might be iterated multiple times.

>
> Not allowing the user to iterate twice over the same Iterable, as I
> suggested, might be surprising, but the user can easily change his code to
> work around that.

Sure but it might not be immediately noticed, annoying. I generally
don't like libraries which force me to go back and fix things when I
discover it's not working as expected at a second time, this should be
clear from the very beginning of coding.

>
> I'm not sure the intermediate asIterable() call helps in any way, though,
> because it's just as easy for the user to "forget" to call close():

Right.

>
>      for (CacheEntry entry : ctx.entryIterable(someFilter,
> someConverter).asIterable()) {
>         // Do something
>      }
>
> It would be nice if Java would have followed C# in automatically calling
> close() at the end of a foreach loop, but I don't see a way to force the
> user to call close().
>
> Can this be solved with better name choices?
>
>
> I don't think so...

Let's avoid Iterable then.

>
> 2. An API that returns a new type EntryIterable for example that can chain
> methods to provide a filter and converter. on AdvancedCache EntryIterable<K,
> V> entryIterable(); where EntryIterable is defined as: public interface
> EntryIterable<K, V> extends CloseableIterable<CacheEntry<K, V>> { public
> EntryIterable<K, V> filter(KeyValueFilter<? super K, ? super V> filter);
> public EntryIterable<K, V> converter(Converter<? super K, ? super V, ?
> extends V> converter); public <C> CloseableIterable<CacheEntry<K, C>>
> projection(Converter<? super K, ? super V, C> converter); } Note that there
> are 2 methods that take a Converter, this is to preserve the typing, since
> the method would return a different EntryIterable instance. However I can
> also see removing one of the converter method and just rename projection to
> converter instead. This API would allow for providing optional fields more
> cleanly or not if all if desired. Example usage would be (types omitted) for
> (CacheEntry entry :
> cache.entryIterable().filter(someFilter).converter(someConverter)) { // Do
> something }
>
> This looks very nice, assuming you fix the missing close().
>
> So in that case you like #2 I assume? That is what I was leaning towards as
> well.
>
> Am I missing a catch?
>
> Yes it would be very similar to above with the outer try block and then
> passing in the CloseableIterable into the inner for loop.
>
> Also it's quite trivial for the user to do his own filtering and conversion
> in the "do something block", so I'm wondering if there is a reason beyond
> API shugar to expose this.
>
> The filters and converters are sent remotely to reduce resulting network
> traffic. The filter is applied on each node to determine what data it sends
> back and the converter is applied before sending back the value as well.
>
> It would be lovely if for example certain filters could affect loading from
> CacheStores - narrowing down a relational database select for example - and
> I guess the same concept could apply to the converted if you'd allow to
> select a subset of fields.
>
> There are plans to have a JPA string filter/converter that will live in a
> new forthcoming module. We could look into enhancing this in the future to
> have better integration with the JPACacheStore. A possible issue I can think
> of is if the projection doesn't contain the key value, because currently
> rehash can cause duplicate values that is detected by the key.
>
> I don't think these optimisations need to be coded right now, but it would
> be nice to keep the option open for future enhancement.
>
> I agree that would be nice
>
> 3. An API that requires the filter up front in the AdvancedCache method.
> This also brings up the point should we require a filter to always be
> provided? Unfortuantely this doesn't prevent a user from querying every
> entry as they can just use a filter that accepts all key/value pairs.
>
> Why is that unfortunate?
>
> I was saying along the lines if we wanted to make sure a user doesn't query
> all data. But if we do want them to do this, then it is great. I know some
> people are on the fence about it.
>
>
> This was my original proposal. My main concern was finding a good name for
> the entryIterable() method, so I figured we could avoid it completely:
>
>      try (CloseableIterable entries =
> cache.filter(filter).convert(converter)) {
>         for (CacheEntry entry : ctx.asIterable()) {
>            // Do something
>         }
>      }

Paul had an excellent proposal about filtered views of Caches.
>From an API such as "cache.filter(filter)" I would expect a return of
type Cache, such a filtered one: for it the return an Iterable is not
only surprising but I think it would prevent us to eventually make the
filtered cache feature.

>
> I figured it would be good to nudge users toward using a filter instead of
> filtering in the for block, because of the savings in network.
> OTOH, even though it doesn't prevent the user from iterating over all the
> entries in the cache, it does make it look a bit awkward.
>
> on AdvancedCache EntryIterable<K, V> entryIterable(Filter<? super K, ? super
> V> filter) where EntryIterable is defined as: public interface
> EntryIterable<K, V> extends CloseableIterable<CacheEntry<K, V>> { public <C>
> CloseableIterable<CacheEntry<K, C>> converter(Converter<? super K, ? super
> V, C> converter); } The usage would be identical to #2 except the filter is
> always provided.
>
> I wouldn't mandate it, but in case you overload the method it probably is a
> good idea to internally apply an accept-all filter so you have a single
> implementation. Could be a singleton, which also implies an efficient
> Externalizer.
>
> Although for performance I guess it would be better to not provide a Filter
> for the retrieve all case.
>
>
> I doubt the size of a custom filter would matter much compared to all the
> entries in the cache, the only issue is the ease of use.
>
> Let me know what you guys think or if you have any other suggestions.
> Thanks, - Will _______________________________________________
> infinispan-dev mailing list infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> _______________________________________________ infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> _______________________________________________ infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev