[infinispan-dev] New API to iterate over current entries in cache

Mon May 5 09:38:12 EDT 2014

On Thu, May 1, 2014 at 4:50 PM, William Burns <mudokonman at gmail.com> 
wrote:
> On Thu, May 1, 2014 at 8:59 AM, Sanne Grinovero 
> <sanne at infinispan.org> wrote:
>>  On 30 April 2014 15:06, William Burns <mudokonman at gmail.com> wrote:
>>>  Was wondering if anyone had any opinions on the API for this.
>>> 
>>>  These are a few options that Dan and I mulled over:
>>> 
>>>  Note the CloseableIterable inteface mentioned is just an interface
>>>  that extends both Closeable and Iterable.
>>> 
>>>  1. The API that is very similar to previously proposed in this list
>>>  but slightly changed:
>>> 
>>>  Methods on AdvancedCache
>>> 
>>>  CloseableIterable<CacheEntry<K, V>> entryIterable(KeyValueFilter<?
>>>  super K, ? super V> filter);
>>> 
>>>  <C> CloseableIterable<CacheEntry<K, C>> 
>>> entryIterable(KeyValueFilter<?
>>>  super K, ? super V> filter, Converter<? super K, ? super V, C>
>>>  converter);
>>> 
>>>  Note the difference here is that it would return an Iterable 
>>> instead
>>>  of Iterator, which would allow for it being used in a for loop.
>>> 
>>>  Example usage would be (types omitted)
>>> 
>>>  for (CacheEntry entry : cache.entryIterable(someFilter, 
>>> someConverter)) {
>>>  // Do something
>>>  }
>> 
>> 
>>  If it's important to close the Iterable, this example highlights a
>>  problem of the API.
>>  Ideally I think you might want to drop the need for the #close()
>>  method, but I'm guessing that's not an option so I'd avoid the
>>  Iterable API in that case.
> 
> Good point, I totally forgot to cover the Closeable aspect in the 
> first email.
> 
> Unfortunately changing it to be Iterable does pose a slight issue.  I
> was thinking we do something along the lines that Dan was thinking of
> by preventing the Iterable from producing more than 1 Iterable (maybe
> throw IllegalStateException).  This way when we close the Iterable it
> would also close the underlying Iterator.
> 
> 
> try (EntryIterable<K, C> entries =
> advancedCache.entryIterable(someFilter, someConverter)) {
>   for (Entry<K, C> e : entries) {
>     ...
>   }
> }
> 
> 
>>  You could think of an intermediary place-holder to still allow for
>>  natural iteration:
>> 
>>   try ( CacheEntryIteratorContext ctx = 
>> cache.entryIterable(someFilter,
>>  someConverter) ) {
>>      for (CacheEntry entry : ctx.asIterable()) {
>>         // Do something
>>      }
>>   }
>> 
>>  But I'm not liking the names I used above, as I would expect to be
>>  able to reuse the same iterator for multiple invocations of
>>  iterable(), and have each to restart the iteration from the 
>> beginning.
> 
> Obviously from above this wouldn't be possible if we made those
> changes.  Do you think this is reason to prevent those changes?  Or do
> you think we should allow multiple iterators but closing the Iterable
> would also close down each of the Iterators?  I am worried this might
> be a bit cumbersome/surprising, but documenting it might be
> sufficient.

I don't think it would be surprising at all to invalidate all iterators 
on close, just as modifying java.util collections in any way 
invalidates all iterators.

Not allowing the user to iterate twice over the same Iterable, as I 
suggested, might be surprising, but the user can easily change his code 
to work around that.

I'm not sure the intermediate asIterable() call helps in any way, 
though, because it's just as easy for the user to "forget" to call 
close():

     for (CacheEntry entry : ctx.entryIterable(someFilter, 
someConverter).asIterable()) {
        // Do something
     }

It would be nice if Java would have followed C# in automatically 
calling close() at the end of a foreach loop, but I don't see a way to 
force the user to call close().

> 
> 
>>  Can this be solved with better name choices?

I don't think so...

>> 
>> 
>> 
>>>  2. An API that returns a new type EntryIterable for example that 
>>> can
>>>  chain methods to provide a filter and converter.
>>> 
>>>  on AdvancedCache
>>> 
>>>  EntryIterable<K, V> entryIterable();
>>> 
>>>  where EntryIterable is defined as:
>>> 
>>>  public interface EntryIterable<K, V> extends
>>>  CloseableIterable<CacheEntry<K, V>> {
>>> 
>>>     public EntryIterable<K, V> filter(KeyValueFilter<? super K, ? 
>>> super
>>>  V> filter);
>>> 
>>>     public EntryIterable<K, V> converter(Converter<? super K, ? 
>>> super
>>>  V, ? extends V> converter);
>>> 
>>>     public <C> CloseableIterable<CacheEntry<K, C>>
>>>  projection(Converter<? super K, ? super V, C> converter);
>>>  }
>>> 
>>>  Note that there are 2 methods that take a Converter, this is to
>>>  preserve the typing, since the method would return a different
>>>  EntryIterable instance.  However I can also see removing one of the
>>>  converter method and just rename projection to converter instead.
>>> 
>>>  This API would allow for providing optional fields more cleanly or 
>>> not
>>>  if all if desired.
>>> 
>>>  Example usage would be (types omitted)
>>> 
>>>  for (CacheEntry entry :
>>>  cache.entryIterable().filter(someFilter).converter(someConverter)) 
>>> {
>>>  // Do something
>>>  }
>> 
>>  This looks very nice, assuming you fix the missing close().
> 
> So in that case you like #2 I assume?  That is what I was leaning
> towards as well.
> 
>>  Am I missing a catch?
> 
> Yes it would be very similar to above with the outer try block and
> then passing in the CloseableIterable into the inner for loop.
> 
>> 
>>  Also it's quite trivial for the user to do his own filtering and
>>  conversion in the "do something block", so I'm wondering if there 
>> is a
>>  reason beyond API shugar to expose this.
> 
> The filters and converters are sent remotely to reduce resulting
> network traffic.  The filter is applied on each node to determine what
> data it sends back and the converter is applied before sending back
> the value as well.
> 
>>  It would be lovely if for example certain filters could affect 
>> loading
>>  from CacheStores - narrowing down a relational database select for
>>  example - and I guess the same concept could apply to the converted 
>> if
>>  you'd allow to select a subset of fields.
> 
> There are plans to have a JPA string filter/converter that will live
> in a new forthcoming module.  We could look into enhancing this in the
> future to have better integration with the JPACacheStore.  A possible
> issue I can think of is if the projection doesn't contain the key
> value, because currently rehash can cause duplicate values that is
> detected by the key.
> 
>> 
>>  I don't think these optimisations need to be coded right now, but it
>>  would be nice to keep the option open for future enhancement.
> 
> I agree that would be nice
> 
>> 
>> 
>>>  3. An API that requires the filter up front in the AdvancedCache
>>>  method.  This also brings up the point should we require a filter 
>>> to
>>>  always be provided?  Unfortuantely this doesn't prevent a user from
>>>  querying every entry as they can just use a filter that accepts all
>>>  key/value pairs.
>> 
>>  Why is that unfortunate?
> 
> I was saying along the lines if we wanted to make sure a user doesn't
> query all data.  But if we do want them to do this, then it is great.
> I know some people are on the fence about it.

This was my original proposal. My main concern was finding a good name 
for the entryIterable() method, so I figured we could avoid it 
completely:

     try (CloseableIterable entries = 
cache.filter(filter).convert(converter)) {
        for (CacheEntry entry : ctx.asIterable()) {
           // Do something
        }
     }

I figured it would be good to nudge users toward using a filter instead 
of filtering in the for block, because of the savings in network.
OTOH, even though it doesn't prevent the user from iterating over all 
the entries in the cache, it does make it look a bit awkward.

> 
> 
>> 
>>> 
>>>  on AdvancedCache
>>> 
>>>  EntryIterable<K, V> entryIterable(Filter<? super K, ? super V> 
>>> filter)
>>> 
>>>  where EntryIterable is defined as:
>>> 
>>>  public interface EntryIterable<K, V> extends
>>>  CloseableIterable<CacheEntry<K, V>> {
>>> 
>>>     public <C> CloseableIterable<CacheEntry<K, C>>
>>>  converter(Converter<? super K, ? super V, C> converter);
>>>  }
>>> 
>>>  The usage would be identical to #2 except the filter is always 
>>> provided.
>> 
>>  I wouldn't mandate it, but in case you overload the method it 
>> probably
>>  is a good idea to internally apply an accept-all filter so you have 
>> a
>>  single implementation. Could be a singleton, which also implies an
>>  efficient Externalizer.
> 
> Although for performance I guess it would be better to not provide a
> Filter for the retrieve all case.

I doubt the size of a custom filter would matter much compared to all 
the entries in the cache, the only issue is the ease of use.

> 
> 
>> 
>>> 
>>> 
>>>  Let me know what you guys think or if you have any other 
>>> suggestions.
>>> 
>>>   Thanks,
>>> 
>>>   - Will
>>>  _______________________________________________
>>>  infinispan-dev mailing list
>>>  infinispan-dev at lists.jboss.org
>>>  https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>  _______________________________________________
>>  infinispan-dev mailing list
>>  infinispan-dev at lists.jboss.org
>>  https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20140505/a03859bf/attachment.html