[infinispan-dev] Multi get API

Tue Mar 24 12:14:33 EDT 2015

On 24/03/2015 16:22, William Burns wrote:
> On Tue, Mar 24, 2015 at 11:06 AM, Radim Vansa <rvansa at redhat.com> wrote:
>> On 03/24/2015 03:38 PM, Pedro Ruivo wrote:
>>> Hi,
>>>
>>> comments inline...
>>>
>>> On 03/24/2015 01:57 PM, Radim Vansa wrote:
>>>> On 03/24/2015 02:22 PM, William Burns wrote:
>>>>> I am nearing completion of the new multi get command [1], allowing for
>>>>> more efficient retrieval of multiple keys at the same time.  Thanks
>>>>> again to Radim for a lot of the initial work.
>>>>>
>>>>> In doing so though I want to make sure I get feedback on how we want
>>>>> the API to actually look, since this is much more difficult to change
>>>>> down the line.
>>>>>
>>>>> There are a few things in particular I wanted to discuss.
>>>>>
>>>>> 1. The actual name of the method.  The main suggestions I have seen
>>>>> are getAll or getMany.  I do like the naming of the former, however it
>>>>> seems a bit misleading (remember API is getAll(Set) since we are
>>>>> really getting a subset.  So I am thinking the possibilities at this
>>>>> point are getAllOf, getAllFor or getMany.  I am leaning maybe towards
>>>>> the last one (getMany).  I am open to any suggestions though.
>>>> Actually Tristan suggested the name multiGet() - I would prefer this one
>>>> in the end, and adding multiPut() that would just do putAll (to have the
>>>> API symmetric) and deprecate the putAll() method. multiRemove and others
>>>> can follow later, and the naming is straightforward.
>>>> I would object against getAll() since this sounds like retrieving all
>>>> entries from the cache, not just the specified keys.
>>>>
>>> Why a Set as parameter?
>>>
>>> If we have a Collection as parameter, it would allow a user to specify
>>> easily the keys he want, like multiGet(Arrays.asList(k1, k2, k3)).
>> It's Set as the collection should not contain duplicities. However, it's
>> a bit questionable whether the collection should be copied for (async)
>> internal processing or whether Infinispan takes ownership of this map
>> (first option is more safe, second is faster).
> I would say it shouldn't be copied, but I could put a comment on the
> Javadoc if the Set is modified concurrently while the operation is
> running the results returned will be nondeterministic.
The JCache contract for this is:

Map<K, V> getAll(Set<? extends K> keys);

Nothing is said about whether "keys" should be "safe", and I think that 
is absolutely fine for us as well (i.e. no copying, if you change the 
set, you get what you deserve).

>
>> Radim
>>
>>>>> 2. What should the return value be for this method.  Currently this
>>>>> returns a Map<K, V> which makes sense when we retain these values in
>>>>> the local transactional context and is pretty nice and clean for end
>>>>> users.
>>>>>
>>>>> The other idea is to use a streaming API possibly returning an
>>>>> Iterable<CacheEntry<K, V>>.  The streaming return value doesn't seem
>>>>> as intuitive to me, but opens us up for more efficient usage
>>>>> especially when there may be a lot of keys passed in (also results can
>>>>> be processed concurrently while retrieval is still occurring).
>>>>>
>>>>> I would lean towards returning the Map<K, V> value, however the next
>>>>> point could change that.
>>>> I think that Iterable<CacheEntry<K, V>> is too confusing for the end
>>>> user, I would stick to the Map. If you want lazy loading, the Map (and
>>>> it's EntrySet) could be made lazy by a flag.
> The point wasn't that it was lazy, but rather that we could limit how
> many values are maintained on the caller side to prevent excess memory
> usage.
>
>>> my vote is to use the Map<K,V> interface. The Iterable will force the
>>> user to fetch the data using the iterator order, while the Map allow him
>>> to use any order he wants (i.e. fetch particular keys via map.get(()).
>>>
>>> BTW, it can be good to wrap the Map in a unmodifiableMap().
> I wonder if this really matters.  Each invocation would return its own
> map, so if they modified it, it shouldn't matter.  I would make sure
> to document that the Map is not backing map in that updates to the
> cache or this map are not reflected in the other.
Agree on all points. An iterable variant would probably more closely 
resemble the Java 8 streams, so I wouldn't overcomplicated this specific 
API.

>>>>> 3. Where this method should live.  Right now this method is on the
>>>>> BasicCache interface which is a parent of both Cache (embedded) and
>>>>> RemoteCache (remote).  Unfortunately for remote clients to be as
>>>>> efficient as possible we really need a Streaming API so that we don't
>>>>> have multiple copies of data in memory at multiple places at the same
>>>>> time.  For that reason I suggest we use the streaminig API for both or
>>>>> have a different API for remote vs embedded.
>>>>>
>>> IMO, we should have different API for embedded and remote.
> +1 That is what I was thinking as well that we make the embedded API
> nicer to use.  We can try to figure out the specifics of the remote
> API later then.
>
>>> BasicCache does not seems a good place to put it. It can be placed in
>>> AdvancedCache.
> +1 I was thinking this may be the safest for now and we can always
> move it to a higher class later if needed, however we can't move it
> down easily.
+1 for AdvancedCache for now, for Infinispan 8 Galder and I will discuss 
interfaces next week.

Tristan

-- 
Tristan Tarrant
Infinispan Lead
JBoss, a division of Red Hat