[infinispan-dev] Multi get API

Tue Mar 24 11:22:52 EDT 2015

On Tue, Mar 24, 2015 at 11:06 AM, Radim Vansa <rvansa at redhat.com> wrote:
> On 03/24/2015 03:38 PM, Pedro Ruivo wrote:
>> Hi,
>>
>> comments inline...
>>
>> On 03/24/2015 01:57 PM, Radim Vansa wrote:
>>> On 03/24/2015 02:22 PM, William Burns wrote:
>>>> I am nearing completion of the new multi get command [1], allowing for
>>>> more efficient retrieval of multiple keys at the same time.  Thanks
>>>> again to Radim for a lot of the initial work.
>>>>
>>>> In doing so though I want to make sure I get feedback on how we want
>>>> the API to actually look, since this is much more difficult to change
>>>> down the line.
>>>>
>>>> There are a few things in particular I wanted to discuss.
>>>>
>>>> 1. The actual name of the method.  The main suggestions I have seen
>>>> are getAll or getMany.  I do like the naming of the former, however it
>>>> seems a bit misleading (remember API is getAll(Set) since we are
>>>> really getting a subset.  So I am thinking the possibilities at this
>>>> point are getAllOf, getAllFor or getMany.  I am leaning maybe towards
>>>> the last one (getMany).  I am open to any suggestions though.
>>> Actually Tristan suggested the name multiGet() - I would prefer this one
>>> in the end, and adding multiPut() that would just do putAll (to have the
>>> API symmetric) and deprecate the putAll() method. multiRemove and others
>>> can follow later, and the naming is straightforward.
>>> I would object against getAll() since this sounds like retrieving all
>>> entries from the cache, not just the specified keys.
>>>
>> Why a Set as parameter?
>>
>> If we have a Collection as parameter, it would allow a user to specify
>> easily the keys he want, like multiGet(Arrays.asList(k1, k2, k3)).
>
> It's Set as the collection should not contain duplicities. However, it's
> a bit questionable whether the collection should be copied for (async)
> internal processing or whether Infinispan takes ownership of this map
> (first option is more safe, second is faster).

I would say it shouldn't be copied, but I could put a comment on the
Javadoc if the Set is modified concurrently while the operation is
running the results returned will be nondeterministic.

>
> Radim
>
>>
>>>> 2. What should the return value be for this method.  Currently this
>>>> returns a Map<K, V> which makes sense when we retain these values in
>>>> the local transactional context and is pretty nice and clean for end
>>>> users.
>>>>
>>>> The other idea is to use a streaming API possibly returning an
>>>> Iterable<CacheEntry<K, V>>.  The streaming return value doesn't seem
>>>> as intuitive to me, but opens us up for more efficient usage
>>>> especially when there may be a lot of keys passed in (also results can
>>>> be processed concurrently while retrieval is still occurring).
>>>>
>>>> I would lean towards returning the Map<K, V> value, however the next
>>>> point could change that.
>>> I think that Iterable<CacheEntry<K, V>> is too confusing for the end
>>> user, I would stick to the Map. If you want lazy loading, the Map (and
>>> it's EntrySet) could be made lazy by a flag.

The point wasn't that it was lazy, but rather that we could limit how
many values are maintained on the caller side to prevent excess memory
usage.

>> my vote is to use the Map<K,V> interface. The Iterable will force the
>> user to fetch the data using the iterator order, while the Map allow him
>> to use any order he wants (i.e. fetch particular keys via map.get(()).
>>
>> BTW, it can be good to wrap the Map in a unmodifiableMap().

I wonder if this really matters.  Each invocation would return its own
map, so if they modified it, it shouldn't matter.  I would make sure
to document that the Map is not backing map in that updates to the
cache or this map are not reflected in the other.

>>
>>>> 3. Where this method should live.  Right now this method is on the
>>>> BasicCache interface which is a parent of both Cache (embedded) and
>>>> RemoteCache (remote).  Unfortunately for remote clients to be as
>>>> efficient as possible we really need a Streaming API so that we don't
>>>> have multiple copies of data in memory at multiple places at the same
>>>> time.  For that reason I suggest we use the streaminig API for both or
>>>> have a different API for remote vs embedded.
>>>>
>> IMO, we should have different API for embedded and remote.

+1 That is what I was thinking as well that we make the embedded API
nicer to use.  We can try to figure out the specifics of the remote
API later then.

>>
>> BasicCache does not seems a good place to put it. It can be placed in
>> AdvancedCache.

+1 I was thinking this may be the safest for now and we can always
move it to a higher class later if needed, however we can't move it
down easily.

>>
>> Cheers,
>> Pedro
>>
>>> Radim
>>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> --
> Radim Vansa <rvansa at redhat.com>
> JBoss Performance Team
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev