Re: [infinispan-dev] Multi get API

Tuesday, 24 March 2015

On Tue, Mar 24, 2015 at 11:06 AM, Radim Vansa <rvansa(a)redhat.com&gt; wrote:
...
 On 03/24/2015 03:38 PM, Pedro Ruivo wrote:
> Hi,
>
> comments inline...
>
> On 03/24/2015 01:57 PM, Radim Vansa wrote:
>> On 03/24/2015 02:22 PM, William Burns wrote:
>>> I am nearing completion of the new multi get command [1], allowing for
>>> more efficient retrieval of multiple keys at the same time.  Thanks
>>> again to Radim for a lot of the initial work.
>>>
>>> In doing so though I want to make sure I get feedback on how we want
>>> the API to actually look, since this is much more difficult to change
>>> down the line.
>>>
>>> There are a few things in particular I wanted to discuss.
>>>
>>> 1. The actual name of the method.  The main suggestions I have seen
>>> are getAll or getMany.  I do like the naming of the former, however it
>>> seems a bit misleading (remember API is getAll(Set) since we are
>>> really getting a subset.  So I am thinking the possibilities at this
>>> point are getAllOf, getAllFor or getMany.  I am leaning maybe towards
>>> the last one (getMany).  I am open to any suggestions though.
>> Actually Tristan suggested the name multiGet() - I would prefer this one
>> in the end, and adding multiPut() that would just do putAll (to have the
>> API symmetric) and deprecate the putAll() method. multiRemove and others
>> can follow later, and the naming is straightforward.
>> I would object against getAll() since this sounds like retrieving all
>> entries from the cache, not just the specified keys.
>>
> Why a Set as parameter?
>
> If we have a Collection as parameter, it would allow a user to specify
> easily the keys he want, like multiGet(Arrays.asList(k1, k2, k3)).

 It's Set as the collection should not contain duplicities. However, it's
 a bit questionable whether the collection should be copied for (async)
 internal processing or whether Infinispan takes ownership of this map
 (first option is more safe, second is faster). 
I would say it shouldn't be copied, but I could put a comment on the
Javadoc if the Set is modified concurrently while the operation is
running the results returned will be nondeterministic.

...

 Radim

>
>>> 2. What should the return value be for this method.  Currently this
>>> returns a Map<K, V> which makes sense when we retain these values in
>>> the local transactional context and is pretty nice and clean for end
>>> users.
>>>
>>> The other idea is to use a streaming API possibly returning an
>>> Iterable<CacheEntry<K, V>>.  The streaming return value
doesn't seem
>>> as intuitive to me, but opens us up for more efficient usage
>>> especially when there may be a lot of keys passed in (also results can
>>> be processed concurrently while retrieval is still occurring).
>>>
>>> I would lean towards returning the Map<K, V> value, however the next
>>> point could change that.
>> I think that Iterable<CacheEntry<K, V>> is too confusing for the end
>> user, I would stick to the Map. If you want lazy loading, the Map (and
>> it's EntrySet) could be made lazy by a flag. 
The point wasn't that it was lazy, but rather that we could limit how
many values are maintained on the caller side to prevent excess memory
usage.

...
> my vote is to use the Map<K,V> interface. The Iterable will
force the
> user to fetch the data using the iterator order, while the Map allow him
> to use any order he wants (i.e. fetch particular keys via map.get(()).
>
> BTW, it can be good to wrap the Map in a unmodifiableMap(). 
I wonder if this really matters.  Each invocation would return its own
map, so if they modified it, it shouldn't matter.  I would make sure
to document that the Map is not backing map in that updates to the
cache or this map are not reflected in the other.

...
>
>>> 3. Where this method should live.  Right now this method is on the
>>> BasicCache interface which is a parent of both Cache (embedded) and
>>> RemoteCache (remote).  Unfortunately for remote clients to be as
>>> efficient as possible we really need a Streaming API so that we don't
>>> have multiple copies of data in memory at multiple places at the same
>>> time.  For that reason I suggest we use the streaminig API for both or
>>> have a different API for remote vs embedded.
>>>
> IMO, we should have different API for embedded and remote. 
+1 That is what I was thinking as well that we make the embedded API
nicer to use.  We can try to figure out the specifics of the remote
API later then.

...
>
> BasicCache does not seems a good place to put it. It can be placed in
> AdvancedCache. 
+1 I was thinking this may be the safest for now and we can always
move it to a higher class later if needed, however we can't move it
down easily.

...
>
> Cheers,
> Pedro
>
>> Radim
>>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Radim Vansa <rvansa(a)redhat.com&gt;
 JBoss Performance Team

 _______________________________________________
 infinispan-dev mailing list
 infinispan-dev(a)lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev 

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [infinispan-dev] Multi get API