On Tue, Mar 24, 2015 at 11:06 AM, Radim Vansa <rvansa(a)redhat.com> wrote:
On 03/24/2015 03:38 PM, Pedro Ruivo wrote:
> Hi,
>
> comments inline...
>
> On 03/24/2015 01:57 PM, Radim Vansa wrote:
>> On 03/24/2015 02:22 PM, William Burns wrote:
>>> I am nearing completion of the new multi get command [1], allowing for
>>> more efficient retrieval of multiple keys at the same time. Thanks
>>> again to Radim for a lot of the initial work.
>>>
>>> In doing so though I want to make sure I get feedback on how we want
>>> the API to actually look, since this is much more difficult to change
>>> down the line.
>>>
>>> There are a few things in particular I wanted to discuss.
>>>
>>> 1. The actual name of the method. The main suggestions I have seen
>>> are getAll or getMany. I do like the naming of the former, however it
>>> seems a bit misleading (remember API is getAll(Set) since we are
>>> really getting a subset. So I am thinking the possibilities at this
>>> point are getAllOf, getAllFor or getMany. I am leaning maybe towards
>>> the last one (getMany). I am open to any suggestions though.
>> Actually Tristan suggested the name multiGet() - I would prefer this one
>> in the end, and adding multiPut() that would just do putAll (to have the
>> API symmetric) and deprecate the putAll() method. multiRemove and others
>> can follow later, and the naming is straightforward.
>> I would object against getAll() since this sounds like retrieving all
>> entries from the cache, not just the specified keys.
>>
> Why a Set as parameter?
>
> If we have a Collection as parameter, it would allow a user to specify
> easily the keys he want, like multiGet(Arrays.asList(k1, k2, k3)).
It's Set as the collection should not contain duplicities. However, it's
a bit questionable whether the collection should be copied for (async)
internal processing or whether Infinispan takes ownership of this map
(first option is more safe, second is faster).
I would say it shouldn't be copied, but I could put a comment on the
Javadoc if the Set is modified concurrently while the operation is
running the results returned will be nondeterministic.
Radim
>
>>> 2. What should the return value be for this method. Currently this
>>> returns a Map<K, V> which makes sense when we retain these values in
>>> the local transactional context and is pretty nice and clean for end
>>> users.
>>>
>>> The other idea is to use a streaming API possibly returning an
>>> Iterable<CacheEntry<K, V>>. The streaming return value
doesn't seem
>>> as intuitive to me, but opens us up for more efficient usage
>>> especially when there may be a lot of keys passed in (also results can
>>> be processed concurrently while retrieval is still occurring).
>>>
>>> I would lean towards returning the Map<K, V> value, however the next
>>> point could change that.
>> I think that Iterable<CacheEntry<K, V>> is too confusing for the end
>> user, I would stick to the Map. If you want lazy loading, the Map (and
>> it's EntrySet) could be made lazy by a flag.
The point wasn't that it was lazy, but rather that we could limit how
many values are maintained on the caller side to prevent excess memory
usage.
> my vote is to use the Map<K,V> interface. The Iterable will
force the
> user to fetch the data using the iterator order, while the Map allow him
> to use any order he wants (i.e. fetch particular keys via map.get(()).
>
> BTW, it can be good to wrap the Map in a unmodifiableMap().
I wonder if this really matters. Each invocation would return its own
map, so if they modified it, it shouldn't matter. I would make sure
to document that the Map is not backing map in that updates to the
cache or this map are not reflected in the other.
>
>>> 3. Where this method should live. Right now this method is on the
>>> BasicCache interface which is a parent of both Cache (embedded) and
>>> RemoteCache (remote). Unfortunately for remote clients to be as
>>> efficient as possible we really need a Streaming API so that we don't
>>> have multiple copies of data in memory at multiple places at the same
>>> time. For that reason I suggest we use the streaminig API for both or
>>> have a different API for remote vs embedded.
>>>
> IMO, we should have different API for embedded and remote.
+1 That is what I was thinking as well that we make the embedded API
nicer to use. We can try to figure out the specifics of the remote
API later then.
>
> BasicCache does not seems a good place to put it. It can be placed in
> AdvancedCache.
+1 I was thinking this may be the safest for now and we can always
move it to a higher class later if needed, however we can't move it
down easily.
>
> Cheers,
> Pedro
>
>> Radim
>>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Radim Vansa <rvansa(a)redhat.com>
JBoss Performance Team
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev