[infinispan-dev] Multi get API

William Burns mudokonman at gmail.com
Tue Mar 24 14:29:46 EDT 2015


On Tue, Mar 24, 2015 at 12:45 PM, William Burns <mudokonman at gmail.com> wrote:
> On Tue, Mar 24, 2015 at 12:14 PM, Tristan Tarrant <ttarrant at redhat.com> wrote:
>> On 24/03/2015 16:22, William Burns wrote:
>>> On Tue, Mar 24, 2015 at 11:06 AM, Radim Vansa <rvansa at redhat.com> wrote:
>>>> On 03/24/2015 03:38 PM, Pedro Ruivo wrote:
>>>>> Hi,
>>>>>
>>>>> comments inline...
>>>>>
>>>>> On 03/24/2015 01:57 PM, Radim Vansa wrote:
>>>>>> On 03/24/2015 02:22 PM, William Burns wrote:
>>>>>>> I am nearing completion of the new multi get command [1], allowing for
>>>>>>> more efficient retrieval of multiple keys at the same time.  Thanks
>>>>>>> again to Radim for a lot of the initial work.
>>>>>>>
>>>>>>> In doing so though I want to make sure I get feedback on how we want
>>>>>>> the API to actually look, since this is much more difficult to change
>>>>>>> down the line.
>>>>>>>
>>>>>>> There are a few things in particular I wanted to discuss.
>>>>>>>
>>>>>>> 1. The actual name of the method.  The main suggestions I have seen
>>>>>>> are getAll or getMany.  I do like the naming of the former, however it
>>>>>>> seems a bit misleading (remember API is getAll(Set) since we are
>>>>>>> really getting a subset.  So I am thinking the possibilities at this
>>>>>>> point are getAllOf, getAllFor or getMany.  I am leaning maybe towards
>>>>>>> the last one (getMany).  I am open to any suggestions though.
>>>>>> Actually Tristan suggested the name multiGet() - I would prefer this one
>>>>>> in the end, and adding multiPut() that would just do putAll (to have the
>>>>>> API symmetric) and deprecate the putAll() method. multiRemove and others
>>>>>> can follow later, and the naming is straightforward.
>>>>>> I would object against getAll() since this sounds like retrieving all
>>>>>> entries from the cache, not just the specified keys.
>>>>>>
>>>>> Why a Set as parameter?
>>>>>
>>>>> If we have a Collection as parameter, it would allow a user to specify
>>>>> easily the keys he want, like multiGet(Arrays.asList(k1, k2, k3)).
>>>> It's Set as the collection should not contain duplicities. However, it's
>>>> a bit questionable whether the collection should be copied for (async)
>>>> internal processing or whether Infinispan takes ownership of this map
>>>> (first option is more safe, second is faster).
>>> I would say it shouldn't be copied, but I could put a comment on the
>>> Javadoc if the Set is modified concurrently while the operation is
>>> running the results returned will be nondeterministic.
>> The JCache contract for this is:
>>
>> Map<K, V> getAll(Set<? extends K> keys);
>
> Ah, yes I had forgotten this was already on JCache, in that case we
> have a clear precedent and full API definition.  I am thinking to be
> consistent using this would be best and later we can look into Stream
> API as was mentioned.

Actually thinking about this more, the JCache API for getAll seems
wrong.  It seems it should be Map<K, V> getAll(Set<?> keys).  Putting
an upper or lower bounds on the type is too restrictive.  By doing
Set<? extends K> I can't pass in a Set<Object> for example.  Looking
at the various Collection classes they do a similar thing with
containsAll, retainAll, removeAll etc.  I am thinking I am going to
have ours be Set<?> then instead.  We will still have compatibility
with JCache as well.

>
>>
>> Nothing is said about whether "keys" should be "safe", and I think that
>> is absolutely fine for us as well (i.e. no copying, if you change the
>> set, you get what you deserve).
>>
>>>
>>>> Radim
>>>>
>>>>>>> 2. What should the return value be for this method.  Currently this
>>>>>>> returns a Map<K, V> which makes sense when we retain these values in
>>>>>>> the local transactional context and is pretty nice and clean for end
>>>>>>> users.
>>>>>>>
>>>>>>> The other idea is to use a streaming API possibly returning an
>>>>>>> Iterable<CacheEntry<K, V>>.  The streaming return value doesn't seem
>>>>>>> as intuitive to me, but opens us up for more efficient usage
>>>>>>> especially when there may be a lot of keys passed in (also results can
>>>>>>> be processed concurrently while retrieval is still occurring).
>>>>>>>
>>>>>>> I would lean towards returning the Map<K, V> value, however the next
>>>>>>> point could change that.
>>>>>> I think that Iterable<CacheEntry<K, V>> is too confusing for the end
>>>>>> user, I would stick to the Map. If you want lazy loading, the Map (and
>>>>>> it's EntrySet) could be made lazy by a flag.
>>> The point wasn't that it was lazy, but rather that we could limit how
>>> many values are maintained on the caller side to prevent excess memory
>>> usage.
>>>
>>>>> my vote is to use the Map<K,V> interface. The Iterable will force the
>>>>> user to fetch the data using the iterator order, while the Map allow him
>>>>> to use any order he wants (i.e. fetch particular keys via map.get(()).
>>>>>
>>>>> BTW, it can be good to wrap the Map in a unmodifiableMap().
>>> I wonder if this really matters.  Each invocation would return its own
>>> map, so if they modified it, it shouldn't matter.  I would make sure
>>> to document that the Map is not backing map in that updates to the
>>> cache or this map are not reflected in the other.
>> Agree on all points. An iterable variant would probably more closely
>> resemble the Java 8 streams, so I wouldn't overcomplicated this specific
>> API.
>>
>>>>>>> 3. Where this method should live.  Right now this method is on the
>>>>>>> BasicCache interface which is a parent of both Cache (embedded) and
>>>>>>> RemoteCache (remote).  Unfortunately for remote clients to be as
>>>>>>> efficient as possible we really need a Streaming API so that we don't
>>>>>>> have multiple copies of data in memory at multiple places at the same
>>>>>>> time.  For that reason I suggest we use the streaminig API for both or
>>>>>>> have a different API for remote vs embedded.
>>>>>>>
>>>>> IMO, we should have different API for embedded and remote.
>>> +1 That is what I was thinking as well that we make the embedded API
>>> nicer to use.  We can try to figure out the specifics of the remote
>>> API later then.
>>>
>>>>> BasicCache does not seems a good place to put it. It can be placed in
>>>>> AdvancedCache.
>>> +1 I was thinking this may be the safest for now and we can always
>>> move it to a higher class later if needed, however we can't move it
>>> down easily.
>> +1 for AdvancedCache for now, for Infinispan 8 Galder and I will discuss
>> interfaces next week.
>>
>> Tristan
>>
>> --
>> Tristan Tarrant
>> Infinispan Lead
>> JBoss, a division of Red Hat
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev


More information about the infinispan-dev mailing list