[infinispan-dev] replacing the (FineGrained)AtomicMap with grouping

Sat Sep 21 21:32:46 EDT 2013

On Sat, Sep 21, 2013 at 3:33 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
> On 20 September 2013 23:19, William Burns <mudokonman at gmail.com> wrote:
>> Responses inline
>>
>> Also want to preface this with: if you haven't seen it on the other mailing
>> list, Read Committed is going away as it doesn't work properly in DIST
>> (in fact AM is really badly bugged with RC in DIST);
>
> We didn't agree on dropping it, in fact I think that would be a big
> mistake considering the overhead of RR when dealing with large entries
> / many entries.

The additional overhead would be on the first read as it wraps the
entry with RR.  This would just require an object allocation with 5
additional references to already stored objects.  Subsequent reads or
writes would have the same cost.  Actually a write then read is
slightly faster with RR since it does the wrapping immediately on the
read.  Large entries would be affected the least, as the overhead scales
with the number of entries in the cache.
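
A toy sketch of the cost described above, with made-up class names (this
is not Infinispan's real code, and the wrapper fields are only an
assumption): the wrapper is allocated once per key on first access, and
every later read or write in the same transaction reuses it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not Infinispan's actual code.
final class RrTxContext {
    // Small wrapper that only holds references to objects that already exist.
    static final class Wrapped {
        final String key;
        Object value;      // value as seen by this transaction
        boolean changed;   // true once the transaction has written the key
        Wrapped(String key, Object value) { this.key = key; this.value = value; }
    }

    private final Map<String, Object> store;
    private final Map<String, Wrapped> lookedUp = new HashMap<>();
    int wrappersAllocated = 0;

    RrTxContext(Map<String, Object> store) { this.store = store; }

    // Wrap on first access only; later accesses reuse the same wrapper.
    private Wrapped wrap(String key) {
        Wrapped w = lookedUp.get(key);
        if (w == null) {
            w = new Wrapped(key, store.get(key));
            lookedUp.put(key, w);
            wrappersAllocated++;
        }
        return w;
    }

    Object read(String key) { return wrap(key).value; }

    void write(String key, Object value) {
        Wrapped w = wrap(key);
        w.value = value;
        w.changed = true;
    }
}
```

The allocation count grows with the number of distinct keys touched, not
with the size of the values, which is the point about large entries.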

> For example in all Search use cases we really don't need any RR
> guarantee and would be wise to handle each operation in the most
> efficient strategy.
> [Technically it would be awesome to be able to rely on RR
> but it doesn't work as in databases - it doesn't snapshot the version
> of entries not touched yet - so we have to compensate at a higher
> layer..]

The repeatable read implementation is the same as Oracle's, using
multi-versioned entries.  It sounds like what you are talking about is
Serializable which doesn't scale.
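
To make the distinction concrete, here is a toy model of the behaviour
under discussion (class and method names are invented for illustration,
not taken from Infinispan): a value is pinned the first time the
transaction reads it, while entries not yet touched are not snapshotted
and so reflect later commits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; not Infinispan's actual implementation.
final class PinOnFirstReadTx {
    private final Map<String, String> shared;                  // committed state
    private final Map<String, String> pinned = new HashMap<>(); // values seen by this tx

    PinOnFirstReadTx(Map<String, String> shared) { this.shared = shared; }

    // The first read of a key pins its current value; later reads repeat it.
    String read(String key) {
        if (!pinned.containsKey(key)) {
            pinned.put(key, shared.get(key));
        }
        return pinned.get(key);
    }
}
```

If the transaction reads "a", another writer then commits new values for
both "a" and "b", the transaction keeps seeing the old "a" but sees the
new "b" on its first read of it.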

>
>> On Fri, Sep 20, 2013 at 9:56 AM, Emmanuel Bernard
>> <emmanuel at hibernate.org> wrote:
>>> I sort of see how we could replace it but we do make use of the FGAM to
>>> represent an Hibernate OGM dehydrated entity with one property per map
>>> entry.
>>> From what you are describing we would get an alternative solution but
>>> that would mean more memory strain and object creation. That will
>>> negatively impact the Infinispan backend.
>>
>> Object creation overhead shouldn't be that bad, the only real addition
>> is each node would keep a map containing the group name pointing to
>> the keys that are tied to that group (for entrySet etc).  This allows
>> for much better serialization performance, detailed below, since you
>> can optimize only reading the key(s) you care about.
>
> Isn't FGAM also more efficient storage-wise? Technically it's storing
> a single entry so I would expect it to be "better packed" somehow.

It is actually very similar.  Basically it is ICE (InternalCacheEntry)
variants vs FastCopyHashMap.Entry.  For immortal entries, groups would be
smaller, but other types would be slightly larger.  One other thing this
buys you, which I didn't think about until you pointed it out, is that
with groups you get all the expiration and eviction benefits as well,
which FGAM doesn't provide (currently at least).

>
>>>
>>> Also I don't remember if we use the key lock but at some point we will.
>>> I imagine a workaround is to lock the id property.
>
> +1
> Also we already include the id property consistently in any other
> query to guarantee ordering of writes, so that would be a simple
> change.
>
>> Yeah just using a separate but shared lock would cover that pretty
>> easily, but it also opens the door to missed locking as it isn't
>> implicit anymore.
>
> For OGM specifically locking shouldn't be a problem.
>
>>> OGM could live with it but it seems the usage is rendered more
>>> complicated and users having the same style of requirements would need
>>> to be more expert (more complex APIs).
>
> +1
>
>>>
>>> Emmanuel
>>
>>
>> On Fri, Sep 20, 2013 at 1:38 PM, Emmanuel Bernard
>> <emmanuel at hibernate.org> wrote:
>>> Well, I have always wanted a way to only read some keys from a FGAM. Sort of like a projection.
>> +1
>>>
>>>> On 20 sept. 2013, at 21:14, Randall Hauch <rhauch at redhat.com> wrote:
>>>>
>>>> IMO, the primary benefit of the FGAM is that you can aggregate your entries into a single entry that is a real aggregate: read the map and you get all the FGAM's entries in one fell swoop. IIUC, your proposal loses this capability for a single read of all aggregate parts. Is that right?
>
> I think Randall nailed it: that was my first thought as well, that's not nice.
> The good news is that I had already opened - a long time ago - a
> feature request for multi-Get: something like "Value[] get(Key...) ",
> if we had such a feature then FGAM would be in a better position to be
> deprecated.
>
>> That is one benefit.  However this is also very costly when you are
>> performing any operation on the AtomicMap from a node that doesn't own
>> that value as it will have to retrieve the entire contents on every
>> operation remotely if not owner (only once inside a given batch/tx).
>
> But the proposed alternative doesn't improve on that either ;-)

It would give you an option at least if you want a subset, but yes if
you want the entire map it would be the same :-)

>
>>
>> The current grouping API doesn't allow for aggregated keys and values,
>> but Mircea is proposing to add the Cache.getGroup method.  In that
>> case you can control what keys you bring back if you want 1 or all for
>> example.
>
> What if I don't know the keys? We're not always able to list them,
> currently I can iterate the keyset from FGAM.

The API would allow that.
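
A minimal sketch of how that could look, assuming a KeyFilter with a
single accept(key) method and a per-node index from group name to keys.
Both shapes are assumptions for illustration; only the getGroup(groupingKey,
KeyFilter) signature comes from Mircea's proposal.  An "accept all" filter
needs no knowledge of the keys, and a narrower filter gives the
projection-style read.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Assumed shape of the filter; hypothetical, for illustration only.
interface GroupKeyFilter<K> {
    boolean accept(K key);
}

// Hypothetical in-memory stand-in for a cache with a group index.
final class GroupedStore<G, K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final Map<G, Set<K>> groupIndex = new HashMap<>();

    void put(G group, K key, V value) {
        data.put(key, value);
        groupIndex.computeIfAbsent(group, g -> new LinkedHashSet<>()).add(key);
    }

    // Returns the entries of one group whose keys pass the filter.
    Map<K, V> getGroup(G group, GroupKeyFilter<? super K> filter) {
        Map<K, V> result = new LinkedHashMap<>();
        Set<K> keys = groupIndex.getOrDefault(group, Collections.<K>emptySet());
        for (K key : keys) {
            if (filter.accept(key)) {
                result.put(key, data.get(key));
            }
        }
        return result;
    }
}
```

Calling getGroup(group, key -> true) returns the whole group without the
caller having to enumerate anything, which covers the "I don't know the
keys" case.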

>
>>
>>>>
>>>>> On Sep 20, 2013, at 11:49 AM, Mircea Markus <mmarkus at redhat.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Most of the FGAM functionality can be achieved with grouping, by using the FGAM key as a grouping key.
>>>>> The single bit that seems to be missing from grouping to match the functionality of FGAM is obtaining all the entries under a single group. IOW a method like:
>>>>>
>>>>> Map<K,V> groupedKeys = Cache.getGroup(groupingKey, KeyFilter);
>
> looks good!
> And I assume KeyFilter could be implemented as "accept all" to
> degenerate in the FGAM-like semantics?
> Specifically what I'd like to confirm is that KeyFilter doesn't have
> to be able to enumerate the keys, or in other words that I can
> construct one without having a clue about which keys might be stored
> in the group.
>
>>
>> Just to be clear this is only for the purpose of retrieving multiple
>> values, right?  If a user is doing operations on a single key they
>> would still use the existing get, remove, replace, etc methods on
>> Cache right?  They would just do a put based on the key of the "inner
>> map" and they would just have to have a @Group annotation on the key
>> or use a Grouper API?
>>
>> Taking the group aspect a step further I think it would be cool to
>> have a method similar to AdvancedCache.withFlags like
>> AdvancedCache.groupedBy that returns an AdvancedCache that always
>> sends methods to the given node hashed by the provided group.  Would
>> this override @Group and Grouper though?  I think we would still want
>> to do a projection based view with a KeyFilter though, so users don't
>> have to read all the values if they only want a select few.  Would
>> writes to the projection be forwarded to the real cache though?
>
> Would be interesting to explore but it sounds quite fishy to commit
> having the same interface on the group: will all methods of
> AdvancedCache make sense on it? And will they all be making sense in
> future changes to AdvancedCache? Seems unlikely.
> Map is probably a more suited interface. We could call the method
> #createAtomicMap(GroupKey) :-D

Using groups itself already implies you would be using the Cache
interface, since you can just be using put and get etc. for specific
keys.  With the move to groups we would have to store what entries map
to each group.  So it just kinda seemed like a cool addition to also
support methods like entrySet, keySet, values and size since they
would be very easy to compute this way with some simple tweaks to the
commands to only use group data.  Also I think this would make using
Groups much easier as you don't require a @Group or Grouper.
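
As a rough sketch of the idea (all names invented, and groupedBy does not
exist on AdvancedCache today): once each node keeps the keys per group,
a group-scoped view can answer keySet, values and size from that index
alone.  The sketch assumes every key belongs to exactly one group.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model; not an Infinispan API.
final class GroupIndexedCache<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final Map<String, Set<K>> keysByGroup = new HashMap<>();

    // Returns a view whose operations are scoped to a single group.
    GroupView groupedBy(String group) { return new GroupView(group); }

    final class GroupView {
        private final String group;

        GroupView(String group) { this.group = group; }

        private Set<K> keys() {
            return keysByGroup.getOrDefault(group, Collections.<K>emptySet());
        }

        // Writes go to the real store and are recorded in the group index.
        void put(K key, V value) {
            data.put(key, value);
            keysByGroup.computeIfAbsent(group, g -> new LinkedHashSet<>()).add(key);
        }

        V get(K key) { return keys().contains(key) ? data.get(key) : null; }

        Set<K> keySet() { return Collections.unmodifiableSet(keys()); }

        Collection<V> values() {
            List<V> out = new ArrayList<>();
            for (K key : keys()) {
                out.add(data.get(key));
            }
            return out;
        }

        int size() { return keys().size(); }
    }
}
```

Here the group is chosen by the view rather than derived from the key,
which is why no @Group annotation or Grouper is needed in the sketch.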

>
> Cheers,
> Sanne
>
>>
>>>>>
>>>>> This can be relatively easily implemented with the same performance as an AtomicMap lookup.
>>>>>
>>>>> Some other differences worth mentioning:
>>>>> - the cache would contain more entries in the grouping API approach. Not sure if this is really a problem though.
>>>>> - in order to ensure REPEATABLE_READ, the AM (including values) is brought to the node that reads it (doesn't apply to FGAM). Not nice.
>>
>> In both AM and FGAM the entire contents of the map are remotely read
>> at the beginning of the operation as I mentioned above.  Really not
>> nice.
>>
>>>>> - people won't be able to lock an entire group (the equivalent of locking an AM key). I don't think this is a critical requirement, and it can also be worked around. Or added as a built-in function if needed.
>> +1 I personally don't think we need AM as there are ways to emulate it
>> using manual locking.
>>>>>
>>>>> I find the idea of dropping FGAM and only using grouping very tempting:
>>>>> - there is logic duplication between Grouping and (FG)AM (the locality, fine grained locking) that would be removed
>> +1
>>>>> - FGAM and AM semantics are a bit ambiguous in corner cases
>>>>> - having a Cache.getGroup does make sense in a general case
>>>>> - reduce the code base
>> +1
>>>>>
>>>>> What do people think?
>>
>> I think it definitely could use a fresh evaluation.  Actually by using
>> Groups we no longer have to use Deltas, which means that users could
>> use Deltas for their values now as well, which AM and FGAM didn't
>> support before.
>>
>>>>>
>>>>> Cheers,
>>>>> --
>>>>> Mircea Markus
>>>>> Infinispan lead (www.infinispan.org)
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev