[infinispan-dev] replacing the (FineGrained)AtomicMap with grouping
Mircea Markus
mmarkus at redhat.com
Sun Sep 22 08:39:22 EDT 2013
> On 22 Sep 2013, at 02:33, William Burns <mudokonman at gmail.com> wrote:
>
>> On Sat, Sep 21, 2013 at 3:33 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>> On 20 September 2013 23:19, William Burns <mudokonman at gmail.com> wrote:
>>> Responses inline
>>>
>>> Also I want to preface this with: if you haven't seen it on the other
>>> mailing list thread, Read Committed is going away as it doesn't work
>>> properly in DIST (in fact AM is really badly bugged with RC in DIST);
>>
>> We didn't agree on dropping it; in fact I think that would be a big
>> mistake considering the overhead of RR when dealing with large entries
>> / many entries.
I don't see the overhead you're talking about, please elaborate.
>
> The additional overhead would be on the first read as it wraps the
> entry with RR. This would just require an object allocation with 5
> additional references to already stored objects. Subsequent reads or
> writes would have the same cost. Actually a write then read is
> slightly faster with RR since it does the wrapping immediately on the
> read. Large entries would have the least impact, as the cost scales
> with the number of entries in the cache, not their size.
Yep, there's no real overhead in RR vs RC. Of course we should validate that by benchmarking.
>
>> For example in all Search use cases we really don't need any RR
>> guarantee and would be wise to handle each operation in the most
>> efficient strategy.
>> [Technically it would be awesome to be able to rely on RR
>> but it doesn't work as in databases - it doesn't snapshot the version
>> of entries not touched yet - so we have to compensate at a higher
>> layer..]
>
> The repeatable read implementation is the same as Oracle's, using
> multi-versioned entries. It sounds like what you are talking about is
> Serializable, which doesn't scale.
>
>>
>>> On Fri, Sep 20, 2013 at 9:56 AM, Emmanuel Bernard
>>> <emmanuel at hibernate.org> wrote:
>>>> I sort of see how we could replace it, but we do make use of the FGAM to
>>>> represent a Hibernate OGM dehydrated entity with one property per map
>>>> entry.
>>>> From what you are describing we would get an alternative solution but
>>>> that would mean more memory strain and object creation. That will
>>>> negatively impact the Infinispan backend.
>>>
>>> Object creation overhead shouldn't be that bad; the only real addition
>>> is that each node would keep a map from the group name to the keys
>>> that are tied to that group (for entrySet etc). This allows
>>> for much better serialization performance, detailed below, since you
>>> can optimize to read only the key(s) you care about.
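[Editor's note: the per-node index described above might look roughly like the sketch below. The class and method names are illustrative only, not Infinispan API; it shows a group-name-to-keys map maintained on writes so entrySet/keySet for a group can be answered without scanning the whole data container.]

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative per-node index: group name -> keys in that group.
// Updated on put/remove so group-scoped reads (entrySet, keySet, size)
// never need to scan the full data container.
class GroupIndex<K> {
    private final Map<String, Set<K>> groups = new ConcurrentHashMap<>();

    void onPut(String group, K key) {
        groups.computeIfAbsent(group, g -> ConcurrentHashMap.newKeySet()).add(key);
    }

    void onRemove(String group, K key) {
        Set<K> keys = groups.get(group);
        if (keys != null) {
            keys.remove(key);
            // Drop the index entry once the group is empty.
            if (keys.isEmpty()) {
                groups.remove(group, keys);
            }
        }
    }

    Set<K> keysInGroup(String group) {
        return groups.getOrDefault(group, Set.of());
    }
}
```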
>>
>> Isn't FGAM also more efficient storage-wise? Technically it's storing
>> a single entry so I would expect it to be "better packed" somehow.
>
> It is actually very similar. Basically it is ICE variants vs
> FastCopyHashMap.Entry. For immortal entries, groups would use less
> memory, but other types would be slightly larger. One thing this buys
> you that I didn't think about until you pointed it out is that with
> groups you get all the expiration and eviction benefits as well, which
> FGAM doesn't provide (currently at least).
>
>>
>>>>
>>>> Also I don't remember if we use the key lock but at some point we will.
>>>> I imagine a workaround is to lock the id property.
>>
>> +1
>> Also we already include the id property consistently in any other
>> query to guarantee ordering of writes, so that would be a simple
>> change.
>>
>>> Yeah just using a separate but shared lock would cover that pretty
>>> easily, but it also opens the door to missed locking as it isn't
>>> implicit anymore.
>>
>> For OGM specifically locking shouldn't be a problem.
>>
>>>> OGM could live with it, but it seems the usage is rendered more
>>>> complicated, and users with the same style of requirements would need
>>>> to be more expert (more complex APIs).
>>
>> +1
>>
>>>>
>>>> Emmanuel
>>>
>>>
>>> On Fri, Sep 20, 2013 at 1:38 PM, Emmanuel Bernard
>>> <emmanuel at hibernate.org> wrote:
>>>> Well, I have always wanted a way to only read some keys from a FGAM. Sort of like a projection.
>>> +1
>>>>
>>>>> On 20 sept. 2013, at 21:14, Randall Hauch <rhauch at redhat.com> wrote:
>>>>>
>>>>> IMO, the primary benefit of the FGAM is that you can aggregate your entries into a single entry that is a real aggregate: read the map and you get all the FGAM's entries in one fell swoop. IIUC, your proposal loses this capability for a single read of all aggregate parts. Is that right?
>>
>> I think Randall nailed it: that was my first thought as well, that's not nice.
>> The good news is that I had already opened - a long time ago - a
>> feature request for multi-Get: something like "Value[] get(Key...)".
>> If we had such a feature then FGAM would be in a better position to be
>> deprecated.
>>
>>> That is one benefit. However this is also very costly when you are
>>> performing any operation on the AtomicMap from a node that doesn't own
>>> that value, as it will have to retrieve the entire contents remotely
>>> on every operation (only once inside a given batch/tx).
>>
>> But the proposed alternative doesn't improve on that either ;-)
>
> It would give you an option at least if you want a subset, but yes if
> you want the entire map it would be the same :-)
>
>>
>>>
>>> The current grouping API doesn't allow for aggregated keys and values,
>>> but Mircea is proposing to add the Cache.getGroup method. In that
>>> case you can control which keys you bring back, whether you want one
>>> or all for example.
>>
>> What if I don't know the keys? We're not always able to list them,
>> currently I can iterate the keyset from FGAM.
>
> The API would allow that.
>
>>
>>>
>>>>>
>>>>>> On Sep 20, 2013, at 11:49 AM, Mircea Markus <mmarkus at redhat.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Most of the FGAM functionality can be achieved with grouping, by using the FGAM key as a grouping key.
>>>>>> The single bit that seems to be missing from grouping to match the functionality of FGAM is obtaining all the entries under a single group. IOW a method like:
>>>>>>
>>>>>> Map<K,V> groupedKeys = Cache.getGroup(groupingKey, KeyFilter);
>>
>> looks good!
>> And I assume KeyFilter could be implemented as "accept all" to
>> degenerate into FGAM-like semantics?
>> Specifically what I'd like to confirm is that KeyFilter doesn't have
>> to be able to enumerate the keys, or in other words that I can
>> construct one without having a clue about which keys might be stored
>> in the group.
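[Editor's note: an accept-all filter would indeed be trivial and requires no knowledge of the stored keys. The sketch below assumes KeyFilter is a single-method predicate over keys; the interface shape is a guess based on the proposal in this thread, not settled API.]

```java
// Sketch of the proposed KeyFilter as a simple predicate over keys.
// An accept-all instance makes getGroup(groupingKey, filter) behave
// like reading a whole FGAM, without enumerating keys up front.
interface KeyFilter<K> {
    boolean accept(K key);

    // Degenerates getGroup into "fetch the entire group" semantics.
    static <K> KeyFilter<K> acceptAll() {
        return key -> true;
    }
}
```

A projection then becomes any narrower predicate, e.g. `k -> k.startsWith("person1#")`, without ever listing the group's keys.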
>>
>>>
>>> Just to be clear, this is only for the purpose of retrieving multiple
>>> values, right? If a user is doing operations on a single key they
>>> would still use the existing get, remove, replace, etc. methods on
>>> Cache, right? They would just do a put based on the key of the "inner
>>> map" and would have to have a @Group annotation on the key
>>> or use a Grouper API?
>>>
>>> Taking the group aspect a step further, I think it would be cool to
>>> have a method similar to AdvancedCache.withFlags, like
>>> AdvancedCache.groupedBy, that returns an AdvancedCache that always
>>> sends methods to the given node hashed by the provided group. Would
>>> this override @Group and Grouper though? I think we would still want
>>> to do a projection-based view with a KeyFilter, so users don't
>>> have to read all the values if they only want a select few. Would
>>> writes to the projection be forwarded to the real cache though?
>>
>> Would be interesting to explore, but it sounds quite fishy to commit to
>> having the same interface on the group: will all methods of
>> AdvancedCache make sense on it? And will they all still make sense
>> after future changes to AdvancedCache? Seems unlikely.
>> Map is probably a more suitable interface. We could call the method
>> #createAtomicMap(GroupKey) :-D
>
> Using groups itself already implies you would be using the Cache
> interface, since you can just use put and get etc. for specific
> keys. With the move to groups we would have to store which entries map
> to each group. So it just kinda seemed like a cool addition to also
> support methods like entrySet, keySet, values and size, since they
> would be very easy to compute this way with some simple tweaks to the
> commands to only use group data. Also I think this would make using
> groups much easier as you don't require a @Group or Grouper.
>
>>
>> Cheers,
>> Sanne
>>
>>>
>>>>>>
>>>>>> This can be relatively easily implemented with the same performance as an AtomicMap lookup.
>>>>>>
>>>>>> Some other differences worth mentioning:
>>>>>> - the cache would contain more entries in the grouping API approach. Not sure if this is really a problem though.
>>>>>> - in order to assure REPEATABLE_READ, the AM (including values) is brought onto the node that reads it (doesn't apply to FGAM). Not nice.
>>>
>>> In both AM and FGAM the entire contents of the map are remotely read
>>> at the beginning of the operation as I mentioned above. Really not
>>> nice.
>>>
>>>>>> - people won't be able to lock an entire group (the equivalent of locking an AM key). I don't think this is a critical requirement, and it can also be worked around. Or added as a built-in function if needed.
>>> +1 I personally don't think we need AM as there are ways to emulate it
>>> using manual locking.
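[Editor's note: the manual-locking emulation mentioned above could look like the sketch below, using plain JDK locks for illustration. This is not Infinispan code; in a real deployment the equivalent would be acquiring a lock on a designated key per group.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// One lock per group name: acquiring it before touching any key in the
// group emulates AtomicMap's "lock the whole map" semantics by hand.
class GroupLocks {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void lockGroup(String group) {
        locks.computeIfAbsent(group, g -> new ReentrantLock()).lock();
    }

    void unlockGroup(String group) {
        ReentrantLock lock = locks.get(group);
        if (lock != null && lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }
}
```

The caveat from earlier in the thread applies: nothing forces callers to take the group lock first, so the locking is no longer implicit.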
>>>>>>
>>>>>> I find the idea of dropping FGAM and only using grouping very tempting:
>>>>>> - there is logic duplication between grouping and (FG)AM (the locality, fine-grained locking) that would be removed
>>> +1
>>>>>> - FGAM and AM semantics are a bit ambiguous in corner cases
>>>>>> - having a Cache.getGroup does make sense in a general case
>>>>>> - reduce the code base
>>> +1
>>>>>>
>>>>>> What do people think?
>>>
>>> I think it definitely could use a fresh evaluation. Actually by using
>>> Groups we no longer have to use Deltas, which means that users could
>>> use Deltas for their values now as well, which AM and FGAM didn't
>>> support before.
>>>
>>>>>>
>>>>>> Cheers,
>>>>>> --
>>>>>> Mircea Markus
>>>>>> Infinispan lead (www.infinispan.org)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev