[infinispan-dev] Read Committed Distributed Cache Concerns
Mircea Markus
mmarkus at redhat.com
Sun Sep 22 11:28:07 EDT 2013
> On 22 Sep 2013, at 13:57, Sanne Grinovero <sanne at infinispan.org> wrote:
>
>> On 22 September 2013 13:22, Mircea Markus <mmarkus at redhat.com> wrote:
>>
>>>> On 21 Sep 2013, at 23:07, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>>
>>>> On 19 September 2013 18:29, Mircea Markus <mmarkus at redhat.com> wrote:
>>>> (Adding Jonathan who knows a thing or two about transactions.)
>>>>
>>>> Given that READ_COMMITTED (RC) is less performant than REPEATABLE_READ (RR)
>>>> I don't see any value in keeping RC around. I don't think users rely on
>>>> exact RC semantics (i.e. if an entry has been committed, then an ongoing
>>>> tx sees the most up-to-date value between reads) - that actually
>>>> is not the case with DIST caches, as you've mentioned.
>>>
>>> I don't think you can generalize from the specific example William
>>> made;
>>
>> William was referring to the general case.
>>
>>> there will still be cases in which READ_COMMITTED will be more
>>> efficient than REPEATABLE_READ,
>>
>> Looking at the implementation (also as described by William), RC isn't faster than RR in the general case. Curious why you think it would be, though.
>
> William is describing a potential "fix" for the semantics which would
> make it slower than RR, we're arguing that this "fix" is not desired.
+1 for the fix not being desirable. The main question here, though, is: is there any point in keeping RC around, given that RR provides the same performance?
> Also, I'm not interested in counting method invocations needed to
> achieve this: I'm just thinking about the theoretical memory
> consumption of RR
Theory aside, the way RR is implemented, it shouldn't consume more memory than RC in the general case (TBC by running benchmarks).
> , which I'd consider more critical.
>
>>> especially if you avoid "fixing" this, as suggested by Radim and
>>> myself in the two previous emails [not sure if you saw them, since you
>>> forked the conversation, ignoring those mails]:
>>> if we agree that the current semantics is acceptable, it will
>>> consistently be faster than REPEATABLE_READ.
>>
>> Radim's suggestion was to drop RC after running some tests to validate that RR provides the same performance. You +1 that so I don't understand why you say the conversation was forked.
>
> By "forking" I meant that you asked Jonathan's opinion but without
> including our response, so forking the conversation in two parallel
> discussions. I assume that was unintentional, but looked like you
> might not have seen our responses yet at time of writing yours.
> Also, I did "+1" a full paragraph of Radim's comments, not just his
> last sentence. Personally I find the initial part more important, so
> I'll quote it again:
>
> ~ On 19 September 2013 09:06, Radim Vansa <rvansa at redhat.com> wrote:
> ~ I think that Read Committed isolation level is not obliged to present
> ~ you with up-to-date committed data - the only fact is that it can, but
> ~ application must not rely on that. It's lower isolation level.
> ~ Nevertheless, I think that lower isolation level should mean better
> ~ performance. I would be strongly against imposing any additional
> ~ overhead that could slow it down [...]
>
> +1 for this. As mentioned above, if you're storing data blocks of
> non-trivial size, and my code is happy reading an older version
> even in the same transaction, I don't wish to incur the performance
> penalty imposed by RR.
>
> --Sanne
>
>>
>>>
>>> Sanne
>>>
>>>> I think RC is only preferred to RR because of performance, but if the performance
>>>> is the same (or even worse) I think we should only provide RR. Jonathan, care to comment?
>>>>
>>>>
>>>>> On Sep 18, 2013, at 11:03 PM, William Burns <mudokonman at gmail.com> wrote:
>>>>>
>>>>> I was recently refactoring code dealing with isolation levels and
>>>>> found how ReadCommitted is implemented and I have a few concerns I
>>>>> wanted to bring up.
>>>>>
>>>>> ReadCommitted read operations work by storing a reference to the value
>>>>> from the data store in its caller's context. Thus whenever another
>>>>> transaction is committed that updates the data store value any context
>>>>> that has that reference now sees the latest committed value. This
>>>>> works well for Local and Replicated caches since all data stores are
>>>>> updated with the latest value upon completion of the transaction.
>>>>> However, in distributed caches only the owners see the update in their
>>>>> data store, and thus any non-owner will still have the old value it
>>>>> previously read before the commit occurred.
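
The reference-sharing behaviour described above can be sketched roughly as follows. This is a hypothetical, heavily simplified model, not the real Infinispan API: `DataContainer`, `TxContext`, and the use of a single-element array to stand in for an in-place-mutable InternalCacheEntry are all illustrative inventions.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a node's data container. Entries are
// single-element arrays so a commit can mutate them in place, mimicking
// an InternalCacheEntry whose value field is overwritten on commit.
class DataContainer {
    private final Map<String, String[]> store = new HashMap<>();

    String[] entry(String key) {
        return store.computeIfAbsent(key, k -> new String[] { null });
    }

    void commit(String key, String value) {
        entry(key)[0] = value; // in-place update, visible through shared refs
    }
}

// Simplified transaction context: a READ_COMMITTED read caches a
// *reference* to the container entry, so later in-place commits on this
// node become visible to the ongoing transaction.
class TxContext {
    private final Map<String, String[]> seen = new HashMap<>();

    String read(DataContainer container, String key) {
        return seen.computeIfAbsent(key, container::entry)[0];
    }
}

public class ReadCommittedSketch {
    static String[] run() {
        DataContainer owner = new DataContainer();
        DataContainer nonOwner = new DataContainer();

        owner.commit("k", "v1");
        nonOwner.commit("k", "v1"); // copy fetched remotely before the tx

        TxContext txOnOwner = new TxContext();
        TxContext txOnNonOwner = new TxContext();
        txOnOwner.read(owner, "k");
        txOnNonOwner.read(nonOwner, "k");

        // Another transaction commits "v2"; in DIST mode only the owner's
        // data container is updated.
        owner.commit("k", "v2");

        return new String[] {
            txOnOwner.read(owner, "k"),      // sees the new commit
            txOnNonOwner.read(nonOwner, "k") // stale: non-owner was never updated
        };
    }

    public static void main(String[] args) {
        for (String v : run()) System.out.println(v); // prints v2 then v1
    }
}
```

Running this shows the mixed semantics William describes: the same transaction logic behaves as Read Committed on an owner and as Repeatable Read on a non-owner.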
>>>>>
>>>>> It seems quite inconsistent that distributed caches run in a mix of
>>>>> Repeatable Read/Read Committed semantics depending on which node and
>>>>> which key you are using.
>>>>>
>>>>> To operate properly we could track requests, similar to how L1 works,
>>>>> so we can tell non-owners to clear out the context values for entries
>>>>> they read remotely and haven't updated (since Read Committed writes
>>>>> should return the same written value). That seems like quite a bit of
>>>>> additional overhead though.
>>>>>
>>>>> I am wondering: is it worth trying to keep the Read Committed isolation
>>>>> level at all? It seems that Repeatable Read would be simpler and most
>>>>> likely more performant, as you wouldn't need all the additional remote
>>>>> calls to get RC to work properly. Or is it okay that we have
>>>>> different isolation levels for some keys on some nodes? This could be
>>>>> quite confusing if a user was using a local and a remote transaction
>>>>> and one transaction may not see the other's committed changes when it
>>>>> expects to.
>>>>>
>>>>> What do you guys think?
>>>>>
>>>>> - Will
>>>>>
>>>>> P.S.
>>>>>
>>>>> I also found a bug with Read Committed for all caches where if you do
>>>>> a write that changes the underlying InternalCacheEntry to a new type,
>>>>> that reads won't see subsequent committed values. This happens
>>>>> because the underlying data is replaced with a new reference, while a
>>>>> read would still be holding onto a reference to the old
>>>>> InternalCacheEntry. This can happen when using the various overridden
>>>>> put methods, for example. We should have a good solution for it, but
>>>>> one may not be required if we find that Read Committed itself is
>>>>> flawed beyond saving.
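
The stale-reference bug in the P.S. can be sketched like this. Again a hypothetical simplification: the `Entry` class and the plain map merely stand in for an InternalCacheEntry being swapped for a new object when a put changes its type.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for an InternalCacheEntry: commits normally mutate it in place.
class Entry {
    String value;
    Entry(String v) { value = v; }
}

public class StaleEntrySketch {
    static String[] run() {
        Map<String, Entry> container = new HashMap<>();
        container.put("k", new Entry("v1"));

        // A READ_COMMITTED read caches the entry reference in the tx context.
        Entry seenByTx = container.get("k");

        // A put that changes the entry type installs a brand-new object
        // instead of mutating the existing one in place.
        container.put("k", new Entry("v2"));

        // Subsequent commits update the new object; the ongoing tx is still
        // holding the old, now-orphaned reference.
        container.get("k").value = "v3";

        return new String[] { seenByTx.value, container.get("k").value };
    }

    public static void main(String[] args) {
        for (String v : run()) System.out.println(v); // prints v1 then v3
    }
}
```

The transaction keeps seeing "v1" even though "v3" has long been committed, which is exactly the missed-update symptom described above.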
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>
>>>> Cheers,
>>>> --
>>>> Mircea Markus
>>>> Infinispan lead (www.infinispan.org)
>>>>
>>>>
>>>>
>>>>
>>>>