[infinispan-dev] Read Committed Distributed Cache Concerns

Sun Sep 22 11:28:07 EDT 2013

> On 22 Sep 2013, at 13:57, Sanne Grinovero <sanne at infinispan.org> wrote:
> 
>> On 22 September 2013 13:22, Mircea Markus <mmarkus at redhat.com> wrote:
>> 
>>>> On 21 Sep 2013, at 23:07, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>> 
>>>> On 19 September 2013 18:29, Mircea Markus <mmarkus at redhat.com> wrote:
>>>> (Adding Jonathan who knows a thing or two about transactions.)
>>>> 
>>>> Given that READ_COMMITTED (RC) is less performant than REPEATABLE_READ (RR),
>>>> I don't see any value in keeping RC around. I don't think users rely on
>>>> exact RC semantics (i.e. that once an entry has been committed, an ongoing
>>>> tx sees the most up-to-date value between reads); that is actually
>>>> not the case with DIST caches, as you've mentioned.
>>> 
>>> I don't think you can generalize from the specific example William
>>> made;
>> 
>> William was referring to the general case.
>> 
>>> there will still be cases in which READ_COMMITTED will be more
>>> efficient than REPEATABLE_READ,
>> 
>> Looking at the implementation (also as described by William), RC isn't faster than RR in the general case. I'm curious why you think it would be, though.
> 
> William is describing a potential "fix" for the semantics which would
> make it slower than RR; we're arguing that this "fix" is not desired.

+1 for the fix not being desirable. The main question here, though, is: is there any point in keeping RC around, given that RR provides the same performance?

> Also, I'm not interested in counting method invocations needed to
> achieve this: I'm just thinking about the theoretical memory
> consumption of RR

Theory aside, given the way RR is implemented, it shouldn't consume more memory than RC in the general case (TBC by running benchmarks).

> , which I'd consider more critical.
> 
>>> especially if you avoid "fixing" this, as suggested by Radim and
>>> myself in the two previous emails [not sure if you saw them, since you
>>> forked the conversation, ignoring those mails]:
>>> if we agree that the current semantics is acceptable, it will
>>> consistently be faster than REPEATABLE_READ.
>> 
>> Radim's suggestion was to drop RC after running some tests to validate that RR provides the same performance. You +1'd that, so I don't understand why you say the conversation was forked.
> 
> By "forking" I meant that you asked Jonathan's opinion without
> including our responses, thereby forking the conversation into two parallel
> discussions. I assume that was unintentional, but it looked like you
> might not have seen our responses yet at the time of writing yours.
> Also, I did "+1" a full paragraph of Radim's comments, not just his
> last sentence. Personally I find the initial part more important, so
> I'll quote it again:
> 
> ~   On 19 September 2013 09:06, Radim Vansa <rvansa at redhat.com> wrote:
> ~   I think that the Read Committed isolation level is not obliged to present
> ~   you with up-to-date committed data; it may do so, but the
> ~   application must not rely on that. It's a lower isolation level.
> ~   Nevertheless, I think that a lower isolation level should mean better
> ~   performance. I would be strongly against imposing any additional
> ~   overhead that could slow it down [...]
> 
> +1 for this. As mentioned above, if you're storing data blocks of
> non-negligible size, and my code is happy reading an older version
> even in the same transaction, I don't wish to incur the performance
> penalty imposed by RR.
> 
> --Sanne
> 
>> 
>>> 
>>> Sanne
>>> 
>>>> I think RC is only preferred to RR because of performance, but if the performance
>>>> is the same (or even worse) I think we should only provide RR. Jonathan, care to comment?
>>>> 
>>>> 
>>>>> On Sep 18, 2013, at 11:03 PM, William Burns <mudokonman at gmail.com> wrote:
>>>>> 
>>>>> I was recently refactoring code dealing with isolation levels,
>>>>> came across how ReadCommitted is implemented, and have a few concerns I
>>>>> wanted to bring up.
>>>>> 
>>>>> ReadCommitted read operations work by storing a reference to the value
>>>>> from the data store in its caller's context.  Thus whenever another
>>>>> transaction is committed that updates the data store value any context
>>>>> that has that reference now sees the latest committed value.  This
>>>>> works well for Local and Replicated caches since all data stores are
>>>>> updated with the latest value upon completion of the transaction.
>>>>> However, in Distributed caches only the owners see the update in their
>>>>> data store, and thus any non-owner will still have the old value it
>>>>> previously read before the commit occurred.
>>>>> 
>>>>> It seems quite inconsistent that Distributed caches run in a mix of
>>>>> Repeatable Read/Read Committed depending on which node and which key you
>>>>> are using.
>>>>> 
>>>>> To operate properly we could track requests similar to how it works
>>>>> for L1 so we can tell non-owners to clear out their context values for
>>>>> values they read remotely that they haven't updated (since Read
>>>>> Committed writes should return the same written value).  That seems
>>>>> like quite a bit of additional overhead though.
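The L1-style tracking idea can be sketched roughly as follows, assuming a single owner and modeling each invalidation message as a direct method call; `Owner`, `TxContext`, `read`, `fetch`, `write` and `invalidate` are hypothetical names, not Infinispan API. The owner records which contexts read a key remotely and, on commit, tells each of them to drop the cached value unless that transaction wrote the key itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the L1-style "requestor tracking" fix; all names are illustrative.
final class RequestorTrackingSketch {

    // Per-transaction context on a non-owner node.
    static final class TxContext {
        final Map<String, String> lookedUp = new HashMap<>();
        final Set<String> written = new HashSet<>();

        // Read through the context; on a miss, fetch from the owner and register as a requestor.
        String read(Owner owner, String key) {
            if (lookedUp.containsKey(key)) {
                return lookedUp.get(key);
            }
            String value = owner.fetch(key, this);
            lookedUp.put(key, value);
            return value;
        }

        // A local, not yet committed write by this transaction.
        void write(String key, String value) {
            lookedUp.put(key, value);
            written.add(key);
        }

        // Invalidation "message" from the owner: drop the cached read, keep own writes.
        void invalidate(String key) {
            if (!written.contains(key)) {
                lookedUp.remove(key);
            }
        }
    }

    // The single owner of all keys in this model.
    static final class Owner {
        final Map<String, String> dataStore = new HashMap<>();
        // Extra bookkeeping: which remote contexts have read which key.
        final Map<String, Set<TxContext>> requestors = new HashMap<>();

        String fetch(String key, TxContext requestor) {
            requestors.computeIfAbsent(key, k -> new HashSet<>()).add(requestor);
            return dataStore.get(key);
        }

        // Applies a committed value and returns how many invalidations had to be sent.
        int commit(String key, String newValue) {
            dataStore.put(key, newValue);
            Set<TxContext> readers = requestors.remove(key);
            if (readers == null) {
                return 0;
            }
            for (TxContext reader : readers) {
                reader.invalidate(key);
            }
            return readers.size();
        }
    }

    // Returns {first read, read after commit, writer's own read, invalidations sent}.
    static String[] demo() {
        Owner owner = new Owner();
        owner.dataStore.put("k", "v1");
        TxContext reader = new TxContext();
        TxContext writer = new TxContext();
        String before = reader.read(owner, "k");   // "v1", reader registered
        writer.read(owner, "k");                   // writer registered too
        writer.write("k", "mine");                 // uncommitted local write
        int messages = owner.commit("k", "v2");    // a third tx commits "v2"
        String after = reader.read(owner, "k");    // re-fetched from the owner
        String own = writer.read(owner, "k");      // own write is kept
        return new String[] {before, after, own, String.valueOf(messages)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", demo()));
    }
}
```

The invalidation count (2 in `demo()`) is the extra per-commit traffic this approach introduces; it grows with the number of remote readers of each key.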
>>>>> 
>>>>> I am wondering whether it is worth trying to keep the Read Committed
>>>>> isolation level at all. It seems that Repeatable Read would be simpler and most
>>>>> likely more performant, as you wouldn't need all the additional remote
>>>>> calls to get it to work properly. Or is it okay that we have
>>>>> different isolation levels for some keys on some nodes? This could be
>>>>> quite confusing if a user had a local and a remote transaction and
>>>>> one transaction did not see the other's committed changes when they
>>>>> expected it to.
>>>>> 
>>>>> What do you guys think?
>>>>> 
>>>>> - Will
>>>>> 
>>>>> P.S.
>>>>> 
>>>>> I also found a bug with Read Committed for all caches: if you do
>>>>> a write that changes the underlying InternalCacheEntry to a new type,
>>>>> reads won't see subsequent committed values. This happens
>>>>> because the underlying data is changed to a new reference while a read
>>>>> would still be holding onto a reference to the old InternalCacheEntry.
>>>>> This can happen when using the various overridden put methods, for
>>>>> example. We should find a good solution for it, but one may not be
>>>>> required if we find that Read Committed itself is flawed beyond
>>>>> saving.
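The stale-reference problem can be illustrated with a small hypothetical model, assuming a type-changing write is a put that attaches a lifespan; `Entry` and `EntryWithLifespan` are illustrative names, not Infinispan's `InternalCacheEntry` implementations. Once the store swaps in a new object, later in-place commits update that new object and the reader's old reference is left behind.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the stale-reference bug; names are illustrative, not Infinispan's.
final class StaleEntryReferenceSketch {

    // Plain entry; ordinary RC commits update its value in place.
    static class Entry {
        volatile String value;
        Entry(String value) { this.value = value; }
    }

    // A different entry "type", e.g. one that also carries a lifespan (assumed example).
    static final class EntryWithLifespan extends Entry {
        final long lifespanMillis;
        EntryWithLifespan(String value, long lifespanMillis) {
            super(value);
            this.lifespanMillis = lifespanMillis;
        }
    }

    // Commit that keeps the type: mutate whatever object the store currently holds.
    static void commitInPlace(Map<String, Entry> store, String key, String value) {
        store.get(key).value = value;
    }

    // Commit that changes the type: a brand-new object replaces the old one.
    static void commitWithLifespan(Map<String, Entry> store, String key, String value, long lifespanMillis) {
        store.put(key, new EntryWithLifespan(value, lifespanMillis));
    }

    // Returns {value seen through the reader's reference, value currently in the store}.
    static String[] demo() {
        Map<String, Entry> store = new HashMap<>();
        store.put("k", new Entry("v1"));
        Entry heldByReader = store.get("k");           // RC read keeps this reference
        commitWithLifespan(store, "k", "v2", 60_000L);  // store now holds a new object
        commitInPlace(store, "k", "v3");                // updates the new object only
        return new String[] {heldByReader.value, store.get("k").value};
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("reader sees " + r[0] + ", store has " + r[1]);
    }
}
```

Here the reader keeps seeing `v1` while the store holds `v3`, which matches the symptom described: reads stop observing subsequent commits once the entry object has been replaced.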
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> 
>>>> Cheers,
>>>> --
>>>> Mircea Markus
>>>> Infinispan lead (www.infinispan.org)
>>>> 