Re: [infinispan-dev] Read Committed Distributed Cache Concerns

Sunday, 22 September 2013

...
 On 22 Sep 2013, at 13:57, Sanne Grinovero <sanne(a)infinispan.org> wrote:

> On 22 September 2013 13:22, Mircea Markus <mmarkus(a)redhat.com> wrote:
> 
>>> On 21 Sep 2013, at 23:07, Sanne Grinovero <sanne(a)infinispan.org> wrote:
>>> 
>>> On 19 September 2013 18:29, Mircea Markus <mmarkus(a)redhat.com> wrote:
>>> (Adding Jonathan who knows a thing or two about transactions.)
>>> 
>>> Given that READ_COMMITTED (RC) is less performant than REPEATABLE_READ (RR)
>>> I don't see any value in keeping RC around. I don't think users rely on
>>> exact RC semantics (i.e. if an entry has been committed then an ongoing
>>> tx requires the most up-to-date value between reads) - that actually
>>> is not the case with DIST caches as you've mentioned.
>> 
>> I don't think you can generalize from the specific example William
>> made;
> 
> William was referring to the general case.
> 
>> there will still be cases in which READ_COMMITTED will be more
>> efficient than REPEATABLE_READ,
> 
> Looking at the implementation (also as described by William), RC isn't faster than
> RR in the general case. Curious why you think it would be though.

 William is describing a potential "fix" for the semantics which would
 make it slower than RR, we're arguing that this "fix" is not desired.

+1 for the fix not being desirable. The main question here though is: is there any point
in keeping RC around given that RR provides the same performance?

...
 Also, I'm not interested in counting method invocations needed to
 achieve this: I'm just thinking about the theoretical memory
 consumption of RR 
Theory aside, the way RR is implemented, it shouldn't consume more memory than RC in
the general case (TBC by running benchmarks). 

...
 , which I'd consider more critical.

>> especially if you avoid "fixing" this, as suggested by Radim and
>> myself in the two previous emails [not sure if you saw them, since you
>> forked the conversation, ignoring those mails]:
>> if we agree that the current semantics is acceptable, it will
>> consistently be faster than REPEATABLE_READ.
> 
> Radim's suggestion was to drop RC after running some tests to validate that RR
> provides the same performance. You +1'd that, so I don't understand why you say the
> conversation was forked.

 By "forking" I meant that you asked Jonathan's opinion but without
 including our response, so forking the conversation in two parallel
 discussions. I assume that was unintentional, but looked like you
 might not have seen our responses yet at time of writing yours.
 Also, I did "+1" a full paragraph of Radim's comments, not just his
 last sentence. Personally I find the initial part more important, so
 I'll quote it again:

 ~   On 19 September 2013 09:06, Radim Vansa <rvansa(a)redhat.com> wrote:
 ~   I think that Read Committed isolation level is not obliged to present
 ~   you with up-to-date committed data - the only fact is that it can, but
 ~   application must not rely on that. It's lower isolation level.
 ~   Nevertheless, I think that lower isolation level should mean better
 ~   performance. I would be strongly against imposing any additional
 ~   overhead that could slow it down [...]

 +1 for this. As mentioned above, if you're storing data blocks of
 non-trivial size, and my code is happy reading an older version
 even in the same transaction, I don't wish to incur the performance
 penalty imposed by RR.

 --Sanne

> 
>> 
>> Sanne
>> 
>>> I think RC is only preferred to RR because of performance, but if the performance
>>> is the same (or even worse) I think we should only provide RR. Jonathan, care
>>> to comment?
>>> 
>>> 
>>>> On Sep 18, 2013, at 11:03 PM, William Burns <mudokonman(a)gmail.com> wrote:
>>>> 
>>>> I was recently refactoring code dealing with isolation levels and
>>>> found how ReadCommitted is implemented and I have a few concerns I
>>>> wanted to bring up.
>>>> 
>>>> ReadCommitted read operations work by storing a reference to the value
>>>> from the data store in its caller's context.  Thus whenever another
>>>> transaction is committed that updates the data store value any context
>>>> that has that reference now sees the latest committed value.  This
>>>> works well for Local and Replicated caches since all data stores are
>>>> updated with the latest value upon completion of the transaction.
>>>> However, in Distributed caches only the owners see the update in their
>>>> data store and thus any non owner will still have the old value they
>>>> previously read before the commit occurred.
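As a rough illustration of the behaviour described above, here is a toy Python model. It is not Infinispan code: `Entry`, `Node`, `commit` and `ReadCommittedContext` are made-up names. It assumes that a local read caches a reference to the live entry in the data store, while a remote read on a non-owner leaves the context with a detached copy of the value.

```python
class Entry:
    """Mutable holder standing in for an entry in a node's data store."""
    def __init__(self, value):
        self.value = value


class Node:
    """One cache node with its own local data store (hypothetical model)."""
    def __init__(self):
        self.store = {}


def commit(owners, key, value):
    """Apply a committed write in place on every owner's data store."""
    for node in owners:
        if key in node.store:
            node.store[key].value = value
        else:
            node.store[key] = Entry(value)


class ReadCommittedContext:
    """Transaction context that caches entries per key (assumed RC behaviour)."""
    def __init__(self, local_node, owners):
        self.local_node = local_node
        self.owners = owners
        self.looked_up = {}

    def read(self, key):
        if key not in self.looked_up:
            if key in self.local_node.store:
                # Local hit: keep a reference to the live entry.
                self.looked_up[key] = self.local_node.store[key]
            else:
                # Remote read: the value is copied from an owner, so the
                # context holds a snapshot that later commits do not touch.
                self.looked_up[key] = Entry(self.owners[0].store[key].value)
        return self.looked_up[key].value


owner, non_owner = Node(), Node()
commit([owner], "k", "v1")

tx_on_owner = ReadCommittedContext(owner, [owner])
tx_on_non_owner = ReadCommittedContext(non_owner, [owner])
first = (tx_on_owner.read("k"), tx_on_non_owner.read("k"))

commit([owner], "k", "v2")  # another transaction commits in between
second = (tx_on_owner.read("k"), tx_on_non_owner.read("k"))

print(first, second)  # ('v1', 'v1') ('v2', 'v1')
```

In this model the transaction on the owner sees the new value on its second read (Read Committed), while the one on the non-owner keeps seeing the old value (effectively Repeatable Read), which is the mix of isolation levels being questioned.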
>>>> 
>>>> This seems quite inconsistent that Distributed caches run in a mix of
>>>> Repeatable Read/Read Committed depending on what node and what key you
>>>> are using.
>>>> 
>>>> To operate properly we could track requests similar to how it works
>>>> for L1 so we can tell non owners to clear out their context values for
>>>> values they read remotely that they haven't updated (since Read
>>>> Committed writes should return the same written value).  That seems
>>>> like quite a bit of additional overhead though.
>>>> 
>>>> I am wondering is it worth it to try to keep Read Committed isolation
>>>> level though?  It seems that Repeatable Read would be simpler and most
>>>> likely more performant as you wouldn't need all the additional remote
>>>> calls to get it to work properly.  Or is it okay that we have
>>>> different isolation levels for some keys on some nodes?  This could be
>>>> quite confusing if a user was using a local and remote transaction and
>>>> a transaction may not see the other's committed changes when they
>>>> expect to.
>>>> 
>>>> What do you guys think?
>>>> 
>>>> - Will
>>>> 
>>>> P.S.
>>>> 
>>>> I also found a bug with Read Committed for all caches: if you do
>>>> a write that changes the underlying InternalCacheEntry to a new type,
>>>> reads won't see subsequent committed values.  This is caused
>>>> because the underlying data is changed to a new reference and a read
>>>> would still be holding onto a reference of the old InternalCacheEntry.
>>>> This can happen when using the various overridden put methods for
>>>> example.  We should have a good solution for it, but may not be
>>>> required if we find that Read Committed itself is flawed beyond
>>>> saving.
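The P.S. can be sketched the same way. This is a toy Python model under stated assumptions, not the actual Infinispan classes: `PlainEntry`, `ExpiringEntry` and `put` are hypothetical stand-ins for "an InternalCacheEntry of one type", "an InternalCacheEntry of another type" and "an overridden put method". The assumption is that a write updates the entry in place when the type is unchanged and swaps in a new object when the type changes.

```python
class PlainEntry:
    """Hypothetical entry type without expiration metadata."""
    def __init__(self, value):
        self.value = value


class ExpiringEntry:
    """Hypothetical entry type that also carries a lifespan."""
    def __init__(self, value, lifespan):
        self.value = value
        self.lifespan = lifespan


def put(store, key, value, lifespan=None):
    """Committed write: in-place update if the entry type is unchanged,
    otherwise replace the entry with a new object of the required type."""
    current = store.get(key)
    wanted = PlainEntry if lifespan is None else ExpiringEntry
    if type(current) is wanted:
        current.value = value
        if lifespan is not None:
            current.lifespan = lifespan
    elif lifespan is None:
        store[key] = PlainEntry(value)
    else:
        store[key] = ExpiringEntry(value, lifespan)


store = {}
put(store, "k", "v1")
held = store["k"]  # reference cached by a Read Committed reader

put(store, "k", "v2")               # same type: the reader sees it
seen_after_same_type = held.value

put(store, "k", "v3", lifespan=60)  # type changes: entry object replaced
put(store, "k", "v4", lifespan=60)  # later commits go to the new object
seen_after_type_change = held.value

print(seen_after_same_type, seen_after_type_change, store["k"].value)
# v2 v2 v4
```

Once the entry object has been replaced, the reader's cached reference is orphaned and no later commit can reach it, which matches the symptom described above.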
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev(a)lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> 
>>> Cheers,
>>> --
>>> Mircea Markus
>>> Infinispan lead (www.infinispan.org)
>>> 
>>> 
>>> 
>>> 
>>> 
