[infinispan-dev] Read Committed Distributed Cache Concerns

Radim Vansa rvansa at redhat.com
Mon Sep 23 04:10:25 EDT 2013


On 09/22/2013 02:57 PM, Sanne Grinovero wrote:
> On 22 September 2013 13:22, Mircea Markus <mmarkus at redhat.com> wrote:
>>> On 21 Sep 2013, at 23:07, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>
>>>> On 19 September 2013 18:29, Mircea Markus <mmarkus at redhat.com> wrote:
>>>> (Adding Jonathan who knows a thing or two about transactions.)
>>>>
>>>> Given that READ_COMMITTED (RC) is less performant than REPEATABLE_READ (RR)
>>>> I don't see any value in keeping RC around. I don't think users rely on
>>>> exact RC semantics (i.e. if an entry has been committed then an ongoing
>>>> tx sees the most up-to-date value on subsequent reads) - that actually
>>>> is not the case with DIST caches, as you've mentioned.
>>> I don't think you can generalize from the specific example William
>>> made;
>> William was referring to the general case.
>>
>>> there will still be cases in which READ_COMMITTED will be more
>>> efficient than REPEATABLE_READ,
>> Looking at the implementation (also as described by William), RC isn't faster than RR in the general case. I'm curious why you think it would be, though.
> William is describing a potential "fix" for the semantics which would
> make it slower than RR; we're arguing that this "fix" is not desired.
> Also, I'm not interested in counting method invocations needed to
> achieve this: I'm just thinking about the theoretical memory
> consumption of RR, which I'd consider more critical.

Do you expect to run so many transactions in parallel that the memory 
used for keeping the old data would be critical? Transactions should take 
a few seconds at most, and the amount of data you're able to retrieve 
within those few seconds should be much less than the overall capacity 
of the node.
Yes, these are just "should"s, and I expect some usage patterns may 
differ. But am I missing something?
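
For a rough back-of-envelope - the figures below are entirely made up, 
just to show the orders of magnitude I have in mind:

    100 in-flight txs x 1,000 entries read per tx x 1 KB per retained
    old value = ~100 MB extra held by RR

On a node with a multi-GB heap that seems tolerable; it would only 
become critical under much more extreme read volumes or entry sizes.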

Radim

>
>>> especially if you avoid "fixing" this, as suggested by Radim and
>>> myself in the two previous emails [not sure if you saw them, since you
>>> forked the conversation, ignoring those mails]:
>>> if we agree that the current semantics is acceptable, it will
>>> consistently be faster than REPEATABLE_READ.
>> Radim's suggestion was to drop RC after running some tests to validate that RR provides the same performance. You +1'd that, so I don't understand why you say the conversation was forked.
> By "forking" I meant that you asked Jonathan's opinion but without
> including our response, so forking the conversation in two parallel
> discussions. I assume that was unintentional, but looked like you
> might not have seen our responses yet at time of writing yours.
> Also, I did "+1" a full paragraph of Radim's comments, not just his
> last sentence. Personally I find the initial part more important, so
> I'll quote it again:
>
> ~   On 19 September 2013 09:06, Radim Vansa <rvansa at redhat.com> wrote:
> ~   I think that the Read Committed isolation level is not obliged to present
> ~   you with up-to-date committed data - the only fact is that it can, but
> ~   the application must not rely on that. It's a lower isolation level.
> ~   Nevertheless, I think that a lower isolation level should mean better
> ~   performance. I would be strongly against imposing any additional
> ~   overhead that could slow it down [...]
>
> +1 for this. As mentioned above, if you're storing data blocks of
> non-trivial size, and my code is happy reading an older version
> even in the same transaction, I don't wish to incur the performance
> penalty imposed by RR.
>
> --Sanne
>
>>> Sanne
>>>
>>>> I think RC is preferred to RR only because of performance, but if the performance
>>>> is the same (or even worse), I think we should provide only RR. Jonathan, care to comment?
>>>>
>>>>
>>>>> On Sep 18, 2013, at 11:03 PM, William Burns <mudokonman at gmail.com> wrote:
>>>>>
>>>>> I was recently refactoring code dealing with isolation levels and
>>>>> found how ReadCommitted is implemented, and I have a few concerns I
>>>>> wanted to bring up.
>>>>>
>>>>> ReadCommitted read operations work by storing a reference to the value
>>>>> from the data store in the caller's context.  Thus, whenever another
>>>>> transaction that updates the data store value is committed, any context
>>>>> holding that reference now sees the latest committed value.  This
>>>>> works well for Local and Replicated caches, since all data stores are
>>>>> updated with the latest value upon completion of the transaction.
>>>>> With Distributed caches, however, only the owners see the update in
>>>>> their data store, and thus any non-owner will still have the old value
>>>>> it previously read before the commit occurred.
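
To make concrete what Will describes above: a minimal Java sketch of 
such a reference-based read (all names made up, not the actual 
Infinispan classes) could look like this:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical stand-ins for the real container types.
    interface ValueHolder { Object currentValue(); }
    interface DataContainer { ValueHolder get(Object key); }

    class TxContext {
        // key -> reference into this node's local data container
        private final Map<Object, ValueHolder> lookedUp = new HashMap<>();

        Object read(Object key, DataContainer container) {
            // Cache a reference to the stored holder, not a copy of it
            ValueHolder ref = lookedUp.computeIfAbsent(key, container::get);
            // A commit that updates the holder in the local container is
            // visible here - but only on nodes whose container was
            // actually updated, i.e. only on owners in DIST mode
            return ref.currentValue();
        }
    }

On non-owners the stale read falls out directly: the commit never 
touches their container, so the cached reference keeps serving the old 
value.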
>>>>>
>>>>> It seems quite inconsistent that Distributed caches run in a mix of
>>>>> Repeatable Read/Read Committed, depending on which node and which key
>>>>> you are using.
>>>>>
>>>>> To operate properly, we could track requests similarly to how it works
>>>>> for L1, so we can tell non-owners to clear out their context values for
>>>>> entries they read remotely but haven't updated (since Read Committed
>>>>> writes should still return the transaction's own written value).  That
>>>>> seems like quite a bit of additional overhead, though.
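
The proposed tracking could look roughly like this - again hypothetical 
names, just to show where the extra overhead would come from:

    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of the suggested fix, not real Infinispan code.
    interface Address {}
    interface Rpc { void invalidateContextEntry(Set<Address> targets, Object key); }

    class RemoteReadTracker {
        // key -> non-owners that read it remotely, recorded by the owner
        private final Map<Object, Set<Address>> remoteReaders =
                new ConcurrentHashMap<>();

        void recordRemoteRead(Object key, Address reader) {
            remoteReaders.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet())
                         .add(reader);
        }

        // On commit of a write to 'key', tell every tracked reader to drop
        // the stale reference from its open tx contexts (except contexts
        // that wrote the key themselves, since RC must keep returning a
        // tx's own writes). The bookkeeping plus this RPC is the overhead
        // in question.
        void onCommit(Object key, Rpc rpc) {
            Set<Address> readers = remoteReaders.remove(key);
            if (readers != null && !readers.isEmpty()) {
                rpc.invalidateContextEntry(readers, key);
            }
        }
    }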
>>>>>
>>>>> I am wondering: is it worth trying to keep the Read Committed isolation
>>>>> level, though?  It seems that Repeatable Read would be simpler and most
>>>>> likely more performant, as you wouldn't need all the additional remote
>>>>> calls to make Read Committed work properly.  Or is it okay that we have
>>>>> different isolation levels for some keys on some nodes?  This could be
>>>>> quite confusing if a user were using a local and a remote transaction
>>>>> and one transaction might not see the other's committed changes when
>>>>> they expect it to.
>>>>>
>>>>> What do you guys think?
>>>>>
>>>>> - Will
>>>>>
>>>>> P.S.
>>>>>
>>>>> I also found a bug with Read Committed for all caches: if you do a
>>>>> write that changes the underlying InternalCacheEntry to a new type,
>>>>> reads won't see subsequent committed values.  This happens because
>>>>> the underlying data is replaced with a new reference while a read is
>>>>> still holding a reference to the old InternalCacheEntry.  This can
>>>>> happen when using the various overridden put methods, for example.
>>>>> We should find a good solution for it, though one may not be needed
>>>>> if we decide that Read Committed itself is flawed beyond saving.
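
In sketch form (hypothetical types again, not the real ones), the 
stale-reference problem looks like this:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    interface CacheEntry { Object getValue(); }

    class ImmortalEntry implements CacheEntry {
        private final Object value;
        ImmortalEntry(Object value) { this.value = value; }
        public Object getValue() { return value; }
    }

    class MortalEntry implements CacheEntry {
        private final Object value;
        private final long lifespanMillis;
        MortalEntry(Object value, long lifespanMillis) {
            this.value = value;
            this.lifespanMillis = lifespanMillis;
        }
        public Object getValue() { return value; }
    }

    class DataContainerSketch {
        private final Map<Object, CacheEntry> entries = new ConcurrentHashMap<>();

        // A put() that adds a lifespan swaps the entry for a different
        // subtype - a brand-new object...
        void commitWrite(Object key, Object value, long lifespanMillis) {
            CacheEntry fresh = lifespanMillis < 0
                    ? new ImmortalEntry(value)
                    : new MortalEntry(value, lifespanMillis);
            entries.put(key, fresh);
            // ...so a tx context still holding the *old* CacheEntry
            // reference keeps reading its frozen value, and later commits
            // stay invisible to it even under Read Committed.
        }
    }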
>>>> Cheers,
>>>> --
>>>> Mircea Markus
>>>> Infinispan lead (www.infinispan.org)


-- 
Radim Vansa <rvansa at redhat.com>
JBoss DataGrid QA


