On 02/21/2017 05:16 PM, Tristan Tarrant wrote:
On 21/02/17 16:29, Sanne Grinovero wrote:
>> You haven't explained what "flush" means. Since you separate that
>> from atomicity/consistency, I assume that batches on non-tx cache
>> are just ordered putOrRemoveAll operations, immediately visible on
>> flush without any atomicity.
I assume that in Sanne's idea, ordering within a batch doesn't matter,
apart from operations on the same key. Requiring a global ordering
would, for example, prevent us from parallelizing by segment.
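For illustration, collapsing a batch into per-segment maps could look
roughly like this; WriteOp, batchOps and keyPartitioner are illustrative
names, assuming KeyPartitioner.getSegment as the key-to-segment mapping:

Map<Integer, Map<Object, Object>> opsBySegment = new HashMap<>();
for (WriteOp op : batchOps) {
   int segment = keyPartitioner.getSegment(op.key());
   // Map.put overwrites earlier entries, giving "last one wins" per key
   opsBySegment.computeIfAbsent(segment, s -> new HashMap<>())
         .put(op.key(), op.value());
}
// each per-segment map can now be shipped to its owner in parallel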
> So I want to write a first chunk, in our code that looks like:
>
> startBatch
> put(chunk1/A, [some large value])
> put(chunk1/B, [some small metadata])
> put(chunk1/C, [some small metadata])
> endBatch
> There is no reason to use a transaction, in fact we had to disable
> transactions as some of these entries could be large.
> There also is no reason for the batch, other than optimising the latency.
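For reference, that chunk write maps onto the current
Cache.startBatch()/endBatch(boolean) API roughly like this (the value
variables are placeholders):

cache.startBatch();
boolean success = false;
try {
   cache.put("chunk1/A", largeValue);
   cache.put("chunk1/B", smallMetadata1);
   cache.put("chunk1/C", smallMetadata2);
   success = true;
} finally {
   cache.endBatch(success); // true flushes the batch, false discards it
}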
Let me summarize to see if we have the requirements for a useful
batching system (loosely patterned on JDBC statement batching):
- a batch is not an atomic operation, i.e. it is not backed by a transaction
- it can be wrapped in a transaction if needed
- batches cannot be nested
- batches only involve unconditional write operations (put, putAll, remove)
- ordering of operations within a batch is unimportant, aside from
modifications to the same key, where "last one wins" applies
- when a batch is "flushed" (i.e. endBatch is invoked), the ops are
grouped by segment and sent to the appropriate owners for processing,
potentially in parallel
As Radim has called it, this is essentially a putOrRemoveAll op (with an
async counterpart).
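A hedged sketch of what the signature could look like; the interface
and method names are illustrative, not an existing Infinispan API:

interface WriteOnlyBatch<K, V> {
   // apply all modifications; a null value removes the key
   void putOrRemoveAll(Map<K, V> modifications);
   // async counterpart
   CompletionStage<Void> putOrRemoveAllAsync(Map<K, V> modifications);
}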
It is putOrRemoveAll when applied to a non-tx cache, and actually
implementing that shouldn't be complex. However, when transactions come
into play, it is different, because Sanne wants us to remove the
modifications of a completed batch from the local transactional
invocation context and 'cache' them on the owners. Since reads have to
be transactionally consistent, we need to inspect the transaction on the
remote nodes (remote repeatable read).
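To make the read path concrete, a very rough sketch with hypothetical
names throughout (flushedBatches, localStore); this is not how the
internals look today:

Object txRead(GlobalTransaction gtx, Object key) {
   // modifications this transaction has already flushed to this owner
   Map<Object, Object> flushed = flushedBatches.get(gtx);
   if (flushed != null && flushed.containsKey(key)) {
      // the transaction must see its own batched writes (repeatable read)
      return flushed.get(key);
   }
   return localStore.get(key); // otherwise read the committed value
}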
Sanne's request makes sense to me. However, since the current
implementation gives the false impression that it works like JDBC
batching while it is really nothing but crippled JTA, and since I don't
see anyone shouting "I'll implement that, next week it's done!", I
second deprecating/removing the API for the time being.
I don't find the current API ideal either: it depends on thread locals
(JTA does as well, but...) while it does not seem useful enough to me to
justify that. I would prefer:
interface BatchingCache<K, V> {
   Batch<K, V> start();
}

@NotThreadSafe
interface Batch<K, V> {
   void put(K key, V value);
   ...
   void execute();
   void drop(); // maybe not needed
}
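Hypothetical usage, assuming the cache implements the new interface
(key/value types and variables are illustrative):

Batch<String, byte[]> batch = cache.start();
batch.put("chunk1/A", largeValue);
batch.put("chunk1/B", smallMetadata);
batch.execute(); // groups the ops by segment and ships them to the owners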
Radim
Is that summary correct?
Tristan
--
Radim Vansa <rvansa@redhat.com>
JBoss Performance Team