[infinispan-dev] Major version cleaning

Dan Berindei dan.berindei at gmail.com
Tue Feb 21 09:52:22 EST 2017


On Tue, Feb 21, 2017 at 10:39 AM, Sanne Grinovero <sanne at infinispan.org> wrote:
> On 21 February 2017 at 07:37, Dan Berindei <dan.berindei at gmail.com> wrote:
>> On Mon, Feb 20, 2017 at 8:02 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>> -1 to batch removal
>>>
>>> Frankly I've always been extremely negative about the fact that
>>> batches are built on top of transactions. It's easy to find several
>>> iterations of rants of mine on this mailing list, especially fierce
>>> debates with Mircea. So I'd welcome a separation of these features.
>>>
>>> Yet, removing batching seems very wrong. I disagree that people should
>>> use Transactions instead; for many use cases it's overkill, and for
>>> others - and this is the main pain point I always had with the current
>>> design - it might make sense to have a batch (or more than one) within
>>> a transaction.
>>> I have had practical problems with needing to "flush" a single batch
>>> within a transaction as the size of the combined elements was getting
>>> too large. That doesn't imply at all that I'm ready to commit.
>>>
>>
>> WDYM by "flush" here? I have a feeling this is nothing like our
>> batching ever was...
>
> I'm just listing points about why merging the batch concept into
> transactions is not practical.
>
> I mean that when one has to write very large amounts of data, one needs to
> break it up into smaller batches; *essentially* to flush the first batch
> before starting the second one.
> So that is unrelated to atomicity and consistency, as in practice one has
> to split one business operation into multiple batches.
>

Kind of repeating Radim's question, but how is this better than having
multiple small transactions and committing each one separately?
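To make the comparison concrete, here is a minimal sketch (plain Java collections, hypothetical helper names, not an Infinispan API) of splitting a large write set into fixed-size chunks, so that each chunk could then be applied and committed as its own small transaction:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkedWriter {
    // Split a large write set into fixed-size chunks. Each chunk can then be
    // applied (and committed) as its own independent small transaction,
    // instead of "flushing" a batch mid-transaction.
    static <K, V> List<Map<K, V>> chunks(Map<K, V> entries, int chunkSize) {
        List<Map<K, V>> result = new ArrayList<>();
        Map<K, V> current = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == chunkSize) {
                result.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With a JTA TransactionManager, each chunk would then be wrapped in its own tm.begin() / cache.putAll(chunk) / tm.commit() cycle.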

>>
>>> @Pedro: the fact that some code is broken today is not relevant, when
>>> there's a need for such features. Like you suggest, it had bad premises
>>> (building it on TX), so we should address that, but not throw it all out.
>>>
>>
>> Infinispan never created nested batches:
>
> I know. I'm not describing the current implementation, I'm describing use
> cases which are not addressed, or in which Infinispan is clumsy to use,
> to suggest improvements.
> And I'm not asking to have nested batches. Again, just pointing
> out practical problems with the current batch design.
>
> It should be possible to run an efficient batch of operations within a
> transaction.
> Or a sequence of batches, all within the same transaction.
> N.B. you could see that as a "nested batch" in the current twisted idea that
> a batch is a transaction - which is what I'm arguing against.
>

I'm sure there are use cases that we don't address properly, but
I don't know enough about those use cases or your proposed batching
API improvements to really say anything about them. The only thing I'm
confident about is that the current batching API is almost certainly
not the answer.

>
>> calling startBatch() when a
>> batch was already associated with the current thread just incremented
>> a refcount, and only the final endBatch() did any work. OTOH running a
>> batch within a transaction always worked very much like suspending the
>> current transaction, starting a new one, and committing it on
>> endBatch(). So the only real difference between batching and using
>> DummyTransactionManager is that batching is limited to one cache's
>> operations, while DummyTransactionManager supports multiple resources.
>>
>>
>>> @Bela is making spot-on objections based on use cases, which need to
>>> be addressed in some way. The "atomical" operations still don't work
>>> right[1] in Infinispan and that's a big usability problem.
>>>
>>
>> Batching never was about sending updates asynchronously. We have
>> putAllAsync() for that, which doesn't need transactions, and it's even
>> slightly more efficient without transactions.
>
> If you think this way, it sounds like you give no value to application
> performance: people need more than a couple of put operations.
>

Unfortunately, going beyond simple put operations, e.g. conditional
writes, is exactly where asynchronous replication fails. Even doing
simple put operations and making sure that those values are written to
the cache is an impossible task with asynchronous replication.
Considering that the batch methods are called startAtomic() and
endAtomic() in TreeCache, I really don't think they were created with
asynchronous replication in mind.
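The key property of putAllAsync() is that it hands back a future, so the caller can still find out when (and whether) the write was actually applied. A toy stand-in for that shape, backed by a plain map rather than the real Infinispan cache:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class AsyncPutDemo {
    // Toy stand-in for a cache's putAllAsync(): the write is applied on
    // another thread, and the returned future signals its completion.
    static final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    static CompletableFuture<Void> putAllAsync(Map<String, String> entries) {
        return CompletableFuture.runAsync(() -> store.putAll(entries));
    }
}
```

Only by waiting on the future (e.g. join()) does the caller know the values are in the cache; fire-and-forget asynchronous replication offers no equivalent signal, which is why conditional writes cannot work on top of it.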

Off-topic: The BatchingCache#startBatch() javadoc was never true for
distributed caches: writes are queued, but reads (or puts without
IGNORE_RETURN_VALUES) always result in a remote call. Locks cause
remote calls in a cache with pessimistic locking, too, although one
could argue that back in version 4.0, locks were indeed acquired
locally during the write and remotely at the end of the batch.
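A toy model (hypothetical class, not the Infinispan implementation) of why a distributed put() is never purely write-only: returning the previous value forces a read of the remote copy first, which a flag like IGNORE_RETURN_VALUES lets the cache skip:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ReturnValueCost {
    // Toy model: returning the previous value from put() requires fetching
    // the remote copy first; an "ignore return value" hint skips that read.
    final Map<String, String> remote = new HashMap<>();
    final AtomicInteger remoteReads = new AtomicInteger();

    String put(String k, String v, boolean ignoreReturnValue) {
        String previous = null;
        if (!ignoreReturnValue) {
            remoteReads.incrementAndGet(); // simulated remote fetch of old value
            previous = remote.get(k);
        }
        remote.put(k, v);
        return previous;
    }
}
```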

>>
>> And atomical operations have bugs, yes, but I'm not sure how
>> implementing a new kind of batching that isn't based on transactions
>> would help with that.
>>
>>
>>> +1 to remove async TX
>>>
>>> I actually like the concept but the API should be different.. might as
>>> well remove this for now.
>>>
>>> +1 to remove the Tree module
>>>
>>> I personally never used it as you all advised against it, yet it's been
>>> often requested by users; sometimes explicitly and sometimes not so
>>> explicitly, yet people often have needs which would be solved by a good
>>> Tree module.
>>> I understand the reasons you all want to remove it: it's buggy. But I
>>> believe the real reason is that it should have been built on top of a
>>> proper batch API, and using real MVCC. In that case it wouldn't have
>>> been buggy, nor too hard to maintain, and might have attracted way
>>> more interest as well.
>>
>> I think the fact that we haven't been able to build a "proper" batch
>> API using real MVCC yet is a proof to the contrary...
>>
>>
>>> I'd remove it as a temporary measure: delete the bad stuff, but
>>> hopefully it could be reintroduced built on better principles in some
>>> future?
>>>
>>> Thanks,
>>> Sanne
>>>
>>> [1] - "right" as in user expectations and actual practical use, which
>>> is currently different than in the twisted definition of "right" that
>>> the team has been using as an excuse ;-) I'll also prove that it
>>> doesn't work right according to your own twisted specs, when I find
>>> the time to finish some tests...
>>>
>>>
>>> On 20 February 2017 at 16:48, Pedro Ruivo <pedro at infinispan.org> wrote:
>>>>
>>>>
>>>> On 20-02-2017 16:12, Bela Ban wrote:
>>>>>
>>>>>
>>>>> On 20/02/17 17:06, Tristan Tarrant wrote:
>>>>>> Hi guys, we discussed about this a little bit in the past and this
>>>>>> morning on IRC. Here are some proposed removals:
>>>>>>
>>>>>> - Remove the async transactional modes, as they are quite pointless
>>>>>> - Remove batching: users should use transactions
>>>>>
>>>>> How do you make a bunch of modifications and send them asynchronously if
>>>>> both batching *and* async TXs are getting removed?
>>>>
>>>> We are not removing features, we are removing broken code.
>>>>
>>>> Batching is built on transactions, and async transactions don't make
>>>> sense since Infinispan has to report to the TransactionManager.
>>>>
>>>> Our current async-tx is broken in that it reports OK to the transaction
>>>> manager as soon as it starts to commit; if you then have a topology
>>>> change or a conflict, you will end up with inconsistent data.
>>>>
>>>> So, why do we keep this code around?
>>>>
>>>> You can still move a transaction commit to another thread if you don't
>>>> want to wait for it:
>>>>
>>>> tm.begin();
>>>> doWork();
>>>> Transaction tx = tm.suspend();
>>>> new Thread(() -> {
>>>>    tm.resume(tx);
>>>>    tm.commit();
>>>> }).start();
>>>>
>>>> The best thing I can think of is to keep the batching API and
>>>> re-implement it to provide an endBatchAsync() that will do the above.
>>>>
>>>>>
>>>>> So if someone wants to apply a unit of work *atomically* (either all
>>>>> modifications within that unit of work are applied, or none), what
>>>>> alternatives exist?
>>>>>
>>>>>> - Remove the tree module: it doesn't work properly, and uses batching
>>>>>>
>>>>>> Please cast your votes
>>>>>>
>>>>>> Tristan
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

