[infinispan-dev] Major version cleaning

Tue Feb 21 10:36:02 EST 2017

On 21 February 2017 at 14:52, Dan Berindei <dan.berindei at gmail.com> wrote:
> On Tue, Feb 21, 2017 at 10:39 AM, Sanne Grinovero <sanne at infinispan.org> wrote:
>> On 21 February 2017 at 07:37, Dan Berindei <dan.berindei at gmail.com> wrote:
>>> On Mon, Feb 20, 2017 at 8:02 PM, Sanne Grinovero <sanne at infinispan.org> wrote:
>>>> -1 to batch removal
>>>>
>>>> Frankly I've always been extremely negative about the fact that
>>>> batches are built on top of transactions. It's easy to find several
>>>> iterations of rants of mine on this mailing list, especially fierce
>>>> debates with Mircea. So I'd welcome a separation of these features.
>>>>
>>>> Yet, removing batching seems very wrong. I disagree that people should
>>>> use Transactions instead; for many use cases it's overkill, and for
>>>> others - and this is the main pain point I always had with the current
>>>> design - it might make sense to have a batch (or more than one) within
>>>> a transaction.
>>>> I have had practical problems with needing to "flush" a single batch
>>>> within a transaction as the size of the combined elements was getting
>>>> too large. That doesn't imply at all that I'm ready to commit.
>>>>
>>>
>>> WDYM by "flush" here? I have a feeling this is nothing like our
>>> batching ever was...
>>
>> I'm just listing points about why incorporating the batch concept with
>> transactions is not practical.
>>
>> I mean that when one has to write very large amounts of data, you need to
>> break it up in smaller batches; *essentially* to flush the first batch before
>> starting the second one.
>> So that is unrelated with atomicity and consistency, as in practice one has
>> to split one business operation into multiple batches.
>>
>
> Kind of repeating Radim's question, but how is this better than having
> multiple small transactions and committing each one separately?

If you assume that code integrating with Infinispan is in control of
the transaction boundaries - for example able to decide when it's time
to commit - it implies you can not integrate with other transactional
components.

Which defeats the purpose of exposing JTA integration.

>
>>>
>>>> @Pedro: the fact that some code is broken today is not relevant, when
>>>> there's need for such features. Like you suggest, it had bad premises
>>>> (build it on TX) so we should address that, but not throw it all out.
>>>>
>>>
>>> Infinispan never created nested batches:
>>
>> I know. I'm not describing the current implementation, I'm describing use
>> cases which are not addressed, or in which Infinispan is clumsy to use,
>> to suggest improvements.
>> And I'm not asking to have nested batches. Again, just pointing
>> out practical problems with the current batch design.
>>
>> It should be possible to run an efficient batch of operations within a
>> transaction.
>> Or a sequence of batches, all within the same transaction.
>> N.B. you could see that as a "nested batch" in the current twisted idea that
>> a batch is a transaction - which is what I'm arguing against.
>>
>
> I'm sure there there are use cases that we don't address properly, but
> I don't know enough about those use cases or your proposed batching
> API improvements to really say anything about them. The only thing I'm
> confident about is that the current batching API is almost certainly
> not the answer.
>
>>
>>> calling startBatch() when a
>>> batch was already associated with the current thread just incremented
>>> a refcount, and only the final endBatch() did any work. OTOH running a
>>> batch within a transaction always worked very much like suspending the
>>> current transaction, starting a new one, and committing it on
>>> endBatch(). So the only real difference between batching and using
>>> DummyTransactionManager is that batching is limited to one cache's
>>> operations, while DummyTransactionManager supports multiple resources.
>>>
>>>
>>>> @Bela is making spot-on objections based on use cases, which need to
>>>> be addressed in some way. The "atomical" operations still don't work
>>>> right[1] in Infinispan and that's a big usability problem.
>>>>
>>>
>>> Batching never was about sending updates asynchronously. We have
>>> putAllAsync() for that, which doesn't need transactions, and it's even
>>> slightly more efficient without transactions.
>>
>> If you think this way, it sounds like you give no value to the application
>> performance: people need more than a couple put operations.
>>
>
> Unfortunately, going beyond simple put operations, e.g. conditional
> writes, is exactly where asynchronous replication fails. Even doing
> simple put operations and making sure that those values are written to
> the cache is an impossible task with asynchronous replication.
> Considering that the batch methods are called startAtomic() and
> endAtomic() in TreeCache, I really don't think they were created with
> asynchronous replication in mind.
>
> Off-topic: The BatchingCache#startBatch() javadoc was never true for
> distributed caches: writes are queued, but reads (or puts without
> IGNORE_RETURN_VALUES) always result in a remote call. Locks cause
> remote calls in a cache with pessimistic locking, too, although one
> could argue that back in version 4.0, locks were indeed acquired
> locally during the write and remotely at the end of the batch.
>
>>>
>>> And atomical operations have bugs, yes, but I'm not sure how
>>> implementing a new kind of batching that isn't based on transactions
>>> would help with that.
>>>
>>>
>>>> +1 to remove async TX
>>>>
>>>> I actually like the concept but the API should be different.. might as
>>>> well remove this for now.
>>>>
>>>> +1 to remove the Tree module
>>>>
>>>> I personally never used it as you all advised against, yet it's been
>>>> often requested by users; sometimes explicitly and others not so
>>>> explicit, yet people often have need which would be solved by a good
>>>> Tree module.
>>>> I understand the reasons you all want to remove it: it's buggy. But I
>>>> believe the real reason is that it should have been built on top of a
>>>> proper batch API, and using real MVCC. In that case it wouldn't have
>>>> been buggy, nor too hard to maintain, and might have attracted way
>>>> more interest as well.
>>>
>>> I think the fact that we haven't been able to build a "proper" batch
>>> API using real MVCC yet is a proof to the contrary...
>>>
>>>
>>>> I'd remove it as a temporary measure: delete the bad stuff, but
>>>> hopefully it could be reintroduced built on better principles in some
>>>> future?
>>>>
>>>> Thanks,
>>>> Sanne
>>>>
>>>> [1] - "right" as in user expectations and actual practical use, which
>>>> is currently different than in the twisted definition of "right" that
>>>> the team has been using as an excuse ;-) I'll also proof that it
>>>> doesn't work right according to your own twisted specs, when I find
>>>> the time to finish some tests..
>>>>
>>>>
>>>> On 20 February 2017 at 16:48, Pedro Ruivo <pedro at infinispan.org> wrote:
>>>>>
>>>>>
>>>>> On 20-02-2017 16:12, Bela Ban wrote:
>>>>>>
>>>>>>
>>>>>> On 20/02/17 17:06, Tristan Tarrant wrote:
>>>>>>> Hi guys, we discussed about this a little bit in the past and this
>>>>>>> morning on IRC. Here are some proposed removals:
>>>>>>>
>>>>>>> - Remove the async transactional modes, as they are quite pointless
>>>>>>> - Remove batching: users should use transactions
>>>>>>
>>>>>> How do you make a bunch of modifications and send them asynchronously if
>>>>>> both batching *and* async TXs are getting removed?
>>>>>
>>>>> We are not removing features, we are removing broken code.
>>>>>
>>>>> Batching is using transactions and async transactions doesn't make sense
>>>>> since infinispan has to report to TransactionManager.
>>>>>
>>>>> Our current asyn-tx is broken in a way that is starts to commit and
>>>>> reports OK to the transaction manager. if you have a topology change or
>>>>> a conflict, you will end with inconsistent data.
>>>>>
>>>>> So, why do we keeping this code around?
>>>>>
>>>>> you can still move a transaction commit to another thread if you don't
>>>>> wanna wait for its commit:
>>>>>
>>>>> tm.begin()
>>>>> doWork()
>>>>> tx = tm.suspend()
>>>>> new_thread {
>>>>>    tm.resume(tx)
>>>>>    tm.commit()
>>>>> }
>>>>>
>>>>> The best thing I can think of is to keep the batching API and
>>>>> re-implement it to provide an endBatchAsync() that will do the above.
>>>>>
>>>>>>
>>>>>> So if someone wants to apply a unit of work *atomically* (either all
>>>>>> modifications within that unit of work are applied, or none), what
>>>>>> alternatives exist?
>>>>>>
>>>>>>> - Remove the tree module: it doesn't work properly, and uses batching
>>>>>>>
>>>>>>> Please cast your votes
>>>>>>>
>>>>>>> Tristan
>>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev at lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev