[infinispan-dev] Transactional consistency of query
Radim Vansa
rvansa at redhat.com
Mon Jul 31 06:27:56 EDT 2017
On 07/31/2017 11:12 AM, Tristan Tarrant wrote:
> Shouldn't we use an appropriate conflict resolution strategy for this so
> that we repair the index in case of partitions?
This is not about eventual consistency in the case of partitions; it is
just about eventually publishing the change to the index after the
transaction completes.
Making the index consistent after a split brain (even with the DENY_ALL
policy, some operations may end up in a half-complete state) is a
completely different issue, and I think nobody has ever tried to deal
with that.
R.
>
> Tristan
>
> On 7/31/17 10:41 AM, Gustavo Fernandes wrote:
>> IMO, indexing should be eventually consistent, as this offers the best
>> performance.
>>
>> On tx-caches, although Lucene has hooks to be enlisted in a transaction
>> [1], some backends (Elasticsearch) don't expose this, and Hibernate
>> Search by design doesn't make use of it. So currently we must deal with
>> inconsistencies after the fact: checking for nulls, mismatched types,
>> and so on.
>>
>> [1]
>> https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TwoPhaseCommit.html
>>
>>
>> On Fri, Jul 28, 2017 at 1:59 PM, Adrian Nistor <anistor at redhat.com> wrote:
>>
>> My feeling regarding this was to accept such inconsistencies, but maybe
>> I'm wrong. I've always regarded indexing as being async in general, even
>> though it did behave synchronously in some not-so-rare circumstances,
>> which probably made people believe it is expected to be sync in general.
>> I'm curious what Sanne and Gustavo have in mind.
>>
>> Please note that updating the index synchronously during tx commit was
>> always regarded as a performance bottleneck, so it was out of the
>> question.
>>
>> And that would not always work anyway; it all depends on the
>> underlying indexing technology. For example, when using Hibernate Search
>> (HS) with Elasticsearch, you have to accept that Elasticsearch indexing
>> is always async.
>>
>> And there might not be an index at all. It's very possible that the
>> query runs unindexed. In that case it will use distributed streams which
>> have their own transaction issues.
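For illustration only: an unindexed query conceptually scans every entry and evaluates the predicate against the stored value itself. The sketch below is a single-node stand-in using `java.util.stream` over a plain `Map`; Infinispan's real unindexed path uses distributed streams across the cluster, and `ScanQuery` and `scan` are hypothetical names. Because it reads whatever values are visible at scan time, it is no more isolated from concurrent transactions than the index is.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical single-node stand-in for an unindexed query: no index is consulted,
// every stored value is scanned, type-checked and tested against the predicate.
final class ScanQuery {

    private ScanQuery() {}

    static <T> List<T> scan(Map<?, ?> store, Class<T> type, Predicate<? super T> predicate) {
        return store.values().stream()
                .filter(type::isInstance)   // skip values of other entity types
                .map(type::cast)
                .filter(predicate)          // evaluate the query on the value actually read
                .collect(Collectors.toList());
    }
}
```

Since the predicate is checked against the value actually read, a scan never returns a non-matching entry; what it cannot guarantee is a consistent snapshot across all entries.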
>>
>> In the past we had some bugs where a matching entry was deleted/evicted
>> right before the search results were returned to the user, so loading
>> those values failed silently. Those queries mistakenly returned some
>> unexpected nulls among other valid results. The fix was to just filter
>> out those nulls. We could enhance that to double-check that each
>> returned entry is indeed of the requested type, which would also cover
>> the issue that you encountered.
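A minimal sketch of the post-filter described above, assuming the index yields a list of keys and the cache can be read like a plain `java.util.Map` (both simplifications). `QueryResultFilter` and `loadMatching` are hypothetical names, not an Infinispan API. It drops keys whose value has disappeared and keys whose value is no longer of the requested type, such as an `Animal` returned for a `Person` query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: loads the values for the keys the index returned and
// silently drops entries that vanished or changed type since they were indexed.
final class QueryResultFilter {

    private QueryResultFilter() {}

    static <K, T> List<T> loadMatching(List<K> indexedKeys, Map<K, ?> store, Class<T> expectedType) {
        List<T> results = new ArrayList<>(indexedKeys.size());
        for (K key : indexedKeys) {
            Object value = store.get(key);
            // Deleted or evicted after the index lookup: the old "filter out nulls" fix.
            if (value == null) {
                continue;
            }
            // Key now holds a different entity type: the proposed extra check.
            if (!expectedType.isInstance(value)) {
                continue;
            }
            results.add(expectedType.cast(value));
        }
        return results;
    }
}
```

This only catches type mismatches; an entry of the right type whose fields no longer satisfy the query would still slip through unless the query predicate is re-evaluated as well.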
>>
>> Adrian
>>
>> On 07/28/2017 01:38 PM, Radim Vansa wrote:
>> > Hi,
>> >
>> > while working on ISPN-7806 I am wondering how queries should work with
>> > transactions. Right now it seems that updates to the index are done
>> > either during regular command execution (on the originator [A]) or in
>> > the prepare command on remote nodes [B]. Both of these cause updates
>> > from rolled-back transactions to be seen, so they must be treated as
>> > bugs [C].
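One way to keep rolled-back transactions out of the index is to stage index updates per transaction and publish them only on commit. The sketch below assumes a toy in-memory index that maps each key to a single indexed field value; `DeferredIndex`, `stage`, `commit` and `rollback` are hypothetical names, and a real integration (Hibernate Search work queues, Lucene's `TwoPhaseCommit`) would look different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: index updates are buffered per transaction and become
// visible to queries only on commit; a rollback simply discards the buffer.
final class DeferredIndex {

    /** Toy committed index: entry key -> indexed field value. */
    private final Map<String, String> committed = new HashMap<>();

    /** Updates staged by in-flight transactions, keyed by transaction id. */
    private final Map<Long, Map<String, String>> pending = new HashMap<>();

    /** Where indexing happens today (e.g. prepare); nothing is visible yet. A null value stages a removal. */
    synchronized void stage(long txId, String key, String fieldValue) {
        pending.computeIfAbsent(txId, id -> new HashMap<>()).put(key, fieldValue);
    }

    /** Publishes the staged updates; meant to run after the data itself is committed. */
    synchronized void commit(long txId) {
        Map<String, String> updates = pending.remove(txId);
        if (updates == null) {
            return; // nothing staged, or already rolled back
        }
        for (Map.Entry<String, String> u : updates.entrySet()) {
            if (u.getValue() == null) {
                committed.remove(u.getKey());
            } else {
                committed.put(u.getKey(), u.getValue());
            }
        }
    }

    /** Drops the staged updates so a rolled-back transaction is never indexed. */
    synchronized void rollback(long txId) {
        pending.remove(txId);
    }

    /** Queries see only committed index state. */
    synchronized List<String> keysMatching(String fieldValue) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : committed.entrySet()) {
            if (e.getValue().equals(fieldValue)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }
}
```

Publishing only after the data commit is what opens the window discussed next: between the data commit and `commit(txId)`, readers see the new values while the index still reflects the old ones.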
>> >
>> > If we index the data after committing the transaction, there would be
>> > a time window when we could see the updated entries but the index
>> > would not reflect that. That might be an acceptable limitation if a
>> > query merely misses some matching entity, but it's also possible that
>> > we retrieve the query result key-set and then (after retrieving the
>> > full entities) return something that does not match the query. One of
>> > the reproducers for ISPN-7806 I've written [1] triggers a situation
>> > where listing all Persons could return an Animal (a different entity
>> > type), so I think that there's no validity post-check (though these
>> > reproducers don't use transactions).
>> >
>> > Therefore, I wonder whether the index should contain only the key;
>> > maybe we should also store a unique version and invalidate the query
>> > if any of the entries has changed.
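A sketch of the version idea, assuming every write bumps a per-entry version and the index records the version it saw. `VersionedQuery`, `Versioned`, `IndexHit` and `StaleQueryException` are hypothetical names, and the nested records require Java 16 or later. Any mismatch between the indexed and the current version invalidates the whole result; the caller decides whether to retry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the index stores (key, version) and loading the full
// entries verifies that each entry still has the version that was indexed.
final class VersionedQuery {

    private VersionedQuery() {}

    /** A stored value plus a version that changes on every write. */
    record Versioned(Object value, long version) {}

    /** What the index returns: the key and the version it indexed. */
    record IndexHit(String key, long indexedVersion) {}

    /** Signals that the index and the data disagree for at least one hit. */
    static final class StaleQueryException extends RuntimeException {
        StaleQueryException(String key) {
            super("Entry changed since it was indexed: " + key);
        }
    }

    static List<Object> load(List<IndexHit> hits, Map<String, Versioned> store) {
        List<Object> results = new ArrayList<>(hits.size());
        for (IndexHit hit : hits) {
            Versioned current = store.get(hit.key());
            // Removed or rewritten after indexing: invalidate instead of returning stale data.
            if (current == null || current.version() != hit.indexedVersion()) {
                throw new StaleQueryException(hit.key());
            }
            results.add(current.value());
        }
        return results;
    }
}
```

Unlike the null/type post-filter, this also detects entries of the right type whose content changed, at the cost of storing a version in every index document.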
>> >
>> > If we index the data before committing the transaction, a similar
>> > situation could happen: the index will return keys for entities that
>> > will match in the future, but the list actually returned will contain
>> > stale entities.
>> >
>> > What's the overall plan? Do we just accept inconsistencies? In that
>> > case, please add a clear, detailed statement to the docs and point me
>> > to it.
>> >
>> > And if I've misinterpreted something and raised the red flag in error,
>> > please let me know.
>> >
>> > Radim
>> >
>> > [A] This seems to be a regression after moving towards async
>> > interceptors: our impl of
>> > org.hibernate.search.backend.TransactionContext is incorrectly bound
>> > to the TransactionManager. As a result, we seem to be running outside
>> > the transaction and are happy to index the change right away. The
>> > thread that executes the interceptor handler also depends on ownership
>> > (due to remote LockCommand execution), so I think that is why the
>> > local-mode tests do not fail.
>> >
>> > [B] ...and, as a regression after ISPN-7840, it does so twice, but
>> > that's easy to fix.
>> >
>> > [C] Indexing in the prepare command was OK before ISPN-7840 with
>> > pessimistic locking, which does not send the CommitCommand, but now
>> > that the QI (QueryInterceptor) has been moved below EWI
>> > (EntryWrappingInterceptor), we're indexing before storing the actual
>> > values. Optimistic locking was not correct, though.
>> >
>> > [1]
>> > https://github.com/rvansa/infinispan/commit/1d62c9b84888c7ac21a9811213b5657aa44ff546
>> >
>> >
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev at lists.jboss.org <mailto:infinispan-dev at lists.jboss.org>
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> <https://lists.jboss.org/mailman/listinfo/infinispan-dev>
>>
>>
--
Radim Vansa <rvansa at redhat.com>
JBoss Performance Team