[infinispan-dev] Transactional consistency of query

Gustavo Fernandes gustavo at infinispan.org
Mon Jul 31 04:41:41 EDT 2017


IMO, indexing should be eventually consistent, as this offers the best
performance.

On tx-caches, although Lucene has hooks to be enlisted in a transaction [1],
some backends (Elasticsearch) don't expose this, and Hibernate Search by
design doesn't make use of it. So currently we must deal with inconsistencies
after the fact: checking for nulls, mismatched types and so on.
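
A minimal sketch of such an after-the-fact check, assuming the query has
already produced a set of keys from the index and the values are then loaded
from the cache. This is not the actual Infinispan query code: a plain
java.util.Map stands in for the cache, and the names QueryPostCheck,
loadAndFilter, Person and Animal are hypothetical and exist only for
illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class QueryPostCheck {

    // Hypothetical entity types, used only for this illustration.
    public static final class Person {
        public final String name;
        public Person(String name) { this.name = name; }
    }

    public static final class Animal {
        public final String name;
        public Animal(String name) { this.name = name; }
    }

    /**
     * Loads the value for each key returned by the (possibly stale) index and
     * keeps only values that still exist and are of the requested type.
     * The Map parameter is a stand-in for the cache (assumption).
     */
    public static <T> List<T> loadAndFilter(Collection<?> keys,
                                            Map<?, ?> cache,
                                            Class<T> expectedType) {
        List<T> results = new ArrayList<>();
        for (Object key : keys) {
            Object value = cache.get(key);
            // isInstance returns false for null, so entries removed or evicted
            // after indexing are dropped together with mismatched types.
            if (expectedType.isInstance(value)) {
                results.add(expectedType.cast(value));
            }
        }
        return results;
    }
}
```

Note that this only removes results that no longer exist or have the wrong
type. It does not detect an entry of the right type whose fields no longer
match the query, and it cannot recover entries the index has not seen yet.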

[1]
https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TwoPhaseCommit.html


On Fri, Jul 28, 2017 at 1:59 PM, Adrian Nistor <anistor at redhat.com> wrote:

> My feeling regarding this was to accept such inconsistencies, but maybe
> I'm wrong. I've always regarded indexing as being async in general, even
> though it did behave as if being sync in some not so rare circumstances,
> which probably made people believe it is expected to be sync in general.
> I'm curious what Sanne and Gustavo have in mind.
>
> Please note that updating the index synchronously during tx commit was
> always regarded as a performance bottleneck, so it was out of the
> question. And that would not always work anyway; it all depends on the
> underlying indexing technology. For example, when using HS with
> Elasticsearch you have to accept that Elasticsearch indexing is always
> async.
>
> And there might not be an index at all. It's very possible that the
> query runs unindexed. In that case it will use distributed streams which
> have their own transaction issues.
>
> In the past we had some bugs where a matching entry was deleted/evicted
> right before the search results were returned to the user, so loading of
> those values failed in a silent way. Those queries mistakenly returned
> some unexpected nulls among other valid results. The fix was to just
> filter out those nulls. We could enhance that to double check that the
> returned entry is indeed of the requested type, to also cover the issue
> that you encountered.
>
> Adrian
>
> On 07/28/2017 01:38 PM, Radim Vansa wrote:
> > Hi,
> >
> > while working on ISPN-7806 I am wondering how queries should work with
> > transactions. Right now it seems that updates to index are done during
> > either regular command execution (on originator [A]) or prepare command
> > on remote nodes [B]. Both of these cause rolled-back transactions to be
> > seen, so these must be treated as bugs [C].
> >
> > If we index the data after committing the transaction, there would be a
> > time window when we could see the updated entries but the index would
> > not reflect that. That might be an acceptable limitation if a query
> > misses some matching entity, but it's also possible that we
> > retrieve the query result key-set and then (after retrieving full
> > entities) we return something that does not match the query. One of the
> > reproducers for ISPN-7806 I've written [1] triggers a situation where
> > listing all Persons could return Animal (different entity type), so I
> > think that there's no validity post-check (though these reproducers
> > don't use transactions).
> >
> > Therefore, I wonder if the index should contain only the key; maybe we
> > should store a unique version and invalidate the query if any of the
> > entries has changed.
> >
> > If we index the data before committing the transaction, similar
> > situation could happen: the index will return keys for entities that
> > will match in the future but the actually returned list will contain
> > stale entities.
> >
> > What's the overall plan? Do we just accept inconsistencies? In that
> > case, please add a verbose statement in docs and point me to that.
> >
> > And if I've misinterpreted something and raised the red flag in error,
> > please let me know.
> >
> > Radim
> >
> > [A] This seems to be a regression after moving towards async
> > interceptors - our impl of
> > org.hibernate.search.backend.TransactionContext is incorrectly bound to
> > TransactionManager. Then we seem to be running outside of a transaction and
> > are happy to index it right away. The thread that executes the
> > interceptor handler is also dependent on ownership (due to remote
> > LockCommand execution), so I think that it does not fail the local-mode
> > tests.
> >
> > [B] ... and it does so twice as a regression after ISPN-7840 but that's
> > easy to fix.
> >
> > [C] Indexing in prepare command was OK before ISPN-7840 with pessimistic
> > locking which does not send the CommitCommand, but now that the QI has
> > been moved below EWI it means that we're indexing before storing the
> > actual values. Optimistic locking was not correct, though.
> >
> > [1]
> > https://github.com/rvansa/infinispan/commit/1d62c9b84888c7ac21a9811213b5657aa44ff546
> >
> >
>