<div dir="ltr"><div>IMO, indexing should be eventually consistent, as this offers the best performance.<br><br></div><div>On transactional caches, although Lucene has hooks to be enlisted in a transaction [1], some backends (Elasticsearch) don't expose this, and Hibernate Search by design doesn't make use of it. So currently we must deal with inconsistencies after the fact: checking for nulls, mismatched types and so on.<br></div><div><br>[1] <a href="https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TwoPhaseCommit.html">https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TwoPhaseCommit.html</a><br><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 28, 2017 at 1:59 PM, Adrian Nistor <span dir="ltr">&lt;<a href="mailto:anistor@redhat.com" target="_blank">anistor@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">My feeling regarding this was to accept such inconsistencies, but maybe<br>
I'm wrong. I've always regarded indexing as being async in general, even<br>
though it did behave synchronously in some not-so-rare circumstances,<br>
which probably led people to believe it is expected to be sync in general.<br>
I'm curious what Sanne and Gustavo have in mind.<br>
<br>
Please note that updating the index synchronously during tx commit was<br>
always regarded as a performance bottleneck, so it was out of the<br>
question. <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">And that would not always work anyway, it all depends on the<br>
underlying indexing technology. For example, when using HS with<br>
Elasticsearch you have to accept that Elasticsearch indexing is always async.<br>
<br>
And there might not be an index at all. It's very possible that the<br>
query runs unindexed. In that case it will use distributed streams which<br>
have their own transaction issues.<br>
<br>
In the past we had some bugs where a matching entry was deleted/evicted<br>
right before the search results were returned to the user, so loading of<br>
those values failed in a silent way. Those queries mistakenly returned<br>
some unexpected nulls among other valid results. The fix was to just<br>
filter out those nulls. We could enhance that to double check that the<br>
returned entry is indeed of the requested type, to also cover the issue<br>
that you encountered.<br>
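<br>
A minimal sketch of such a post-filter, assuming the loaded values arrive as a<br>
plain list. The names filterResults, Person and Animal are hypothetical and<br>
only illustrate the idea; this is not the actual query module code:<br>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Infinispan or Hibernate Search API.
class QueryResultPostFilter {

    // Stand-ins for two indexed entity types (assumed for the example).
    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static class Animal {
        final String name;
        Animal(String name) { this.name = name; }
    }

    // Keeps only values that are non-null and of the requested type.
    // Class.isInstance(null) is false, so a single check covers both the
    // entry that was deleted/evicted after the index lookup and the entry
    // that was replaced by a different entity type.
    static <T> List<T> filterResults(List<?> loaded, Class<T> requestedType) {
        List<T> valid = new ArrayList<>();
        for (Object value : loaded) {
            if (requestedType.isInstance(value)) {
                valid.add(requestedType.cast(value));
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // Simulated load result: one entry vanished (null), one changed type.
        List<Object> loaded = Arrays.asList(
                new Person("Alice"), null, new Animal("Rex"), new Person("Bob"));
        List<Person> persons = filterResults(loaded, Person.class);
        System.out.println(persons.size()); // prints 2
    }
}
```

Note that a check like this only drops entries of the wrong type; an entry of<br>
the right type whose fields no longer match the query would still pass.<br>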
<span class="gmail-HOEnZb"><font color="#888888"><br>
Adrian<br>
</font></span><div class="gmail-HOEnZb"><div class="gmail-h5"><br>
On 07/28/2017 01:38 PM, Radim Vansa wrote:<br>
> Hi,<br>
><br>
> while working on ISPN-7806 I am wondering how queries should work with<br>
> transactions. Right now it seems that updates to the index are done during<br>
> either regular command execution (on originator [A]) or prepare command<br>
> on remote nodes [B]. Both of these cause rolled-back transactions to be<br>
> seen, so these must be treated as bugs [C].<br>
><br>
> If we index the data after committing the transaction, there would be a<br>
> time window when we could see the updated entries but the index would<br>
> not reflect that. That might be an acceptable limitation if a<br>
> query-matching misses some entity, but it's also possible that we<br>
> retrieve the query result key-set and then (after retrieving full<br>
> entities) we return something that does not match the query. One of the<br>
> reproducers for ISPN-7806 I've written [1] triggers a situation where<br>
> listing all Persons could return Animal (different entity type), so I<br>
> think that there's no validity post-check (though these reproducers<br>
> don't use transactions).<br>
><br>
> Therefore, I wonder if the index should contain only the key; maybe we<br>
> should store a unique version and invalidate the query if some of the<br>
> entries have changed.<br>
><br>
> If we index the data before committing the transaction, similar<br>
> situation could happen: the index will return keys for entities that<br>
> will match in the future but the actually returned list will contain<br>
> stale entities.<br>
><br>
> What's the overall plan? Do we just accept inconsistencies? In that<br>
> case, please add a verbose statement in docs and point me to that.<br>
><br>
> And if I've misinterpreted something and raised the red flag in error,<br>
> please let me know.<br>
><br>
> Radim<br>
><br>
> [A] This seems to be a regression after moving towards async<br>
> interceptors - our impl of<br>
> org.hibernate.search.backend.<wbr>TransactionContext is incorrectly bound to<br>
> TransactionManager. Then we seem to be running outside of a transaction and<br>
> are happy to index it right away. The thread that executes the<br>
> interceptor handler is also dependent on ownership (due to remote<br>
> LockCommand execution), so I think that it does not fail the local-mode<br>
> tests.<br>
><br>
> [B] ... and it does so twice as a regression after ISPN-7840 but that's<br>
> easy to fix.<br>
><br>
> [C] Indexing in prepare command was OK before ISPN-7840 with pessimistic<br>
> locking which does not send the CommitCommand, but now that the QI has<br>
> been moved below EWI it means that we're indexing before storing the<br>
> actual values. Optimistic locking was not correct, though.<br>
><br>
> [1]<br>
> <a href="https://github.com/rvansa/infinispan/commit/1d62c9b84888c7ac21a9811213b5657aa44ff546" rel="noreferrer" target="_blank">https://github.com/rvansa/<wbr>infinispan/commit/<wbr>1d62c9b84888c7ac21a9811213b565<wbr>7aa44ff546</a><br>
><br>
><br>
<br>
</div></div><div class="gmail-HOEnZb"><div class="gmail-h5">
</div></div></blockquote></div><br></div></div></div>