[infinispan-dev] Transactional consistency of query

Radim Vansa rvansa at redhat.com
Fri Jul 28 09:42:41 EDT 2017


On 07/28/2017 02:59 PM, Adrian Nistor wrote:
> My feeling regarding this was to accept such inconsistencies, but
> maybe I'm wrong. I've always regarded indexing as being async in
> general, even though it behaved as if it were sync in some
> not-so-rare circumstances, which probably made people believe it is
> expected to be sync in general. I'm curious what Sanne and Gustavo
> have in mind.
>
> Please note that updating the index synchronously during tx commit
> was always regarded as a performance bottleneck, so it was out of the
> question. And that would not always work anyway; it all depends on the
> underlying indexing technology. For example, when using Hibernate
> Search (HS) with Elasticsearch, you have to accept that Elasticsearch
> indexing is always async.

OK, queries being inherently async would be acceptable for me (as long
as we document it, preferably blogging about the limitations, too).
But async should mean that the result looks as if it had been computed
at some earlier point in time, perhaps with slightly mixed ordering,
not that it's inconsistent (e.g. returning entries that do not match
the criteria). Also, if we store fields in the index and return a
projection, those values should not expose any uncommitted data.

I guess that expecting a query in a transaction to reflect uncommitted
state would probably be too much :)
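
To make the "no uncommitted data" expectation concrete, here is a
minimal in-memory sketch, not Infinispan's QueryInterceptor or
Hibernate Search's TransactionContext: index updates are buffered per
transaction and published only on commit, so a rolled-back transaction
never shows up in query results or projections. The names
`DeferredIndex`, `stage`, `commit`, `rollback` and `keysMatching` are
hypothetical, and the "index" is just a map from key to one indexed
field value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: index updates are staged per transaction and only
// published on commit; a rollback simply discards the staged updates.
public class DeferredIndex {
    // Committed index state: entry key -> indexed field value
    private final Map<String, String> committed = new HashMap<>();
    // Pending updates per transaction id; a null value means "remove from index"
    private final Map<Long, Map<String, String>> pending = new HashMap<>();

    public synchronized void stage(long txId, String key, String indexedValue) {
        pending.computeIfAbsent(txId, id -> new HashMap<>()).put(key, indexedValue);
    }

    public synchronized void commit(long txId) {
        Map<String, String> updates = pending.remove(txId);
        if (updates == null) return; // nothing staged by this transaction
        updates.forEach((key, value) -> {
            if (value == null) committed.remove(key);
            else committed.put(key, value);
        });
    }

    public synchronized void rollback(long txId) {
        pending.remove(txId); // nothing was published, so nothing to undo
    }

    // Queries (and projections) only ever see committed index state
    public synchronized List<String> keysMatching(String value) {
        return committed.entrySet().stream()
                .filter(e -> e.getValue().equals(value))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DeferredIndex index = new DeferredIndex();
        index.stage(1, "person1", "Alice");
        index.stage(2, "person2", "Alice");
        index.commit(1);
        index.rollback(2);
        System.out.println(index.keysMatching("Alice")); // prints [person1]
    }
}
```

Publishing index updates after commit still leaves the time window
described further down in the thread, during which the cache already
holds the new value but the index does not; the sketch only addresses
visibility of rolled-back data.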

>
> And there might not be an index at all. It's very possible that the
> query runs unindexed. In that case it will use distributed streams,
> which have their own transaction issues.

Yes; please leave non-indexed queries aside from this discussion.

>
> In the past we had some bugs where a matching entry was
> deleted/evicted right before the search results were returned to the
> user, so loading those values failed silently. Those queries
> mistakenly returned some unexpected nulls among other valid results.
> The fix was to just filter out those nulls. We could enhance that to
> double-check that the returned entry is indeed of the requested type,
> to also cover the issue that you encountered.

It's not just the entity type; the criteria may be invalidated by any
field change. Would a full criteria check on the returned entities be
too expensive? Can you even check, e.g., native queries against a
provided set of objects?
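
As a rough illustration of such a post-check, here is a minimal sketch
(Java 16+ for records) assuming the query criteria can be expressed as
an in-memory `java.util.function.Predicate`; for native Lucene queries
that assumption may not hold, which is exactly the open question above.
`ResultPostFilter`, `postFilter`, `Person` and `Animal` are hypothetical
names, and the cache is a plain `Map`. The filter drops nulls
(deleted/evicted entries), entries of the wrong type, and entries whose
fields no longer satisfy the criteria.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: re-validate the entities loaded for stale index hits.
public class ResultPostFilter {
    record Person(String name, int age) {}
    record Animal(String name) {}

    static <T> List<T> postFilter(List<String> indexHits, Map<String, Object> cache,
                                  Class<T> type, Predicate<? super T> criteria) {
        List<T> results = new ArrayList<>();
        for (String key : indexHits) {
            Object value = cache.get(key);          // null if deleted/evicted meanwhile
            if (!type.isInstance(value)) continue;  // rejects null and wrong entity type
            T entity = type.cast(value);
            if (criteria.test(entity)) {            // full criteria re-check
                results.add(entity);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("k1", new Person("Alice", 30));
        cache.put("k2", new Animal("Rex"));       // key now holds a different type
        cache.put("k3", new Person("Bob", 17));   // field changed after indexing
        // "k4" was removed after the index returned it
        List<String> staleHits = List.of("k1", "k2", "k3", "k4");
        List<Person> adults = postFilter(staleHits, cache, Person.class, p -> p.age() >= 18);
        System.out.println(adults); // prints [Person[name=Alice, age=30]]
    }
}
```

The extra cost in this sketch is one predicate evaluation per loaded
entity, on top of the load itself.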

Radim

>
> Adrian
>
> On 07/28/2017 01:38 PM, Radim Vansa wrote:
>> Hi,
>>
>> while working on ISPN-7806 I am wondering how queries should work
>> with transactions. Right now it seems that updates to the index are
>> done during either regular command execution (on the originator [A])
>> or the prepare command on remote nodes [B]. Both of these cause
>> rolled-back transactions to be seen, so these must be treated as
>> bugs [C].
>>
>> If we index the data after committing the transaction, there would be
>> a time window when we could see the updated entries but the index
>> would not reflect that. That might be an acceptable limitation if
>> query matching misses some entity, but it's also possible that we
>> retrieve the query result key-set and then (after retrieving the full
>> entities) return something that does not match the query. One of the
>> reproducers for ISPN-7806 I've written [1] triggers a situation where
>> listing all Persons could return an Animal (a different entity type),
>> so I think that there's no validity post-check (though these
>> reproducers don't use transactions).
>>
>> Therefore, I wonder if the index should contain only the key; maybe
>> we should store a unique version and invalidate the query if some of
>> the entries have changed.
>>
>> If we index the data before committing the transaction, a similar
>> situation could happen: the index will return keys for entities that
>> will match in the future, but the actually returned list will contain
>> stale entities.
>>
>> What's the overall plan? Do we just accept inconsistencies? In that
>> case, please add a detailed statement to the docs and point me to it.
>>
>> And if I've misinterpreted something and raised the red flag in error,
>> please let me know.
>>
>> Radim
>>
>> [A] This seems to be a regression after moving towards async
>> interceptors: our impl of
>> org.hibernate.search.backend.TransactionContext is incorrectly bound
>> to the TransactionManager. Then we seem to be running outside of the
>> transaction and are happy to index right away. The thread that
>> executes the interceptor handler also depends on ownership (due to
>> remote LockCommand execution), which I think is why the local-mode
>> tests do not fail.
>>
>> [B] ... and it does so twice, as a regression after ISPN-7840, but
>> that's easy to fix.
>>
>> [C] Indexing in the prepare command was OK before ISPN-7840 with
>> pessimistic locking, which does not send the CommitCommand, but now
>> that the QI (QueryInterceptor) has been moved below the EWI
>> (EntryWrappingInterceptor), we're indexing before storing the actual
>> values. Optimistic locking was not correct, though.
>>
>> [1]
>> https://github.com/rvansa/infinispan/commit/1d62c9b84888c7ac21a9811213b5657aa44ff546 
>>
>>
>>
>
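
The version-based invalidation proposed in the original message above
could look roughly like the following minimal sketch (Java 16+). It
assumes, hypothetically, that each index document stores the entry's
version next to its key and that the cache can report an entry's
current version; `VersionedQueryCheck`, `Versioned`, `IndexHit` and
`resolve` are illustrative names, not Infinispan API. If any hit's
version no longer matches, the result may no longer satisfy the query,
so the whole result is rejected and the caller re-runs the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: index hits carry the entry version seen at indexing time.
public class VersionedQueryCheck {
    record Versioned(Object value, long version) {}
    record IndexHit(String key, long indexedVersion) {}

    // Returns the values if every hit still has its indexed version,
    // or Optional.empty() to signal that the query must be re-executed.
    static Optional<List<Object>> resolve(List<IndexHit> hits, Map<String, Versioned> cache) {
        List<Object> values = new ArrayList<>();
        for (IndexHit hit : hits) {
            Versioned current = cache.get(hit.key());
            if (current == null || current.version() != hit.indexedVersion()) {
                return Optional.empty(); // entry changed or vanished since it was indexed
            }
            values.add(current.value());
        }
        return Optional.of(values);
    }

    public static void main(String[] args) {
        Map<String, Versioned> cache = new HashMap<>();
        cache.put("k1", new Versioned("Alice", 3));
        cache.put("k2", new Versioned("Bob", 5));
        List<IndexHit> fresh = List.of(new IndexHit("k1", 3), new IndexHit("k2", 5));
        List<IndexHit> stale = List.of(new IndexHit("k1", 3), new IndexHit("k2", 4));
        System.out.println(resolve(fresh, cache)); // prints Optional[[Alice, Bob]]
        System.out.println(resolve(stale, cache)); // prints Optional.empty
    }
}
```

Under heavy write load an unbounded re-run loop could starve, so a
bounded retry, or falling back to per-entry filtering, may be needed.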


-- 
Radim Vansa <rvansa at redhat.com>
JBoss Performance Team


