[hibernate-dev] Hibernate Search: Adding more "hidden" fields to the index

Thu Apr 27 12:19:39 EDT 2017

On 27 April 2017 at 15:11, Yoann Rodiere <yoann at hibernate.org> wrote:
> I wonder, what's the benefit for HSEARCH-2616? Do you want to have that
> field so that we can just use AddLuceneWorks everywhere, and run targeted
> delete operations when we start a partition? If so, is it as a fallback
> solution, if what I proposed cannot be implemented, or as a better
> alternative? Note I don't have strong arguments against that solution, I'm
> just trying to understand the "why".

I had written the "why" on HSEARCH-2616, but to clarify here:

I liked your idea of trying to figure out if the current block of work
is being repeated, vs it being a re-try. However, while I initially
thought of adding such a field as a fallback solution, I now believe
it's ultimately the more robust solution, as otherwise you have to
trust such state, which could be lost, wrong or corrupted
independently for a number of reasons.
Since the problem being solved is about resuming the process after a
failure, we can't make many safe assumptions about what kind of
failure we're dealing with; for example, if you run out of disk space
you'll have a half-written index but no way to store such batch
state. Other problems might involve indexes being backed up, restored
or replicated over other technologies (rsync, Infinispan, ...), so a
mismatch between the index and other state is yet another problem
which might need caution, logs and possibly tooling.
Say an IO operation fails during an index write flush: some admin
intervenes, fixes the hardware and then triggers a resume of indexing.
In such conditions I wouldn't trust some additional persistent state,
not even if it were cryptographically signed to be correct: corruption
or signature mismatches could be detected, but in this case there's
the risk of it being trustworthy but out of date. With IO unavailable
when this state should have been written, you're probably reading the
previous version, which had been written successfully. Having an out
of date batch state would likely have the opposite effect of what we
need.

On the other hand, inspecting what's in the index is coupled with the
index state: while indexes could be corrupted, the progress tracking
state and the index are one and the same thing, so you're not easily
fooled.
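
To illustrate the idea, here is a minimal Java sketch of "delete by
hidden marker, then only add" when a partition is (re)started. It is
not Hibernate Search or Lucene code: ToyIndex, Doc, the partition
marker and the failAfter parameter are all hypothetical names made up
for this example, and the in-memory list merely stands in for a real
index. The point it shows is that a resumed partition needs no
external progress state, because the marker stored with each document
is the progress state.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionResumeSketch {

    /** Toy document: an entity id plus a hidden, internal-only partition marker (hypothetical). */
    static final class Doc {
        final String entityId;
        final String partitionId;

        Doc(String entityId, String partitionId) {
            this.entityId = entityId;
            this.partitionId = partitionId;
        }
    }

    /** Toy stand-in for an index; a real implementation would write to Lucene instead. */
    static final class ToyIndex {
        final List<Doc> docs = new ArrayList<>();

        void add(Doc doc) {
            docs.add(doc);
        }

        /** Targeted delete: drops every document carrying the given hidden marker. */
        void deleteByPartition(String partitionId) {
            docs.removeIf(d -> d.partitionId.equals(partitionId));
        }
    }

    /**
     * Indexes one partition. Whatever a previous attempt left behind for this
     * partition is deleted first, so the rest of the method only ever adds.
     * A non-negative failAfter simulates an IO failure after that many writes.
     */
    static void indexPartition(ToyIndex index, String partitionId,
                               List<String> entityIds, int failAfter) {
        index.deleteByPartition(partitionId);
        int written = 0;
        for (String entityId : entityIds) {
            if (written == failAfter) {
                throw new IllegalStateException("simulated IO failure");
            }
            index.add(new Doc(entityId, partitionId));
            written++;
        }
    }

    /** Runs a fail-then-resume scenario and returns the final document count. */
    static int demo() {
        ToyIndex index = new ToyIndex();
        // Partition p0 completes normally.
        indexPartition(index, "p0", List.of("x", "y"), -1);
        List<String> p1 = List.of("a", "b", "c");
        try {
            // Partition p1 fails after two writes, leaving a half-written partition.
            indexPartition(index, "p1", p1, 2);
        } catch (IllegalStateException expected) {
            // No batch state is persisted here; the index itself is the only record.
        }
        // Resume: the partial p1 documents are removed, then p1 is re-added in full.
        indexPartition(index, "p1", p1, -1);
        return index.docs.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 5: two docs from p0, three from p1
    }
}
```

Without the initial targeted delete, the resume above would leave
duplicates of "a" and "b" in the index; with it, re-running a
partition any number of times gives the same result.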

Since I agree that having additional fields is not something everyone
will like, as I suggested on HSEARCH-2616 we could offer the
alternatives as fallback.

>
> On adding a hidden field, I wonder what this will mean for Elasticsearch; if
> we start doing such things, we should clearly and explicitly state in the
> documentation that targeting existing ES schemas without adapting them to
> Hibernate Search is not supported.
> On top of that, it may hurt users upgrading Hibernate Search: Lucene may
> simply ignore queries against a field that doesn't exist in the index, but
> I'm not sure Elasticsearch behaves that way when the field isn't even
> defined in the mapping. So users may have to upgrade their schema just for
> that. I know Elasticsearch integration is experimental anyway, but what I
> mean is if we do that, it must be *before* we drop the
> "experimental" mention on Elasticsearch integration.

Good point. Such proposals to change some internal field don't happen
very often though.

We strive to have a stable encoding, but since the index is not the
database, well-documented changes might be worth it.
"Private internal" fields especially should not be too hard to manage,
as we can deal with them explicitly in some lenient way, and if they
don't contain end user state, as in this case, we don't even have to
require an index rebuild.

People not wanting this can have a slower mass indexer, or do without
recovery support.

Thanks,
Sanne

>
>
> Yoann Rodière
> Hibernate NoORM Team
> yoann at hibernate.org
>
> On 27 April 2017 at 15:59, Yoann Rodiere <yrodiere at redhat.com> wrote:
>>
>> I wonder, what's the benefit for HSEARCH-2616? Do you want to have that
>> field so that we can just use AddLuceneWorks everywhere, and run targeted
>> delete operations when we start a partition? If so, is it as a fallback
>> solution, if what I proposed cannot be implemented, or as a better
>> alternative? Note I don't have strong arguments against that solution, I'm
>> just trying to understand the "why".
>>
>> On adding a hidden field, I wonder what this will mean for Elasticsearch;
>> if we start doing such things, we should clearly and explicitly state in the
>> documentation that targeting existing ES schemas without adapting them to
>> Hibernate Search is not supported.
>> On top of that, it may hurt users upgrading Hibernate Search: Lucene may
>> simply ignore queries against a field that doesn't exist in the index, but
>> I'm not sure Elasticsearch behaves that way when the field isn't even
>> defined in the mapping. So users may have to upgrade their schema just for
>> that. I know Elasticsearch integration is experimental anyway, but what I
>> mean is if we do that, it must be *before* we drop the
>> "experimental" mention on Elasticsearch integration.
>>
>>
>> Yoann Rodière
>> Software Engineer, Hibernate NoORM Team
>> Red Hat
>> yrodiere at redhat.com
>>
>> On 27 April 2017 at 15:23, Sanne Grinovero <sanne at hibernate.org> wrote:
>>>
>>> To better implement recovery operations during MassIndexer
>>> [HSEARCH-2616] - specifically in the context of the upcoming JBatch
>>> based implementation - I'm considering the benefits of adding one more
>>> field to the Lucene index for our internal purposes.
>>>
>>> This new field is only useful for Hibernate Search internals so we
>>> shouldn't allow it to be targeted by queries, etc..
>>>
>>> There is a single precedent: we already encode the entity name, so
>>> "hiding fields" is not a new problem that we have to deal with. It
>>> might be a reason to polish the existing concept and improve the
>>> encapsulation.
>>>
>>> Would anyone have a strong case against this?
>>>
>>> Thanks,
>>> Sanne