Re: [hibernate-dev] Hibernate Search: Adding more "hidden" fields to the index

Thursday, 27 April 2017

 I had written the "why" on HSEARCH-2616, but to clarify here: [...]
Thanks. So the problem is that we may not be able to update the batch state
upon failure, in which case we would use the less-safe AddLuceneWork upon
restart.
If we had some way to store the information "this partition has started"
*before* we even write to the index, this wouldn't be a problem, but as you
might have guessed JSR-352 doesn't allow that.
So you're right, deleting everything before we even start working is our
best solution. And thus a hidden field will be necessary. I'll continue the
discussion on JIRA.
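The delete-then-add approach can be made concrete with a minimal sketch, under loud assumptions: the "index" is an in-memory list of field maps standing in for Lucene, and the hidden field name `__HSearch_partition` is hypothetical, not an actual Hibernate Search constant. Each partition first deletes every document it tagged in a previous run, then uses only plain adds (the equivalent of AddLuceneWork), so restarting after a partial write does not leave duplicates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for a Lucene index, not Hibernate Search code.
public class PartitionRestartSketch {

    // Hypothetical name for the hidden internal field recording which partition wrote a document.
    static final String PARTITION_FIELD = "__HSearch_partition";

    static class SimulatedIndex {
        final List<Map<String, String>> docs = new ArrayList<>();

        // Plain add, like an AddLuceneWork: no delete by id first, so a replay would duplicate.
        void add(Map<String, String> doc) {
            docs.add(doc);
        }

        // Targeted delete on the hidden field; returns how many documents were removed.
        int deleteByPartition(String partitionId) {
            int before = docs.size();
            docs.removeIf(d -> partitionId.equals(d.get(PARTITION_FIELD)));
            return before - docs.size();
        }

        long countById(String id) {
            return docs.stream().filter(d -> id.equals(d.get("id"))).count();
        }
    }

    // Builds a document tagged with the partition that produced it.
    static Map<String, String> doc(String id, String partitionId) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put(PARTITION_FIELD, partitionId);
        return d;
    }

    // Runs (or re-runs) a partition: wipe whatever it wrote before, then only use plain adds.
    static void runPartition(SimulatedIndex index, String partitionId, List<String> entityIds) {
        index.deleteByPartition(partitionId);
        for (String id : entityIds) {
            index.add(doc(id, partitionId));
        }
    }

    public static void main(String[] args) {
        SimulatedIndex index = new SimulatedIndex();
        index.add(doc("9", "p1"));                          // output of another, completed partition
        index.add(doc("1", "p0"));                          // partial output of p0 before a crash
        runPartition(index, "p0", List.of("1", "2", "3"));  // restart of p0
        System.out.println(index.countById("1") + " " + index.docs.size()); // prints "1 4"
    }
}
```

Without the hidden field the targeted delete has nothing to match on, and replaying plain adds would duplicate documents, since Lucene itself does not enforce unique identifiers.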

Yoann Rodière
Hibernate NoORM Team
yoann(a)hibernate.org

On 27 April 2017 at 18:19, Sanne Grinovero <sanne(a)hibernate.org> wrote:

 On 27 April 2017 at 15:11, Yoann Rodiere <yoann(a)hibernate.org> wrote:
 > I wonder, what's the benefit for HSEARCH-2616? Do you want to have that
 > field so that we can just use AddLuceneWorks everywhere, and run targeted
 > delete operations when we start a partition? If so, is it as a fallback
 > solution, if what I proposed cannot be implemented, or as a better
 > alternative? Note I don't have strong arguments against that solution,
 > I'm just trying to understand the "why".

 I had written the "why" on HSEARCH-2616, but to clarify here:

 I liked your idea of trying to figure out if the current block of work
 is being repeated, vs it being a re-try. However while I initially
 thought to add such a field as a fallback solution, I believe it's
 ultimately the more robust solution as otherwise you have to trust
 such state, which could be lost / wrong / corrupted independently for
 a number of reasons.
 Since the problem being solved is about resuming the process after a
 problem happened, we can't make many safe assumptions about what kind
 of problem we're dealing with; for example, if you run out of disk
 space you'll have a half-written index but no way to store such
 batch state. Other problems might involve indexes being backed up,
 restored, or replicated by other technologies (rsync, Infinispan, ...),
 so a mismatch between the index and other state is yet another problem
 which might need caution, logs and possibly tooling.
 Say an IO operation fails during an index write flush: some admin
 intervenes, fixes the hardware, and then triggers a resume of indexing.
 In such conditions I wouldn't trust some additional persistent state,
 not even if it were cryptographically signed to be correct: corruption
 or signature mismatches could be detected, but in this case there's the
 risk of the state being valid but out of date. If IO was unavailable
 when it should have been written, you're probably reading the
 previously written version. Having an out-of-date batch state
 would likely have the opposite effect of what we need.

 On the other hand, what's in the index is inherently coupled with the
 index state: indexes could still be corrupted, but since the progress
 tracking state and the index are one and the same thing, you're not
 easily fooled.

 Since I agree that having additional fields is not something everyone
 will like, as I suggested on HSEARCH-2616, we could offer the
 alternatives as fallbacks.

 >
 > On adding a hidden field, I wonder what this will mean for
 > Elasticsearch; if we start doing such things, we should clearly and
 > explicitly state in the documentation that targeting existing ES schemas
 > without adapting them to Hibernate Search is not supported.
 > On top of that, it may hurt users upgrading Hibernate Search: Lucene may
 > simply ignore queries against a field that doesn't exist in the index,
 > but I'm not sure Elasticsearch behaves that way when the field isn't even
 > defined in the mapping. So users may have to upgrade their schema just
 > for that. I know Elasticsearch integration is experimental anyway, but
 > what I mean is if we do that, it must be *before* we drop the
 > "experimental" mention on the Elasticsearch integration.

 Good point. Such proposals to change some internal field don't happen
 very often though.

 We strive to have a stable encoding, but since the index is not the
 database, well-documented changes might be worth it.
 "Private internal" fields especially should not be too hard to manage,
 as we can deal with them explicitly in some lenient way, and if they
 don't contain end-user state, as in this case, we don't even have to
 require an index rebuild.

 People who don't want this can use a slower mass indexer, or do
 without recovery support.

 Thanks,
 Sanne

 >
 >
 > Yoann Rodière
 > Hibernate NoORM Team
 > yoann(a)hibernate.org
 >
 > On 27 April 2017 at 15:59, Yoann Rodiere <yrodiere(a)redhat.com> wrote:
 >>
 >> I wonder, what's the benefit for HSEARCH-2616? Do you want to have that
 >> field so that we can just use AddLuceneWorks everywhere, and run
 >> targeted delete operations when we start a partition? If so, is it as a
 >> fallback solution, if what I proposed cannot be implemented, or as a
 >> better alternative? Note I don't have strong arguments against that
 >> solution, I'm just trying to understand the "why".
 >>
 >> On adding a hidden field, I wonder what this will mean for
 >> Elasticsearch; if we start doing such things, we should clearly and
 >> explicitly state in the documentation that targeting existing ES schemas
 >> without adapting them to Hibernate Search is not supported.
 >> On top of that, it may hurt users upgrading Hibernate Search: Lucene may
 >> simply ignore queries against a field that doesn't exist in the index,
 >> but I'm not sure Elasticsearch behaves that way when the field isn't
 >> even defined in the mapping. So users may have to upgrade their schema
 >> just for that. I know Elasticsearch integration is experimental anyway,
 >> but what I mean is if we do that, it must be *before* we drop the
 >> "experimental" mention on the Elasticsearch integration.
 >>
 >>
 >> Yoann Rodière
 >> Software Engineer, Hibernate NoORM Team
 >> Red Hat
 >> yrodiere(a)redhat.com
 >>
 >> On 27 April 2017 at 15:23, Sanne Grinovero <sanne(a)hibernate.org> wrote:
 >>>
 >>> To better implement recovery operations in the MassIndexer
 >>> [HSEARCH-2616] - specifically in the context of the upcoming JBatch
 >>> based implementation - I'm considering the benefits of adding one more
 >>> field to the Lucene index for our internal purposes.
 >>>
 >>> This new field is only useful for Hibernate Search internals, so we
 >>> shouldn't allow it to be targeted by queries, etc.
 >>>
 >>> There is a single precedent: we already encode the entity name, so
 >>> "hiding fields" is not a new problem that we have to deal with. It
 >>> might be a reason to polish the existing concept and improve the
 >>> encapsulation.
 >>>
 >>> Would anyone have a strong case against this?
 >>>
 >>> Thanks,
 >>> Sanne
 >>> _______________________________________________
 >>> hibernate-dev mailing list
 >>> hibernate-dev(a)lists.jboss.org
 >>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
 >>
 >>
 >
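Two ideas from the thread, keeping the internal field out of reach of user queries and handling it leniently so that indexes written before it existed need no rebuild, could look roughly like the following sketch. The reserved prefix `__HSearch_`, the class name and the method names are all hypothetical; documents are again modelled as plain field maps rather than Lucene documents.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of hidden-field handling; not Hibernate Search API.
public class HiddenFieldGuard {

    // Hypothetical prefix reserved for internal fields (illustrative only).
    static final String INTERNAL_PREFIX = "__HSearch_";
    static final String PARTITION_FIELD = INTERNAL_PREFIX + "partition";

    static boolean isInternal(String fieldName) {
        return fieldName.startsWith(INTERNAL_PREFIX);
    }

    // Meant to be called on every field name coming from user queries, sorts or projections.
    static void checkUserField(String fieldName) {
        if (isInternal(fieldName)) {
            throw new IllegalArgumentException(
                    "Field '" + fieldName + "' is reserved for internal use");
        }
    }

    // Lenient read: documents written before the field existed simply lack it,
    // which is reported as "unknown" instead of failing.
    static Optional<String> partitionOf(Map<String, String> document) {
        return Optional.ofNullable(document.get(PARTITION_FIELD));
    }

    public static void main(String[] args) {
        checkUserField("title"); // accepted: a regular user field
        try {
            checkUserField(PARTITION_FIELD);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(partitionOf(Map.of("title", "old doc")).isPresent()); // prints false
    }
}
```

A prefix check is only one way to mark fields as internal; an explicit registry of reserved names would work equally well and avoids clashing with user fields that happen to share the prefix.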
