On 13 August 2015 at 08:33, Gunnar Morling <gunnar(a)hibernate.org> wrote:
2015-08-12 17:46 GMT+02:00 Sanne Grinovero <sanne(a)hibernate.org>:
> That's an interesting proposal, as index sharing inherently implies
> that fields on different types shall not have conflicting mapping
> (i.e. don't reuse the same field name for a different type).
> By default we don't share indexes across unrelated types, but also *by
> default* subtypes are indexed in the same index as their parent - if
> the parent is indexed as well.
Yes, I think that's the case where it makes sense. It'd make sense to
re-phrase the docs in that regard.
> The reason is to efficiently map a polymorphic domain: when people
> search for type X, they implicitly also search for its subtypes as
> these are valid candidates for the query.
> Having them all in the same index makes for better result quality and
> better search performance - as joining multiple IndexReaders to
> perform a cross-index Query is generally a bad idea, as it's then
> hard to accurately normalize statistics across different vector
> spaces, and that's what defines the quality of the search result.
> At least I believe that *generally* that would give you better
> results, but that's why we give options, and also why sometimes people
> might want multiple Domain objects to be stored in the same index:
> they might be "subtypes" from a domain perspective even if they don't
> technically use inheritance at the Java level: they might be different
> types and yet be mapped to some common fields with (hopefully)
> compatible indexing options.
Have you ever seen this as an actual requirement by someone?
Yes, not least by myself :)
You might have various types which don't share a Java inheritance tree
but still have some common property. Could be a simple tagging system,
or just the classical example of "title" of a product.
Some people will have a Product parent class, some people might not
have love for expressing their model in a Java inheritance
straitjacket. A real-world large information system seldom follows the
Animal examples of textbooks.
Consider also that you might not want to *search* for these different
types, but still index them together. E.g. do some computation like
what's the most frequently used tag across various types, or implement
an auto-suggester field for a UI in which the exact target domain type
is yet to be filled in by some follow-up step.
So while I agree it doesn't seem a great idea to run a query which
could return multiple different types (unrelated other than by
inheritance from Object), there are many other cases; even a
mixed-type search is not too hard to handle when using a Projection.
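
To make the tagging use case concrete, here is a minimal sketch in plain Java. It does not use the Hibernate Search or Lucene APIs; the class SharedIndexTagSketch, the Doc holder and the type names are all hypothetical, and the shared index is modelled as a simple list of documents that carry a type discriminator and a common "tag" field.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SharedIndexTagSketch {

    /** Hypothetical index entry: a type discriminator plus the common "tag" field. */
    static final class Doc {
        final String type;
        final String tag;

        Doc(String type, String tag) {
            this.type = type;
            this.tag = tag;
        }
    }

    /** Returns the tag used most often across all documents, whatever their type; null if the index is empty. */
    static String mostFrequentTag(List<Doc> sharedIndex) {
        // Count occurrences of each tag, ignoring the type discriminator entirely.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Doc d : sharedIndex) {
            counts.merge(d.tag, 1, Integer::sum);
        }
        // Pick the highest count; on a tie the tag seen first wins.
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three unrelated types that only share the "tag" field.
        List<Doc> sharedIndex = Arrays.asList(
                new Doc("Book", "java"),
                new Doc("BlogPost", "search"),
                new Doc("Video", "java"));
        System.out.println(mostFrequentTag(sharedIndex));
    }
}
```

The point of the sketch is only that the computation never needs to know which type a document belongs to, which is what sharing one index makes cheap.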
> If we were to drop index sharing, then I think it would be fair to
> also not support multiple types as target for a query anymore; as I'm
> assuming in this case you'd only share for subtypes of some common
> parent, and you'd target that common parent exclusively to perform a
> polymorphic query.
Assuming we'd drop index sharing for unrelated types but would
continue to support it for the types of one inheritance hierarchy, one
still might want results only from a sub-set of the hierarchy's types.
> So those are the reasons for which it exists; there are some good reasons
> to not allow it too: as you mention the filtering, but also the very
> fact that the type information has to be stored in form of classname
> (typename, in free-form).
Interestingly, that's not so much an issue with ES. There you always
add a "type" discriminator.
Right, any discriminator is quite cheap with Lucene. Just trying to
think which benefits it would have, but it's clear I think we need to
stick with it.
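
As an illustration of the discriminator filtering discussed here, the following is a rough sketch in plain Java, not the actual Hibernate Search, Lucene or Elasticsearch API. The names TypeFilterSketch, Doc and titlesOfTypes are hypothetical. It models a shared index whose documents carry a type name, and a query that targets only a sub-set of those types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TypeFilterSketch {

    /** Hypothetical index entry: the type discriminator and one indexed field. */
    static final class Doc {
        final String type;
        final String title;

        Doc(String type, String title) {
            this.type = type;
            this.title = title;
        }
    }

    /** Returns the titles of the documents whose discriminator matches one of the targeted types. */
    static List<String> titlesOfTypes(List<Doc> sharedIndex, String... targetTypes) {
        Set<String> wanted = new HashSet<>(Arrays.asList(targetTypes));
        List<String> titles = new ArrayList<>();
        for (Doc d : sharedIndex) {
            // This check is the extra filter a shared index forces onto every query.
            if (wanted.contains(d.type)) {
                titles.add(d.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Doc> sharedIndex = Arrays.asList(
                new Doc("Book", "Lucene in Action"),
                new Doc("Ebook", "Search Basics"),
                new Doc("Invoice", "INV-1"));
        // Target two types of one hypothetical hierarchy and skip the unrelated one.
        System.out.println(titlesOfTypes(sharedIndex, "Book", "Ebook"));
    }
}
```

Listing the targeted type names explicitly is one way to get results from only part of an inheritance hierarchy while the whole hierarchy still lives in a single index.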
> I think the strongest reason to not allow it is to avoid the
> inconsistent field mappings, but we could compensate for that with
> better schema validation - something which seems to be getting more
> necessary anyway.
Yes, that'd help. All in all, index sharing for inheritance hierarchies
makes sense to me, but I am doubtful about sharing between unrelated
types.
I'll assume the above examples changed your mind ;)
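
The schema validation idea mentioned above could look roughly like the following sketch. It is plain Java with hypothetical names (FieldMappingValidatorSketch, conflictingFields) and a deliberately simplified model in which each type declares its fields as a map from field name to a free-form field type string. It is not how Hibernate Search validates mappings; it only shows the kind of check that would catch conflicting mappings among types sharing an index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FieldMappingValidatorSketch {

    /**
     * Returns the names of fields that are mapped with different field types
     * by at least two of the types sharing one index.
     *
     * @param mappingsByType type name -> (field name -> field type), all hypothetical
     */
    static Set<String> conflictingFields(Map<String, Map<String, String>> mappingsByType) {
        Map<String, String> firstSeen = new HashMap<>();
        Set<String> conflicts = new TreeSet<>();
        for (Map<String, String> fields : mappingsByType.values()) {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                // Remember the first mapping of each field name and compare later ones against it.
                String previous = firstSeen.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    conflicts.add(field.getKey());
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> mappings = new HashMap<>();
        mappings.put("Book", Collections.singletonMap("title", "text"));
        mappings.put("Invoice", Collections.singletonMap("title", "numeric"));
        // "title" is mapped as text by one type and as numeric by the other.
        System.out.println(conflictingFields(mappings));
    }
}
```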
> I didn't mean to kill the proposal :) just hoping it helps figure out
> why someone might need it. Would be nice to think of alternatives out
> of the box to avoid the filtering.
> On 12 August 2015 at 15:30, Gunnar Morling <gunnar(a)hibernate.org> wrote:
>> Hibernate Search aficionados,
>> I am wondering what the rationale is for offering the feature of
>> index sharing.
>> The ref guide says "there is really not much benefit in sharing
>> indexes". It complicates queries, as an additional filter on the type
>> field must be applied in case of targeting only one entity using a
>> shared index.
>> Should we consider dropping this feature in HS 6?
>> hibernate-dev mailing list