[hibernate-dev] [HSEARCH] Usefulness of index sharing

Sanne Grinovero sanne at hibernate.org
Thu Aug 13 08:26:46 EDT 2015


On 13 August 2015 at 08:33, Gunnar Morling <gunnar at hibernate.org> wrote:
> Hi,
>
> 2015-08-12 17:46 GMT+02:00 Sanne Grinovero <sanne at hibernate.org>:
>> That's an interesting proposal, as index sharing inherently implies
>> that fields on different types shall not have conflicting mapping
>> (i.e. don't reuse the same field name for a different type).
>>
>> By default we don't share indexes across unrelated types, but also *by
>> default* subtypes are indexed in the same index as their parent - if
>> the parent is indexed as well.
>
> Yes, I think that's the case where it makes sense. It'd make sense to
> re-phrase the docs in that regard.
>
>>
>> The reason is to efficiently map a polymorphic domain: when people
>> search for type X, they implicitly also search for its subtypes as
>> these are valid candidates for the query.
>> Having them all in the same index makes for better result quality and
>> better search performance - as joining multiple IndexReaders to
>> perform a cross-index Query is generally a bad idea, as it's then
>> hard to accurately normalize statistics across different vector
>> spaces, and that's what defines the quality of the search result.
>> At least I believe that *generally* that would give you better
>> results, but that's why we give options, and also why sometimes people
>> might want multiple Domain objects to be stored in the same index:
>> they might be "subtypes" from a domain perspective even if they don't
>> technically use inheritance at the Java level: they might be different
>> types and yet be mapped to some common fields with (hopefully)
>> compatible indexing options.
>
> Have you ever seen this as an actual requirement by someone?

Yes, not least by myself :)
You might have various types which don't share a Java inheritance tree
but still have some common property. Could be a simple tagging system,
or just the classical example of "title" of a product.
Some people will have a Product parent class, some people might not
have love for expressing their model in a Java inheritance
straitjacket. A real-world large information system seldom follows the
Animal examples of textbooks.

Consider also that you might not want to *search* for these different
types, but still index them together. E.g. do some computation like
what's the most frequently used tag across various types, or implement
an auto-suggester field for a UI in which the exact target domain type
is yet to be filled in by some follow-up step.
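
To make the tag example concrete, here is a minimal sketch in plain Java.
It does not use the Hibernate Search or Lucene API: SharedTagIndex and Doc
are hypothetical names, and the in-memory list only stands in for a shared
index holding documents of unrelated types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a shared index; not Hibernate Search API.
class SharedTagIndex {

    // One indexed "document": a type discriminator plus its tags.
    static final class Doc {
        final String type;
        final List<String> tags;

        Doc(String type, String... tags) {
            this.type = type;
            this.tags = Arrays.asList(tags);
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(Doc doc) {
        docs.add(doc);
    }

    // Counts each tag over every document, regardless of its type.
    Map<String, Integer> tagCounts() {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc doc : docs) {
            for (String tag : doc.tags) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the most frequent tag, or null when the index is empty.
    // Ties are broken alphabetically so the result is deterministic.
    String mostFrequentTag() {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : tagCounts().entrySet()) {
            int c = e.getValue();
            boolean betterTie = c == bestCount && best != null
                    && e.getKey().compareTo(best) < 0;
            if (c > bestCount || betterTie) {
                best = e.getKey();
                bestCount = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SharedTagIndex index = new SharedTagIndex();
        index.add(new Doc("Book", "java", "search"));
        index.add(new Doc("Video", "search", "lucene"));
        index.add(new Doc("BlogPost", "search", "java"));
        System.out.println(index.mostFrequentTag());
    }
}
```

The point is only that the computation never looks at the type: with one
index per type, the same count would have to be merged across indexes.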

So while I agree it doesn't seem a great idea to run a query which
could return multiple different types (unrelated other than by
inheritance from Object), there are many other cases; even a
mixed-type search is not too hard to handle when using a Projection.

>> If we were to drop index sharing, then I think it should be fair to
>> also not support multiple types as target for a query anymore; as I'm
>> assuming in this case you'd only share for subtypes of some common
>> parent, and you'd target that common parent exclusively to perform a
>> polymorphic query.
>
> Assuming we'd drop index sharing for unrelated types but would
> continue to support it for the types of one inheritance hierarchy, one
> still might want results only from a sub-set of the hierarchy's types.
>
>>
>> So those are the reasons for which it exists; there are some good reasons
>> not to allow it too: the filtering, as you mention, but also the very
>> fact that the type information has to be stored in the form of a class
>> name (type name, in free-form).
>
> Interestingly, that's not so much an issue with ES. There you always
> add a "type" discriminator.

Right, any discriminator is quite cheap with Lucene. I was just trying
to think which benefits it would have, but I think it's clear we need
to stick with it.
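
As a rough illustration of the discriminator filter discussed here, the
following plain-Java sketch models a shared index whose documents carry a
type field next to a common "title" field. DiscriminatedIndex, Doc and the
substring matching are assumptions made for the example; this is not how
Hibernate Search or Lucene implement it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of a shared index; not the Hibernate Search
// or Lucene API.
class DiscriminatedIndex {

    static final class Doc {
        final String type;   // discriminator, e.g. the entity class name
        final String title;  // field common to all types in this index

        Doc(String type, String title) {
            this.type = type;
            this.title = title;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(String type, String title) {
        docs.add(new Doc(type, title));
    }

    // Matches titles containing the term (case-insensitive).
    // Passing no types means "search all types in the shared index";
    // otherwise the discriminator acts as an additional filter.
    List<Doc> search(String term, String... types) {
        Set<String> wanted = new HashSet<>(Arrays.asList(types));
        String needle = term.toLowerCase();
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : docs) {
            boolean typeOk = wanted.isEmpty() || wanted.contains(doc.type);
            if (typeOk && doc.title.toLowerCase().contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DiscriminatedIndex index = new DiscriminatedIndex();
        index.add("Book", "Lucene in Action");
        index.add("Video", "Lucene basics");
        index.add("Book", "Cooking for two");
        System.out.println(index.search("lucene").size());
        System.out.println(index.search("lucene", "Book").size());
    }
}
```

The same filter also covers the case of wanting results from only a subset
of an inheritance hierarchy: pass the wanted type names and nothing else.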

>> I think the strongest reason to not allow it is to avoid the
>> inconsistent field mappings, but we could compensate for that with
>> better schema validation - something which seems to be getting more
>> necessary anyway.
>
> Yes, that'd help. All in all, index sharing for inheritance hierarchies
> makes sense to me, but I am doubtful about sharing between unrelated
> types.

I'll assume the above examples changed your mind ;)
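
On the schema validation point, a first check could be as simple as
rejecting a field name that is mapped with different types by entities
sharing one index. The sketch below is plain Java under that assumption;
SharedIndexSchemaCheck, its methods and the free-form field type strings
are hypothetical and not part of Hibernate Search.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical validator for entities sharing one index; not Hibernate
// Search API.
class SharedIndexSchemaCheck {

    // entity name -> (field name -> field type)
    private final Map<String, Map<String, String>> mappings = new LinkedHashMap<>();

    // Registers that the given entity maps the field with the given type.
    void map(String entity, String field, String fieldType) {
        mappings.computeIfAbsent(entity, k -> new LinkedHashMap<>())
                .put(field, fieldType);
    }

    // Returns the names of fields that are mapped with more than one type
    // across all registered entities, in alphabetical order.
    Set<String> conflictingFields() {
        Map<String, String> seen = new LinkedHashMap<>();
        Set<String> conflicts = new TreeSet<>();
        for (Map<String, String> fields : mappings.values()) {
            for (Map.Entry<String, String> f : fields.entrySet()) {
                String previous = seen.putIfAbsent(f.getKey(), f.getValue());
                if (previous != null && !previous.equals(f.getValue())) {
                    conflicts.add(f.getKey());
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        SharedIndexSchemaCheck check = new SharedIndexSchemaCheck();
        check.map("Book", "title", "text");
        check.map("Book", "price", "numeric");
        check.map("Video", "title", "text");
        check.map("Video", "price", "text");
        System.out.println(check.conflictingFields());
    }
}
```

A real validation would also have to compare analyzers and other indexing
options, which this sketch ignores.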

Cheers,
Sanne

>
>>
>> I didn't mean to kill the proposal :) just hoping it helps figure out
>> why someone might need it. Would be nice to think of alternatives out
>> of the box to avoid the filtering.
>>
>> Sanne
>
> --Gunnar
>
>>
>>
>>
>> On 12 August 2015 at 15:30, Gunnar Morling <gunnar at hibernate.org> wrote:
>>> Hibernate Search aficionados,
>>>
>>> I am wondering what the rationale is for offering the feature of
>>> index sharing [1].
>>>
>>> The ref guide says "there is really not much benefit in sharing
>>> indexes". It complicates queries, as an additional filter on the type
>>> field must be applied in case of targeting only one entity using a
>>> shared index.
>>>
>>> Should we consider dropping this feature in HS 6?
>>>
>>> Thanks,
>>>
>>> --Gunnar
>>>
>>> [1] https://docs.jboss.org/hibernate/search/5.4/reference/en-US/html_single/#section-sharing-indexes
>>> _______________________________________________
>>> hibernate-dev mailing list
>>> hibernate-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev

