With the Lucene backend, we have no way of knowing which dynamic fields were added to the index before the last restart of the application; we only know about the dynamic fields that have been mentioned by the user (during indexing or search) since the last restart.

When we build an exists predicate for an object field, we internally build a boolean query with should clauses, where each clause tests whether a "leaf" field exists. When there are dynamic fields, we don't know the full list of leaf fields, and thus we cannot properly build the exists predicate: the dynamic fields are ignored.

Solution 1: persisted metamodel

The most obvious solution would be to persist a list of indexed dynamic fields somewhere, and read that list on bootstrap. In short, introduce a persisted metamodel for the Lucene backend. I'm not a fan of this approach because of the added complexity for just one single feature.

Solution 2: relaxed exists() matching rules

A perhaps easier solution would be to relax the exists() matching rules, and declare that exists() matches an object field if it was non-null when indexing. Basically:
- For nested object fields we would just run a MatchAllDocs() query within the join: if there is a nested document, the field exists.
- For flattened object fields we would have to store the list of object fields added to a given document in a specific field, and query that field. I suppose there would be an overhead at indexing time, but we already do that for other field types; see the uses of org.hibernate.search.backend.lucene.lowlevel.common.impl.MetadataFields#fieldNamesFieldName().
As an added benefit, this would immediately solve HSEARCH-3904 (take into account dynamic fields in exists() predicate on object fields) for the Lucene backend.

The main drawback is that the behavior would be different from that of Elasticsearch, which only matches object fields when they have at least one non-null non-object child. But in a way, isn't that just a limitation of Elasticsearch?
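
To make the difference concrete for flattened object fields, here is a rough sketch of the two matching strategies on a toy in-memory model. This is not Hibernate Search or Lucene code: the class `ExistsSketch`, its methods, and the `__objectFieldNames` metadata field are all hypothetical names chosen for illustration, and documents are plain maps rather than Lucene documents.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory model: none of these names are Hibernate Search or Lucene APIs.
class ExistsSketch {

    // Hypothetical per-document metadata field listing the non-null object fields.
    static final String OBJECT_FIELD_NAMES = "__objectFieldNames";

    // "Indexes" a document: keeps the leaf values and records, in the metadata
    // field, every object field that was non-null at indexing time.
    static Map<String, Object> index(Map<String, Object> leafValues,
            Set<String> nonNullObjectFields) {
        Map<String, Object> doc = new HashMap<>(leafValues);
        doc.put(OBJECT_FIELD_NAMES, new HashSet<>(nonNullObjectFields));
        return doc;
    }

    // Leaf-based matching: a disjunction ("should" clauses) over the leaf
    // fields that the in-memory metamodel happens to know about.
    static boolean existsByKnownLeaves(Map<String, Object> doc,
            List<String> knownLeafPaths) {
        for (String path : knownLeafPaths) {
            if (doc.get(path) != null) {
                return true;
            }
        }
        return false;
    }

    // Relaxed matching: only looks at the metadata field, so no knowledge
    // of the leaf fields is needed.
    static boolean existsRelaxed(Map<String, Object> doc, String objectFieldPath) {
        Object names = doc.get(OBJECT_FIELD_NAMES);
        return names instanceof Set && ((Set<?>) names).contains(objectFieldPath);
    }

    public static void main(String[] args) {
        // Indexed before the restart, with a dynamic leaf "attributes.color".
        Map<String, Object> doc = index(
                Map.of("attributes.color", "red"), Set.of("attributes"));

        // After the restart, the metamodel knows no dynamic leaves yet.
        List<String> knownLeaves = List.of();

        System.out.println(existsByKnownLeaves(doc, knownLeaves)); // false
        System.out.println(existsRelaxed(doc, "attributes"));      // true
    }
}
```

In this sketch the leaf-based check misses the document because the dynamic leaf is unknown after the restart, while the relaxed check still matches. A real implementation would presumably reuse the existing field-names metadata field mentioned above rather than introduce a new one.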