| Thanks for the report and the test cases; now I see what you meant. I pushed your patches to a fork of the repo for future reference: https://github.com/yrodiere/hibernate-test-case-templates/tree/HSEARCH-3534/ Now, the problem. If I understand correctly, the Elasticsearch team decided it would be a good idea for boolean junctions to behave differently when they are nested under a filter/must_not clause than when they are not:
- If the bool query is in a query context and has a must or filter clause then a document will match the bool query even if none of the should queries match. In this case these clauses are only used to influence the score.
- If the bool query is a filter context or has neither must or filter then at least one of the should queries must match a document for it to match the bool query. This behavior may be explicitly controlled by settings the minimum_should_match parameter.
Source: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/query-dsl-bool-query.html This effectively means that minimum_should_match defaults to 0 in the first case, and to 1 in the second case. The thing is, making the default depend on the context is completely arbitrary and not something we have in Lucene at all. I can see three solutions:
1. We change the Lucene backend to implement the same behavior. That might be a bit difficult to achieve, in particular when the user doesn't rely on our DSL. But more importantly, that will be surprising to people already familiar with Lucene.
2. We change the Elasticsearch backend to work around these defaults and force Lucene's defaults instead. This will be surprising to people already familiar with Elasticsearch.
3. We don't change anything, and simply document this oddity.
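To make the second solution a bit more concrete, here is a rough Python sketch of the idea, not the actual backend code. It models the clauses of a bool query as a plain dict, and all three function names are made up for illustration. `es_default_msm` encodes the rule quoted above from the Elasticsearch 5.6 docs, `lucene_like_msm` is my reading of Lucene's behavior (should clauses are only required when there is no must or filter clause), and `with_explicit_msm` always sets `minimum_should_match` explicitly so that the context no longer matters.

```python
from typing import Any, Dict

# Hypothetical model: the content of an Elasticsearch "bool" object,
# e.g. {"must": [...], "filter": [...], "should": [...]}.
BoolClauses = Dict[str, Any]


def has_must_or_filter(clauses: BoolClauses) -> bool:
    """True if the bool query has at least one must or filter clause."""
    return bool(clauses.get("must")) or bool(clauses.get("filter"))


def es_default_msm(clauses: BoolClauses, in_filter_context: bool) -> int:
    """Implicit minimum_should_match per the rule quoted from the ES 5.6 docs.

    Only meaningful when the query has at least one should clause.
    """
    if not in_filter_context and has_must_or_filter(clauses):
        return 0  # should clauses only influence the score
    return 1  # at least one should clause must match


def lucene_like_msm(clauses: BoolClauses) -> int:
    """Assumed Lucene-style default: independent of the context."""
    return 0 if has_must_or_filter(clauses) else 1


def with_explicit_msm(clauses: BoolClauses) -> BoolClauses:
    """Return a copy with minimum_should_match set explicitly.

    Leaves the query alone if it has no should clause or if the
    caller already chose a value.
    """
    pinned = dict(clauses)
    if pinned.get("should") and "minimum_should_match" not in pinned:
        pinned["minimum_should_match"] = lucene_like_msm(pinned)
    return pinned
```

With something like this, a bool query with one must and one should clause would get `minimum_should_match: 0` regardless of whether it ends up nested under a filter or must_not clause, which is the only case where the two defaults disagree.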
Option 1 seems dodgy, while option 2 seems more reasonable, and a few tests show that it's possible. Let's try to do it, at least in 6. |