[hibernate-dev] [HSEARCH] About HSEARCH-917 or DSL API and query parser

Guillaume Smet guillaume.smet at gmail.com
Wed Dec 28 08:59:20 EST 2011


I opened https://hibernate.onjira.com/browse/HSEARCH-917 a few months
ago about an issue that prevents us from using the DSL API in a lot
of cases.

I explained why in the JIRA issue. While I'm happy to do the groundwork
of coding a fix and adding tests for it, I think it would be a good
idea to discuss it first.

Basically, the problem is that when I search for XXXX-AAAA-HAGYU-19910
using an analyzer with the WordDelimiterFilterFactory filter, the DSL
API searches for "XXXX" OR "AAAA" OR "HAGYU" OR "19910" (yes, OR). In
this case, the Lucene QueryParser is designed to look for "XXXX" AND
"AAAA" AND "HAGYU" AND "19910".
The underlying problem is that in
ConnectedMultiFieldsTermQueryBuilder, we don't use the QueryParser to
build the Lucene query. Instead, a getAllTermsFromText() method runs
the text through the analyzer to get all the terms, and an OR query is
built from them.
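To make the difference concrete, here is a minimal sketch, assuming a simplified analyzer that just splits on non-alphanumeric characters and lowercases (only a rough stand-in for WordDelimiterFilterFactory). It is not the Hibernate Search code: the class TermsToBooleanQuery, its methods, the Occur enum (which only borrows the names of Lucene's BooleanClause.Occur) and the "reference" field are all hypothetical. It shows the same analyzed terms combined as optional (OR) clauses and as required (AND) clauses, and how that changes what matches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

/**
 * Hypothetical toy model (no Lucene dependency) of "analyze the text, then
 * combine every term into one boolean query".
 */
class TermsToBooleanQuery {

    // Mirrors the names of Lucene's BooleanClause.Occur, nothing more.
    enum Occur { SHOULD, MUST }

    // Hypothetical stand-in for getAllTermsFromText(): split on any
    // non-alphanumeric character and lowercase each part.
    static List<String> allTermsFromText(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Renders clauses roughly the way Lucene prints a BooleanQuery:
    // "+" before MUST clauses, no prefix before SHOULD clauses.
    static String render(String field, List<String> terms, Occur occur) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : terms) {
            query.add((occur == Occur.MUST ? "+" : "") + field + ":" + term);
        }
        return query.toString();
    }

    // SHOULD only: one shared term is enough. MUST: every term is required.
    static boolean matches(List<String> terms, Set<String> docTerms, Occur occur) {
        if (occur == Occur.MUST) {
            return docTerms.containsAll(terms);
        }
        for (String term : terms) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = allTermsFromText("XXXX-AAAA-HAGYU-19910");
        Set<String> otherDoc = new HashSet<>(allTermsFromText("XXXX-BBBB-ZZZZ-20000"));

        // reference:xxxx reference:aaaa reference:hagyu reference:19910
        System.out.println(render("reference", terms, Occur.SHOULD));
        // +reference:xxxx +reference:aaaa +reference:hagyu +reference:19910
        System.out.println(render("reference", terms, Occur.MUST));
        // true: the unrelated reference shares only "xxxx"
        System.out.println(matches(terms, otherDoc, Occur.SHOULD));
        // false: "aaaa", "hagyu" and "19910" are missing
        System.out.println(matches(terms, otherDoc, Occur.MUST));
    }
}
```

With OR semantics, any reference sharing a single fragment with the query (here "XXXX") comes back as a hit, which is why the current behaviour is unusable for identifiers like this one.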

You can also observe the problem if you search for more than one word.

I think it's plainly wrong for a standard search, and it should be
fixed by using the Lucene QueryParser to build the query.

The only drawback I see with using the Lucene QueryParser is that it
doesn't pass the text through the analyzer for fuzzy or wildcard
searches (but we already have a special case for wildcards, so I think
the current code already behaves this way). I'm not sure it's really a
problem, since it's well-known Lucene behaviour, but it's probably the
reason for the current design (maybe someone who knows the history can
explain).
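To illustrate why skipping the analyzer matters for wildcard terms, here is a minimal sketch under the same kind of assumption: the index terms come from a simplified analyzer (split on non-alphanumerics, lowercase) standing in for WordDelimiterFilterFactory, and the wildcard pattern is matched against them as-is. UnanalyzedWildcard, analyze() and expandWildcard() are hypothetical names, and the glob-to-regex conversion only handles "*" and "?". Patterns are written in lowercase so that case folding plays no part in the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical toy model: a wildcard pattern that bypasses the analyzer is
 * compared against terms that did go through the analyzer at index time.
 */
class UnanalyzedWildcard {

    // Simplified stand-in for the index-time analyzer chain.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Converts a wildcard ("*" = any run of chars, "?" = exactly one char) to a
    // regex and returns the indexed terms it matches. The pattern itself is
    // used verbatim, i.e. it is NOT run through the analyzer.
    static List<String> expandWildcard(String pattern, List<String> indexTerms) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern compiled = Pattern.compile(regex.toString());
        List<String> matches = new ArrayList<>();
        for (String term : indexTerms) {
            if (compiled.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> index = analyze("XXXX-AAAA-HAGYU-19910");
        // The raw pattern still contains "-", which the analyzer removed at
        // index time, so it cannot match any single indexed term: prints []
        System.out.println(expandWildcard("xxxx-aa*", index));
        // A pattern confined to one token still matches: prints [hagyu]
        System.out.println(expandWildcard("hag*", index));
    }
}
```

A pattern spanning a delimiter finds nothing, while one confined to a single token behaves as expected; that is the trade-off to weigh if the DSL switches to the QueryParser.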

It would be cool to discuss this problem and find an acceptable solution.

Have a nice day.

