[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-917) DSL API doesn't build the correct Lucene query
Emmanuel Bernard (JIRA)
noreply at atlassian.com
Mon Jan 2 08:22:19 EST 2012
[ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=44819#comment-44819 ]
Emmanuel Bernard commented on HSEARCH-917:
------------------------------------------
There are two main reasons I decided to go with the OR option:
- if you use OR, you also retrieve the results that would match AND, and those usually rank on top;
- to support tokenizers such as n-grams, you must use OR, and that's the main reason behind my choice.
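To make the n-gram argument concrete, here is a small self-contained sketch in plain Java. No Lucene is involved: the bigram tokenizer and the "count of matching grams" score are simplifying assumptions, not Lucene's actual tokenizer or scoring. With a misspelled query, requiring every gram (AND) rejects the document, while OR still matches it; a document that matches more grams gets a higher count, a crude stand-in for the way AND-like matches rank on top of an OR query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NGramOrVsAnd {

    // Splits a word into overlapping character bigrams, e.g. "abc" -> [ab, bc].
    // A simplified stand-in for an n-gram tokenizer with n = 2.
    static List<String> bigrams(String word) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 2 <= word.length(); i++) {
            grams.add(word.substring(i, i + 2));
        }
        return grams;
    }

    // Number of query grams found in the document: a crude proxy for how an
    // OR query ranks documents that match more of its clauses higher.
    static int matchingGrams(List<String> queryGrams, Set<String> docGrams) {
        int count = 0;
        for (String gram : queryGrams) {
            if (docGrams.contains(gram)) {
                count++;
            }
        }
        return count;
    }

    // AND semantics: every query gram must be present.
    static boolean matchesAll(List<String> queryGrams, Set<String> docGrams) {
        return matchingGrams(queryGrams, docGrams) == queryGrams.size();
    }

    // OR semantics: at least one query gram must be present.
    static boolean matchesAny(List<String> queryGrams, Set<String> docGrams) {
        return matchingGrams(queryGrams, docGrams) > 0;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(bigrams("hibernate"));
        List<String> typo = bigrams("hibernete"); // misspelled query

        System.out.println("AND matches: " + matchesAll(typo, doc));   // false
        System.out.println("OR matches: " + matchesAny(typo, doc));    // true
        System.out.println("grams matched: " + matchingGrams(typo, doc) + "/" + typo.size()); // 6/8
    }
}
```

The misspelling only breaks two of the eight bigrams ("ne" and "et"), which is exactly the kind of partial overlap an n-gram analyzer is meant to tolerate and an all-terms query cannot.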
But I'm happy to add a new setting if needed.
{code}
Query query = builder
    .keyword()
    .onField( "uuid" )
    .matching( "XXXX-AAAA-HAGYU-19910" )
    .mustMatchAllTerms() // proposed new setting (or a better name)
    .createQuery();
{code}
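A minimal sketch of what the proposed option could change, assuming it simply switches the clause type used for each analyzed term. `MatchMode` and `toQuerySyntax` are hypothetical names, not Hibernate Search API (the method name in the proposal above is itself still open). The result is rendered in Lucene's classic query syntax, where a `+` prefix marks a required (MUST) clause and no prefix marks an optional (SHOULD) one; the four example terms are the parts listed in the issue.

```java
import java.util.List;
import java.util.StringJoiner;

public class MatchModeSketch {

    // Hypothetical setting: ANY_TERM mirrors the current OR behaviour,
    // ALL_TERMS is what a mustMatchAllTerms()-style option would request.
    enum MatchMode { ANY_TERM, ALL_TERMS }

    // Renders analyzed terms as a boolean query in classic query syntax:
    // "+field:term" is a MUST clause, "field:term" a SHOULD clause.
    static String toQuerySyntax(String field, List<String> terms, MatchMode mode) {
        String prefix = (mode == MatchMode.ALL_TERMS) ? "+" : "";
        StringJoiner query = new StringJoiner(" ");
        for (String term : terms) {
            query.add(prefix + field + ":" + term);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        List<String> terms = List.of("xxxx", "aaaa", "hagyu", "19910");
        // uuid:xxxx uuid:aaaa uuid:hagyu uuid:19910
        System.out.println(toQuerySyntax("uuid", terms, MatchMode.ANY_TERM));
        // +uuid:xxxx +uuid:aaaa +uuid:hagyu +uuid:19910
        System.out.println(toQuerySyntax("uuid", terms, MatchMode.ALL_TERMS));
    }
}
```

Keeping OR as the default and making "all terms" opt-in preserves the n-gram use case while letting fields like this identifier require every part.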
WDYT?
> DSL API doesn't build the correct Lucene query
> ----------------------------------------------
>
> Key: HSEARCH-917
> URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-917
> Project: Hibernate Search
> Issue Type: Bug
> Components: query
> Affects Versions: 3.4.1.Final
> Reporter: Guillaume Smet
> Priority: Critical
>
> Hi,
> A bit of context:
> We are early adopters of Hibernate Search and have had very few problems with it (apart from the @IndexEmbedded problem we helped fix in 3.4.1, none so far).
> When the DSL API was introduced, I tried it and ran into the problem described below, so I used the QueryParser API (and the MultiFieldQueryParser API) as a workaround. However:
> * we now use Hibernate Search in every one of our applications;
> * the DSL API is really nice and, since we introduced QueryDSL in our applications, we use a lot of DSL-like APIs, so I would like to use the Hibernate Search DSL API too;
> * I assumed it was a deliberate choice, but I recently found an example so strange that I can't believe it's the intended behaviour.
> So this problem isn't new: it has existed since the first version of the DSL API.
> Now, the description of the problem:
> * we use the following analyzer to index a field in our entity:
> {code}
> @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
>     tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
>     filters = {
>         @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
>         @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
>             @org.hibernate.search.annotations.Parameter(name = "generateWordParts", value = "1"),
>             @org.hibernate.search.annotations.Parameter(name = "generateNumberParts", value = "1"),
>             @org.hibernate.search.annotations.Parameter(name = "catenateWords", value = "0"),
>             @org.hibernate.search.annotations.Parameter(name = "catenateNumbers", value = "0"),
>             @org.hibernate.search.annotations.Parameter(name = "catenateAll", value = "0"),
>             @org.hibernate.search.annotations.Parameter(name = "splitOnCaseChange", value = "0"),
>             @org.hibernate.search.annotations.Parameter(name = "splitOnNumerics", value = "0"),
>             @org.hibernate.search.annotations.Parameter(name = "preserveOriginal", value = "1")
>         }
>         ),
>         @TokenFilterDef(factory = LowerCaseFilterFactory.class)
>     }
> ),
> {code}
> * the content of the field is something like XXXX-AAAA-HAGYU-19910
> * if you search for an exact match on "XXXX-AAAA-HAGYU-19910" with the QueryParser, you get a few results: namely those containing all the parts (XXXX, AAAA, HAGYU and 19910) in any order. That's the behaviour I expect given my analyzer.
> * if you search using the DSL API, you get ALL the results containing at least ONE token, so A LOT of results in our case.
> I expect the DSL API to behave like the Lucene query parser and return the same results.
> The problem is that ConnectedMultiFieldsTermQueryBuilder doesn't use the QueryParser to build the Lucene query. Instead, a getAllTermsFromText() method runs the analyzer to extract all the terms, and an OR query is built from them.
> So when I search for XXXX-AAAA-HAGYU-19910, the DSL API searches for "XXXX" OR "AAAA" OR "HAGYU" OR "19910".
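The behaviour described above can be approximated in plain Java. This is a simplified model, and every piece of it is an assumption: `analyze` only mimics the configured chain (whitespace tokenization, splitting on non-alphanumeric delimiters while keeping the whole token as `preserveOriginal=1` does, then lowercasing; ASCII folding is skipped because it leaves this input unchanged), and `orQuery`/`matchesAny` mimic a single OR over all resulting terms. None of these are the real `getAllTermsFromText()` or `WordDelimiterFilter` implementations.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

public class DslOrQuerySketch {

    // Rough approximation of the analyzer chain from the issue: whitespace
    // tokenizer, then splitting on non-alphanumeric characters (keeping the
    // original token, as preserveOriginal=1 does), then lowercasing.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String[] parts = token.split("[^\\p{Alnum}]+");
            if (parts.length > 1) {
                terms.add(token.toLowerCase(Locale.ROOT)); // preserved original
            }
            for (String part : parts) {
                if (!part.isEmpty()) {
                    terms.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return terms;
    }

    // Joins every analyzed term into one disjunction (rendered for display).
    static String orQuery(String field, List<String> terms) {
        StringJoiner query = new StringJoiner(" OR ");
        for (String term : terms) {
            query.add(field + ":" + term);
        }
        return query.toString();
    }

    // OR semantics: a document matches if it contains any single query term.
    static boolean matchesAny(Set<String> docTerms, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> queryTerms = analyze("XXXX-AAAA-HAGYU-19910");
        // [xxxx-aaaa-hagyu-19910, xxxx, aaaa, hagyu, 19910]
        System.out.println(queryTerms);
        System.out.println(orQuery("uuid", queryTerms));

        // A different id sharing only one part ("hagyu") still matches.
        Set<String> otherDoc = new HashSet<>(analyze("ZZZZ-BBBB-HAGYU-00000"));
        System.out.println("different id matches: " + matchesAny(otherDoc, queryTerms));
    }
}
```

In this approximation the whole lowercased value is OR'ed alongside the four parts, so the exact value does not help narrow the results: any single shared part is enough for a match.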
> I really think this is a mistake and that we should use the *QueryParser API to build the Lucene query, so that we get the correct behaviour.
> If needed, I can provide any further information and/or a test case. I just want to be sure you consider it a bug before working further on this. Otherwise I'll stick to using the *QueryParser API.
> Thanks for your feedback.