[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-917) DSL API doesn't build the correct Lucene query

Mon Jan 2 08:22:19 EST 2012

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=44819#comment-44819 ] 

Emmanuel Bernard commented on HSEARCH-917:
------------------------------------------

There are two main reasons I have decided to go for the OR option:

- if you use OR, you also retrieve the results that would match AND, and those usually rank at the top
- to support tokenizers such as n-gram tokenizers, you must use OR; that is the main reason behind my choice.

But I'm happy to add a new setting if needed.

{code}
Query query = builder
    .keyword()
        .onField( "uuid" )
        .matching( "XXXX-AAAA-HAGYU-19910" )
            .mustMatchAllTerms() //or a better name
        .createQuery();
{code}

WDYT?
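
To make the OR/AND trade-off discussed above concrete, here is a minimal, library-free sketch of the two matching modes. It is a toy model, not Hibernate Search or Lucene code: `TermMatchSketch`, `tokenize`, `matchesAny` and `matchesAll` are hypothetical names, and the tokenizer only approximates the analyzer from the issue (it lower-cases and splits on non-alphanumeric characters, and does not preserve the original token). In Lucene terms, `matchesAny` corresponds to combining the terms with SHOULD clauses and `matchesAll` to combining them with MUST clauses.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class TermMatchSketch {

    // Toy stand-in for the analyzer: lower-case the text and split it on
    // anything that is not a letter or digit. This is an assumption made for
    // illustration; it is NOT the real WordDelimiterFilter behaviour.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // OR semantics: the document matches if it contains at least one query term.
    static boolean matchesAny(String document, String query) {
        Set<String> documentTokens = tokenize(document);
        for (String term : tokenize(query)) {
            if (documentTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: the document matches only if it contains every query term,
    // in any order.
    static boolean matchesAll(String document, String query) {
        return tokenize(document).containsAll(tokenize(query));
    }

    public static void main(String[] args) {
        String query = "XXXX-AAAA-HAGYU-19910";
        String[] documents = {
            "XXXX-AAAA-HAGYU-19910",
            "19910 HAGYU AAAA XXXX",
            "XXXX-BBBB-OTHER-00001",
            "ZZZZ-0000"
        };
        for (String document : documents) {
            System.out.println(document
                + " | OR: " + matchesAny(document, query)
                + " | AND: " + matchesAll(document, query));
        }
    }
}
```

With the UUID example, the OR mode also matches an entry that shares only the `XXXX` part, which is the behaviour reported in the issue; the AND mode keeps only the entries containing all four parts, in any order. Every AND match is also an OR match, which is the first point made above.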

> DSL API doesn't build the correct Lucene query
> ----------------------------------------------
>
>                 Key: HSEARCH-917
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-917
>             Project: Hibernate Search
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 3.4.1.Final
>            Reporter: Guillaume Smet
>            Priority: Critical
>
> Hi,
> A bit of context:
> We are early adopters of Hibernate Search and we have very few problems with it (except the @IndexEmbedded problem we helped to fix in 3.4.1, no problem so far).
> When the DSL API was introduced, I tried it and I found the problem I describe below. I decided to use the QueryParser API (and the MultiFieldQueryParser API) as a workaround. The fact is that:
> * we use Hibernate Search in every application we have, now;
> * the DSL API is really nice and, since we introduced QueryDSL in our applications, we now use a lot of DSL-like APIs and I would like to be able to use the Hibernate Search DSL too;
> * I thought it was a deliberate choice but I recently found an example so weird that I can't believe it's the intended behaviour.
> So this problem isn't new: it has existed since the first version of the DSL API.
> Now, the description of the problem:
> * we use the following analyzer to index a field in our entity:
> {code}
> @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
> 			tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
> 			filters = {
> 					@TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
> 					@TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
> 									@org.hibernate.search.annotations.Parameter(name = "generateWordParts", value = "1"),
> 									@org.hibernate.search.annotations.Parameter(name = "generateNumberParts", value = "1"),
> 									@org.hibernate.search.annotations.Parameter(name = "catenateWords", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "catenateNumbers", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "catenateAll", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "splitOnCaseChange", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "splitOnNumerics", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "preserveOriginal", value = "1")
> 							}
> 					),
> 					@TokenFilterDef(factory = LowerCaseFilterFactory.class)
> 			}
> 	),
> {code}
> * the content of the field is something like XXXX-AAAA-HAGYU-19910
> * if you search for an exact match "XXXX-AAAA-HAGYU-19910" with the QueryParser, you get a few results: namely the results which contain all the different parts (XXXX, AAAA, HAGYU and 19910) in any order. That's the behaviour I expect considering my analyzer.
> * if you search using the DSL API, you get ALL the results containing at least ONE token, so A LOT of results in our case.
> My expectation is that the DSL API should work as the Lucene parser works and it should return the same results.
> The problem is that in ConnectedMultiFieldsTermQueryBuilder, we don't use the QueryParser to build the Lucene query but a getAllTermsFromText() method, which uses the analyzer to get all the terms, and from those we build an OR query.
> So when I search for XXXX-AAAA-HAGYU-19910, the DSL API searches for "XXXX" OR "AAAA" OR "HAGYU" OR "19910".
> I really think it's a mistake and that we should use the *QueryParser API to build the Lucene Query and have the correct behaviour.
> If needed, I can provide any further information and/or a test case. I just want to be sure you consider it a bug before working further on this. Otherwise I'll stick to using the *QueryParser API.
> Thanks for your feedback.
