[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-917) DSL API doesn't build the correct Lucene query

Wednesday, 4 January 2012

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-917?pag... ]

Guillaume Smet commented on HSEARCH-917:
----------------------------------------

bq. Do you have other use cases where AND is better by the way?

Well, in all our applications, we mostly use AND (using the Lucene MultiFieldQueryParser
directly instead of the DSL API).

Probably because I use dismax a lot on our Solr-powered search engines. There I use
"minimum should match" to tolerate 1 or 2 missing words while still requiring most of
them to be present in each result. AND is closer to this approach than OR.
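
To illustrate the "minimum should match" idea, here is a minimal plain-Java sketch. It is an assumption-laden illustration only: it does not call Solr, Lucene or Hibernate Search, and the class MinimumShouldMatchSketch, its methods and the sample terms are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Solr, Lucene or Hibernate Search.
class MinimumShouldMatchSketch {

    // Counts how many of the query terms occur among the document's terms.
    static int matchedTerms(List<String> queryTerms, Set<String> documentTerms) {
        int matched = 0;
        for (String term : queryTerms) {
            if (documentTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    // A document is kept when at least minShouldMatch query terms occur in it.
    // minShouldMatch == 1 behaves like OR; minShouldMatch == queryTerms.size() behaves like AND.
    static boolean accepts(List<String> queryTerms, Set<String> documentTerms, int minShouldMatch) {
        return matchedTerms(queryTerms, documentTerms) >= minShouldMatch;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("hibernate", "search", "dsl", "query");
        Set<String> doc = new HashSet<>(Arrays.asList("hibernate", "search", "query", "lucene"));

        System.out.println(accepts(query, doc, 1)); // OR-like threshold
        System.out.println(accepts(query, doc, 3)); // one missing word tolerated
        System.out.println(accepts(query, doc, 4)); // AND-like threshold
    }
}
```

With a threshold just below the number of query words, the behaviour stays close to AND while still forgiving a missing word.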

bq. and will show up in the IDE autocompletion just the same

Not exactly. With option 2, if you type match<TAB>, the IDE autocompletion proposes
both choices. That's why I prefer this option.

But I don't have a strong opinion about it.

I'll try to work on a patch soon with option 1 in mind. We can change the syntax later
if we find something better.

-- 
Guillaume

...
 DSL API doesn't build the correct Lucene query
 ----------------------------------------------

                 Key: HSEARCH-917
                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-917
             Project: Hibernate Search
          Issue Type: Bug
          Components: query
    Affects Versions: 3.4.1.Final
            Reporter: Guillaume Smet
            Priority: Critical

 Hi,
 A bit of context:
 We are early adopters of Hibernate Search and we have had very few problems with it
(apart from the @IndexEmbedded problem we helped to fix in 3.4.1, no problems so far).
 When the DSL API was introduced, I tried it and found the problem I describe below. I
decided to use the QueryParser API (and the MultiFieldQueryParser API) as a workaround.
The fact is that:
 * we now use Hibernate Search in every application we have;
 * the DSL API is really nice and, since we introduced QueryDSL in our applications, we
now use a lot of DSL-like APIs and I would like to be able to use the Hibernate Search one too;
 * I thought it was a deliberate choice but I recently found an example so weird that I
can't believe it's the intended behaviour.
 So this problem isn't new: it has existed since the first version of the DSL API.
 Now, the description of the problem:
 * we use the following analyzer to index a field in our entity:
 {code}
 @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
 		tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
 		filters = {
 				@TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
 				@TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
 						@org.hibernate.search.annotations.Parameter(name = "generateWordParts", value = "1"),
 						@org.hibernate.search.annotations.Parameter(name = "generateNumberParts", value = "1"),
 						@org.hibernate.search.annotations.Parameter(name = "catenateWords", value = "0"),
 						@org.hibernate.search.annotations.Parameter(name = "catenateNumbers", value = "0"),
 						@org.hibernate.search.annotations.Parameter(name = "catenateAll", value = "0"),
 						@org.hibernate.search.annotations.Parameter(name = "splitOnCaseChange", value = "0"),
 						@org.hibernate.search.annotations.Parameter(name = "splitOnNumerics", value = "0"),
 						@org.hibernate.search.annotations.Parameter(name = "preserveOriginal", value = "1")
 				}),
 				@TokenFilterDef(factory = LowerCaseFilterFactory.class)
 		}
 ),
 {code}
 * the content of the field is something like XXXX-AAAA-HAGYU-19910
 * if you search for an exact match "XXXX-AAAA-HAGYU-19910" with the
QueryParser, you get a few results: namely the results which have all the different parts
(XXXX, AAAA, HAGYU and 19910) in any order. That's the behaviour I expect considering
my analyzer.
 * if you search using the DSL API, you get ALL the results containing at least ONE token,
which means A LOT of results in our case.
 My expectation is that the DSL API should work as the Lucene parser works and it should
return the same results.
 The problem is that ConnectedMultiFieldsTermQueryBuilder doesn't use the QueryParser
to build the Lucene query; it uses a getAllTermsFromText() method, which runs the
analyzer to get all the terms, and builds an OR query from them.
 So when I search for XXXX-AAAA-HAGYU-19910, the DSL API searches for "XXXX" OR
"AAAA" OR "HAGYU" OR "19910".
 I really think it's a mistake and that we should use the *QueryParser API to build
the Lucene Query and have the correct behaviour.
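
 The difference between the two behaviours can be sketched in plain Java. This is only an illustration under assumptions: it does not use Lucene or Hibernate Search, the class OrVersusAndSketch and its methods are hypothetical, and the token list is the one from the example above, lowercased as LowerCaseFilterFactory would do (the extra token kept by preserveOriginal is left out for brevity).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the Hibernate Search or Lucene implementation.
class OrVersusAndSketch {

    // OR semantics: one matching term is enough (what the DSL API is described as doing).
    static boolean matchesAny(List<String> queryTerms, Set<String> documentTerms) {
        for (String term : queryTerms) {
            if (documentTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: every term must be present, in any order (the behaviour expected here).
    static boolean matchesAll(List<String> queryTerms, Set<String> documentTerms) {
        return documentTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Assumed tokens for the query XXXX-AAAA-HAGYU-19910.
        List<String> query = Arrays.asList("xxxx", "aaaa", "hagyu", "19910");

        // A document sharing only one part with the query.
        Set<String> unrelated = new HashSet<>(Arrays.asList("xxxx", "bbbb", "other", "20000"));

        System.out.println("OR  accepts unrelated document: " + matchesAny(query, unrelated));
        System.out.println("AND accepts unrelated document: " + matchesAll(query, unrelated));
    }
}
```

 A document that shares a single part with the query is accepted under OR but rejected under AND, which is why the DSL API returns so many more results.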
 If needed, I can provide any further information and/or a test case. I just want to be
sure you consider it a bug before working further on this. Otherwise I'll stick to
using the *QueryParser API.
 Thanks for your feedback. 
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
