[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-917) Add match all terms option when matching in the DSL API

Guillaume Smet (JIRA) noreply at atlassian.com
Mon Apr 16 05:44:50 EDT 2012


    [ https://hibernate.onjira.com/browse/HSEARCH-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=46299#comment-46299 ] 

Guillaume Smet commented on HSEARCH-917:
----------------------------------------

Hi Sanne,

The Hibernate Search part is easy (and I already have it somewhere), but the Lucene part is harder, depending on your analyzers.

I've had to put it on hold for the time being because I need to concentrate my efforts on another project. I'll get back to it as soon as I've finished that project (should be the end of June) if no one beats me to it.

-- 
Guillaume

> Add match all terms option when matching in the DSL API
> -------------------------------------------------------
>
>                 Key: HSEARCH-917
>                 URL: https://hibernate.onjira.com/browse/HSEARCH-917
>             Project: Hibernate Search
>          Issue Type: Improvement
>          Components: query
>            Reporter: Guillaume Smet
>            Assignee: Guillaume Smet
>            Priority: Critical
>
> Hi,
> A bit of context:
> We are early adopters of Hibernate Search and have had very few problems with it (apart from the @IndexEmbedded problem we helped fix in 3.4.1, none so far).
> When the DSL API was introduced, I tried it and found the problem described below, so I decided to use the QueryParser API (and the MultiFieldQueryParser API) as a workaround. The fact is that:
> * we now use Hibernate Search in every one of our applications;
> * the DSL API is really nice and, since we introduced QueryDSL in our applications, we use a lot of DSL-like APIs, so I would like to be able to use the Hibernate Search DSL API too;
> * I thought it was a deliberate choice, but recently I found an example so weird that I can't believe it's the intended behaviour.
> So this problem isn't new: it has existed since the first version of the DSL API.
> Now, the description of the problem:
> * we use the following analyzer to index a field in our entity:
> {code}
> @AnalyzerDef(name = HibernateSearchAnalyzer.TEXT,
> 			tokenizer = @TokenizerDef(factory = WhitespaceTokenizerFactory.class),
> 			filters = {
> 					@TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
> 					@TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
> 									@org.hibernate.search.annotations.Parameter(name = "generateWordParts", value = "1"),
> 									@org.hibernate.search.annotations.Parameter(name = "generateNumberParts", value = "1"),
> 									@org.hibernate.search.annotations.Parameter(name = "catenateWords", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "catenateNumbers", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "catenateAll", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "splitOnCaseChange", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "splitOnNumerics", value = "0"),
> 									@org.hibernate.search.annotations.Parameter(name = "preserveOriginal", value = "1")
> 							}
> 					),
> 					@TokenFilterDef(factory = LowerCaseFilterFactory.class)
> 			}
> 	),
> {code}
> * the content of the field is something like XXXX-AAAA-HAGYU-19910
> * if you search for an exact match "XXXX-AAAA-HAGYU-19910" with the QueryParser, you get only a few results, namely those that contain all the different parts (XXXX, AAAA, HAGYU and 19910) in any order. That's the behaviour I expect, given my analyzer.
> * if you search using the DSL API, you get ALL the results containing at least ONE token, so A LOT of results in our case.
> My expectation is that the DSL API should behave like the Lucene QueryParser and return the same results.
> The problem is that in ConnectedMultiFieldsTermQueryBuilder, we don't use the QueryParser to build the Lucene query but a getAllTermsFromText() method, which uses the analyzer to get all the terms, and from those terms we build an OR query.
> So when I search for XXXX-AAAA-HAGYU-19910, the DSL API searches for "XXXX" OR "AAAA" OR "HAGYU" OR "19910".
> I really think it's a mistake and that we should use the *QueryParser API to build the Lucene Query and have the correct behaviour.
> If needed, I can provide any further information and/or a test case. I just want to be sure you consider it a bug before working further on this. Otherwise I'll stick to using the *QueryParser API.
> Thanks for your feedback.
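
To see which terms the quoted analyzer roughly produces for a value like XXXX-AAAA-HAGYU-19910, here is a plain-Java approximation of the chain (whitespace tokenizer, ASCII folding, word delimiter with preserveOriginal=1, lower case). It is only a sketch under stated assumptions, not Lucene or Solr code: `AnalyzerSketch` and `analyze` are hypothetical names, ASCII folding is approximated with `java.text.Normalizer`, and the word-delimiter step only splits on non-alphanumeric characters, ignoring token positions and the filter's other rules.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified approximation of the quoted analyzer chain:
// WhitespaceTokenizer -> ASCIIFoldingFilter -> WordDelimiterFilter -> LowerCaseFilter.
// NOT Lucene/Solr code: positions and most WordDelimiterFilter rules are ignored.
final class AnalyzerSketch {

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Whitespace tokenizer: split on runs of whitespace.
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String folded = fold(token);
            // Word delimiter (simplified): split on non-alphanumeric characters.
            List<String> parts = new ArrayList<>();
            for (String part : folded.split("[^A-Za-z0-9]+")) {
                if (!part.isEmpty()) {
                    parts.add(part);
                }
            }
            // preserveOriginal=1: also keep the whole token when splitting changed it.
            if (parts.size() != 1 || !parts.get(0).equals(folded)) {
                terms.add(folded.toLowerCase(Locale.ROOT));
            }
            // generateWordParts/generateNumberParts=1: emit each part, lower-cased.
            for (String part : parts) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Approximate ASCII folding: decompose accented characters and drop the marks.
    private static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(analyze("XXXX-AAAA-HAGYU-19910"));
    }
}
```

Running `main` prints `[xxxx-aaaa-hagyu-19910, xxxx, aaaa, hagyu, 19910]`: the original token plus the four parts that the DSL API then combines into an OR query.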
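
The difference between the OR query built from getAllTermsFromText() and the requested "match all terms" option can be illustrated with a toy in-memory sketch. Everything in it (`MatchModeSketch`, `MatchMode`, `search`, and the sample documents) is hypothetical and made up for illustration; it models neither Lucene scoring nor term positions, and it is not how Hibernate Search implements, or would implement, the option.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory illustration (not Lucene, not the Hibernate Search DSL)
// of "match any term" (OR) versus "match all terms" (AND) over analyzed terms.
final class MatchModeSketch {

    enum MatchMode { ANY_TERM, ALL_TERMS }

    // Returns the ids of the documents matching the query terms under the given mode.
    static List<String> search(Map<String, Set<String>> index, List<String> queryTerms, MatchMode mode) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : index.entrySet()) {
            Set<String> docTerms = doc.getValue();
            boolean matches = (mode == MatchMode.ANY_TERM)
                    ? queryTerms.stream().anyMatch(docTerms::contains) // at least one term
                    : docTerms.containsAll(queryTerms);                // every term
            if (matches) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    // Made-up documents, already reduced to their lower-cased terms.
    static Map<String, Set<String>> sampleIndex() {
        Map<String, Set<String>> index = new LinkedHashMap<>();
        index.put("doc1", Set.of("xxxx", "aaaa", "hagyu", "19910")); // XXXX-AAAA-HAGYU-19910
        index.put("doc2", Set.of("hagyu", "xxxx", "19910", "aaaa")); // same parts, other order
        index.put("doc3", Set.of("xxxx", "bbbb", "zzzzz", "20011")); // shares only "xxxx"
        index.put("doc4", Set.of("cccc", "aaaa", "qwert", "19910")); // shares "aaaa", "19910"
        return index;
    }

    public static void main(String[] args) {
        List<String> query = List.of("xxxx", "aaaa", "hagyu", "19910");
        System.out.println(search(sampleIndex(), query, MatchMode.ANY_TERM));  // [doc1, doc2, doc3, doc4]
        System.out.println(search(sampleIndex(), query, MatchMode.ALL_TERMS)); // [doc1, doc2]
    }
}
```

In Lucene terms, ANY_TERM roughly corresponds to a BooleanQuery made only of SHOULD clauses and ALL_TERMS to one made of MUST clauses.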
