[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-960) Index.UN_TOKENIZED overrides other tokenized fields that share the same name

Wednesday, 26 October 2011

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-960?pag... ]

Sanne Grinovero commented on HSEARCH-960:
-----------------------------------------

Hi, thank you for the explanation.
You might still have a problem, though. Say you have an entity with the two fields
annotated as in your description, holding the values "ID5" and "Title of
this book", and you use an Analyzer that lowercases and removes digits.
At query time you have to apply the same analysis as at index time, but which one? If you
process the user input with that Analyzer, your users will never be able to find
"ID5"; if you don't process it, they will need to type exactly
"Title of this book", matching the same case, the same whitespace and all terms in
exactly the same order.

So either way you'll have elements in the index which can never be matched and
can't possibly produce hits; you would be better off not indexing them at all.
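
To make the dilemma concrete, here is a minimal plain-Java sketch. It does not use
the Lucene or Hibernate Search APIs: the class MixedFieldDemo, its methods and the
toy analyzer (lowercase, strip digits, split on whitespace) are all hypothetical
stand-ins, and the shared field is modelled as a simple set of terms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class MixedFieldDemo {

    // Hypothetical analyzer: lowercases, removes digits, splits on whitespace.
    static List<String> analyze(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[0-9]", "");
        List<String> terms = new ArrayList<>();
        for (String term : cleaned.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Models one shared field: the id is kept as a single raw term
    // (un-tokenized), the title is run through the analyzer (tokenized).
    static Set<String> buildSharedField(String id, String title) {
        Set<String> terms = new HashSet<>();
        terms.add(id);
        terms.addAll(analyze(title));
        return terms;
    }

    // Query path 1: user input is used as-is, as a single term.
    static boolean matchesRaw(Set<String> field, String input) {
        return field.contains(input);
    }

    // Query path 2: user input goes through the same analyzer first.
    static boolean matchesAnalyzed(Set<String> field, String input) {
        List<String> terms = analyze(input);
        return !terms.isEmpty() && field.containsAll(terms);
    }

    public static void main(String[] args) {
        Set<String> field = buildSharedField("ID5", "Title of this book");
        System.out.println(matchesAnalyzed(field, "ID5"));   // false: "ID5" becomes "id"
        System.out.println(matchesRaw(field, "ID5"));        // true
        System.out.println(matchesRaw(field, "Title"));      // false: index holds "title"
        System.out.println(matchesAnalyzed(field, "Title")); // true
    }
}
```

Whichever query path is chosen, one of the two values stops being reachable for
ordinary user input.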

Indexing multiple properties in the same field is indeed useful for the reasons you
describe, but I'm afraid that for it to work they should all use the same Analyzer and
Index options.

I think we should throw an exception or log a warning, unless someone can come up with a
good example of a valid combination of mismatched options.

Also, why do you have users search on ID fields?

...
 Index.UN_TOKENIZED overrides other tokenized fields that share the same name
 ----------------------------------------------------------------------------

                 Key: HSEARCH-960
                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-960
             Project: Hibernate Search
          Issue Type: Bug
          Components: mapping
    Affects Versions: 3.4.0.Final
         Environment: 3.4.0 Final
            Reporter: John-Michael Au
              Labels: annotations, bug, override, tokenized, un_tokenized
             Fix For: 3.4.2, 4.0.0.CR2

 Marking one field as un-tokenized causes all other fields with the same name to be
un-tokenized.
 For example:
 {code}
 @Field(name = "simple_search", index = Index.UN_TOKENIZED, store = Store.NO)
 private String string;
 @Field(name = "simple_search", index = Index.TOKENIZED, store = Store.NO)
 private String string2;
 {code}
 The resulting behaviour is that "simple_search" is made up of the un-tokenized
'string' and 'string2' values, even though 'string2' was specified
to be tokenized.
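
 As an illustration only, the difference between the reported and the intended field
contents can be modelled in plain Java. This is a sketch, not Hibernate Search code:
the class SharedFieldContents and its methods are hypothetical, and "tokenized" is
approximated by a whitespace split rather than a real Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SharedFieldContents {

    // Reported behaviour: both values land in "simple_search" as whole terms.
    static List<String> reportedTerms(String string, String string2) {
        return new ArrayList<>(Arrays.asList(string, string2));
    }

    // Intended behaviour: 'string' stays whole, 'string2' is split into terms
    // (whitespace split is an assumption standing in for real analysis).
    static List<String> intendedTerms(String string, String string2) {
        List<String> terms = new ArrayList<>();
        terms.add(string);
        terms.addAll(Arrays.asList(string2.trim().split("\\s+")));
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(reportedTerms("ID5", "Title of this book"));
        System.out.println(intendedTerms("ID5", "Title of this book"));
    }
}
```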
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
