[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-960) Index.UN_TOKENIZED overrides other tokenized fields that share the same name

Thursday, 10 November 2011

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-960?pag... ]

Hardy Ferentschik commented on HSEARCH-960:
-------------------------------------------

After reviewing some code and digging a little deeper, I am not so sure that we can
actually do anything about this. The problem lies not so much on the Hibernate Search
side as in the Lucene API.

On the Search side we actually keep the metadata per field and use the right option when
building the document. See _assertValuesAreIndexedWithDifferentAnalyzeSettings_ of this
[test|https://github.com/hferentschik/hibernate-search/blob/a0352df5845c18...]

The problem is that the analysis step does not occur when the _Document_ is built,
but when the document is added to the index (see e.g. _AddWorkDelegate_ -
{code}writer.addDocument( work.getDocument(), analyzer ){code}).

Analyzers work per field name, and we have our own implementation, _ScopedAnalyzer_,
which returns an analyzer depending on the field name. For a non-analyzed field we use
_PassThroughAnalyzer_. To implement the described use case, our _ScopedAnalyzer_ would have
to return the _PassThroughAnalyzer_ in one case and the _StandardAnalyzer_ in the other
for the same field name. The field name alone does not carry enough information to make
that distinction.
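
To illustrate the limitation, here is a minimal, self-contained sketch of a lookup keyed only by field name. It is not the actual _ScopedAnalyzer_ code: the class _ScopedLookupSketch_ and its methods are hypothetical, and analyzers are represented by plain strings so that no Lucene dependency is needed. Which registration survives here depends purely on registration order in this sketch and is not meant to mirror the real behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a field-name-scoped analyzer lookup.
// This is NOT the Hibernate Search ScopedAnalyzer; it only models the keying.
final class ScopedLookupSketch {

    // One slot per field name: the field name is the only key available
    // when the analyzer is asked for at indexing time.
    private final Map<String, String> analyzerByField = new HashMap<>();

    // Registers an analyzer for a field name and returns the analyzer
    // that was displaced, or null if the name was not registered before.
    String register(String fieldName, String analyzerName) {
        return analyzerByField.put(fieldName, analyzerName);
    }

    // Returns the single analyzer currently associated with the field name.
    String analyzerFor(String fieldName) {
        return analyzerByField.get(fieldName);
    }

    public static void main(String[] args) {
        ScopedLookupSketch scoped = new ScopedLookupSketch();
        // Two mapped properties share the index field name "simple_search".
        scoped.register("simple_search", "PassThroughAnalyzer");
        String displaced = scoped.register("simple_search", "StandardAnalyzer");
        System.out.println("displaced: " + displaced);
        System.out.println("in effect: " + scoped.analyzerFor("simple_search"));
    }
}
```

Whatever the order, only one analyzer can be in effect for "simple_search", which is the situation described above.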

I think we are better off logging a warning or throwing an exception. Or does anyone have a
better idea?
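
As a rough idea of what the exception variant could look like, here is a minimal sketch of a check that rejects a field name registered twice with different index settings. Everything in it is an assumption made for illustration: _FieldSettingsCheckSketch_, its _IndexSetting_ enum and the _register_ method are hypothetical and do not correspond to existing Hibernate Search classes or to where such a check would actually live.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping-time check; not part of Hibernate Search.
final class FieldSettingsCheckSketch {

    // Hypothetical mirror of the two index settings named in the report.
    enum IndexSetting { TOKENIZED, UN_TOKENIZED }

    // Remembers the first setting seen for each index field name.
    private final Map<String, IndexSetting> seen = new HashMap<>();

    // Records the setting for a field name. Throws if the same name was
    // already registered with a different setting; repeating the same
    // setting is accepted.
    void register(String fieldName, IndexSetting setting) {
        IndexSetting previous = seen.putIfAbsent(fieldName, setting);
        if (previous != null && previous != setting) {
            throw new IllegalStateException(
                "Field '" + fieldName + "' is mapped as both "
                    + previous + " and " + setting);
        }
    }

    public static void main(String[] args) {
        FieldSettingsCheckSketch check = new FieldSettingsCheckSketch();
        check.register("simple_search", IndexSetting.UN_TOKENIZED);
        try {
            check.register("simple_search", IndexSetting.TOKENIZED);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The warning variant would log the same message instead of throwing.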

...
 Index.UN_TOKENIZED overrides other tokenized fields that share the same name
 ----------------------------------------------------------------------------

                 Key: HSEARCH-960
                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-960
             Project: Hibernate Search
          Issue Type: Bug
          Components: mapping
    Affects Versions: 3.4.0.Final
         Environment: 3.4.0 Final
            Reporter: John-Michael Au
            Assignee: Hardy Ferentschik
              Labels: annotations, bug, override, tokenized, un_tokenized
             Fix For: 3.4.2, 4.0.0.CR2

 Marking one field as un-tokenized causes all other fields with the same name to be
un-tokenized.
 For example:
 {code}
 @Field(name = "simple_search", index = Index.UN_TOKENIZED, store = Store.NO)
 private String string;
 @Field(name = "simple_search", index = Index.TOKENIZED, store = Store.NO)
 private String string2;
 {code}
 The resulting behaviour is that "simple_search" is made up of the un-tokenized
'string' and 'string2' values, even though 'string2' was specified to be tokenized.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira