[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-960) Index.UN_TOKENIZED overrides other tokenized fields that share the same name

Hardy Ferentschik (JIRA) noreply at atlassian.com
Thu Nov 10 11:35:19 EST 2011


    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=44291#comment-44291 ] 

Hardy Ferentschik commented on HSEARCH-960:
-------------------------------------------

After reviewing some code and digging a little deeper, I am not so sure that we can actually do something about this. The problem lies not so much on the Hibernate Search side as in the Lucene API.

On the Search side we actually keep the metadata per field and use the right option when building the document. See _assertValuesAreIndexedWithDifferentAnalyzeSettings_ of this [test|https://github.com/hferentschik/hibernate-search/blob/a0352df5845c18b5508bfcdd1ff499aea0cb4e06/hibernate-search-orm/src/test/java/org/hibernate/search/test/configuration/field/TokenizationTest.java]

The problem is that the analysis step does not occur when the _Document_ is built, but when the document is added to the index (see e.g. _AddWorkDelegate_ - {code}writer.addDocument( work.getDocument(), analyzer ){code}).

Analyzers work per field name, and we have our own implementation, _ScopedAnalyzer_, which returns an analyzer depending on the field name. For a non-analyzed field we use _PassThroughAnalyzer_. To implement the described use case, our _ScopedAnalyzer_ would have to return the _PassThroughAnalyzer_ in one case and the _StandardAnalyzer_ in the other, for the same field name. The field name alone does not carry enough information to make that distinction.
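
To illustrate why a lookup keyed only by field name cannot serve both settings, here is a minimal sketch. It does not use the Lucene or Hibernate Search API: the class _ScopedLookupSketch_ and the two functions standing in for _PassThroughAnalyzer_ and a tokenizing analyzer are hypothetical, and the "last registration wins" rule is an assumption of the sketch, chosen only so that the outcome mirrors the reported behaviour.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a per-field-name analyzer lookup.
// This is NOT the Hibernate Search ScopedAnalyzer or any Lucene class.
class ScopedLookupSketch {

    // Stand-in for a pass-through analyzer: the whole value is one token.
    static final Function<String, List<String>> PASS_THROUGH =
            value -> Arrays.asList(value);

    // Stand-in for a tokenizing analyzer: lower-case, split on whitespace.
    static final Function<String, List<String>> TOKENIZING =
            value -> Arrays.asList(value.toLowerCase().split("\\s+"));

    // One slot per field name: there is no room for a second setting.
    private final Map<String, Function<String, List<String>>> byFieldName =
            new HashMap<>();

    // Assumption of this sketch: a later registration replaces an earlier one.
    void register(String fieldName, Function<String, List<String>> analyzer) {
        byFieldName.put(fieldName, analyzer);
    }

    // At analysis time only the field name is known, not the originating property.
    List<String> analyze(String fieldName, String value) {
        return byFieldName.get(fieldName).apply(value);
    }

    public static void main(String[] args) {
        ScopedLookupSketch scoped = new ScopedLookupSketch();
        scoped.register("simple_search", TOKENIZING);
        scoped.register("simple_search", PASS_THROUGH);

        // Both values go through the same analyzer because they share a name.
        System.out.println(scoped.analyze("simple_search", "Hello World"));
        System.out.println(scoped.analyze("simple_search", "Second Value"));
    }
}
```

In this sketch both values come back as a single token each, because the map has only one entry for "simple_search".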

I think we are better off logging a warning or throwing an exception. Or does anyone have a better idea?
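
A rough sketch of what the exception variant could look like, assuming the check runs while the mapping metadata is collected. The class _FieldConflictCheckSketch_ and its _register_ method are hypothetical and not part of Hibernate Search; a warning variant would log instead of throwing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapping-time check; not an existing Hibernate Search class.
class FieldConflictCheckSketch {

    // Remembers, per index field name, whether it was declared tokenized.
    private final Map<String, Boolean> tokenizedByFieldName = new HashMap<>();

    // Records the setting for a field name and rejects a conflicting redeclaration.
    void register(String fieldName, boolean tokenized) {
        Boolean previous = tokenizedByFieldName.putIfAbsent(fieldName, tokenized);
        if (previous != null && previous != tokenized) {
            throw new IllegalStateException(
                    "Field '" + fieldName
                    + "' is declared both tokenized and un-tokenized");
        }
    }

    public static void main(String[] args) {
        FieldConflictCheckSketch check = new FieldConflictCheckSketch();
        check.register("simple_search", false); // Index.UN_TOKENIZED
        try {
            check.register("simple_search", true); // Index.TOKENIZED, same name
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```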

> Index.UN_TOKENIZED overrides other tokenized fields that share the same name
> ----------------------------------------------------------------------------
>
>                 Key: HSEARCH-960
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-960
>             Project: Hibernate Search
>          Issue Type: Bug
>          Components: mapping
>    Affects Versions: 3.4.0.Final
>         Environment: 3.4.0 Final
>            Reporter: John-Michael Au
>            Assignee: Hardy Ferentschik
>              Labels: annotations, bug, override, tokenized, un_tokenized
>             Fix For: 3.4.2, 4.0.0.CR2
>
>
> Marking one field as un-tokenized causes all other fields with the same names to be un-tokenized.
> i.e.
> {code}
> @Field(name = "simple_search", index = Index.UN_TOKENIZED, store = Store.NO)
> private String string;
> @Field(name = "simple_search", index = Index.TOKENIZED, store = Store.NO)
> private String string2;
> {code}
> The resulting behaviour is that "simple_search" will be made up of un-tokenized 'string' and 'string2' values, even though 'string2' was specified to be tokenized.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

