[hibernate-issues] [Hibernate-JIRA] Updated: (HSEARCH-390) Allow customization of the charset used by analyzer components

Emmanuel Bernard (JIRA) noreply at atlassian.com
Sat Nov 6 12:00:13 EDT 2010


     [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Bernard updated HSEARCH-390:
-------------------------------------

         Assignee: Emmanuel Bernard
    Fix Version/s:     (was: 3.3.0)
                   3.3.0.CR1
       Issue Type: New Feature  (was: Bug)
          Summary: Allow customization of the charset used by analyzer components  (was: HibernateSearchResourceLoader uses default charset for reading resources)

Fixed with an ad-hoc parameter:
{code}@Parameter(name="resource_charset", value="UTF-8"){code}
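
As an illustration only, the parameter could be combined with the analyzer definition from the issue description roughly as follows. This sketch assumes that "resource_charset" is passed alongside "words" on the filter that loads the resource, and that the factories come from the org.apache.solr.analysis package; neither detail is stated in this message. The class name RussianDocument is hypothetical.

```java
// Assumed packages: Solr analysis factories and Hibernate Search annotations.
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.StandardFilterFactory;
import org.apache.solr.analysis.StandardTokenizerFactory;
import org.apache.solr.analysis.StopFilterFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(name = "ru",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words", value = "stopwords/stopwords_ru.txt"),
            // Assumption: the charset is declared on the filter that reads the file.
            @Parameter(name = "resource_charset", value = "UTF-8")
        }),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = @Parameter(name = "language", value = "Russian"))
    })
public class RussianDocument {
    // Hypothetical holder class; in practice the definition would sit on an indexed entity.
}
```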

> Allow customization of the charset used by analyzer components
> --------------------------------------------------------------
>
>                 Key: HSEARCH-390
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-390
>             Project: Hibernate Search
>          Issue Type: New Feature
>          Components: analyzer
>    Affects Versions: 3.1.1.GA
>            Reporter: Ivan Holub
>            Assignee: Emmanuel Bernard
>             Fix For: 3.3.0.CR1
>
>
> HibernateSearchResourceLoader uses the platform default charset when reading resources,
> so stop words do not work for languages whose stop word file is in a different encoding.
> 	@AnalyzerDef(name="ru",
> 		tokenizer=@TokenizerDef(factory=StandardTokenizerFactory.class),
> 		filters={
> 			@TokenFilterDef(factory=StandardFilterFactory.class),
> 			@TokenFilterDef(factory=LowerCaseFilterFactory.class),
> 			@TokenFilterDef(factory=StopFilterFactory.class,
> 				params=@Parameter(name="words",
> 					value="stopwords/stopwords_ru.txt")),
> 			@TokenFilterDef(factory=SnowballPorterFilterFactory.class,
> 				params=@Parameter(name="language",
> 					value="Russian"))
> 		})
> stopwords/stopwords_ru.txt is a UTF-8 file.
> To work around the problem, I constructed the Analyzer in a separate class, without using AnalyzerDef.
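
The failure mode described in the report can be reproduced with the JDK alone. The sketch below is not Hibernate Search code; the class name StopWordCharsetDemo and the helper readStopWords are hypothetical, and ISO-8859-1 stands in for a platform default charset that differs from the file's encoding. It decodes the same UTF-8 stop word bytes twice and shows that only the explicit UTF-8 read yields words that match Russian tokens.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class StopWordCharsetDemo {

    // Reads one stop word per line, decoding the raw bytes with the given charset.
    static Set<String> readStopWords(byte[] fileBytes, Charset charset) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(fileBytes), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Three Russian stop words written with Unicode escapes, stored as UTF-8 bytes.
        String ne = "\u043d\u0435";
        byte[] utf8File = ("\u0438\n\u0432\n" + ne + "\n").getBytes(StandardCharsets.UTF_8);

        // Decoding with the file's real charset keeps the words intact.
        Set<String> decodedAsUtf8 = readStopWords(utf8File, StandardCharsets.UTF_8);
        // Decoding with a mismatched charset produces mojibake that matches no token.
        Set<String> decodedAsLatin1 = readStopWords(utf8File, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 read contains the stop word: " + decodedAsUtf8.contains(ne));
        System.out.println("ISO-8859-1 read contains the stop word: " + decodedAsLatin1.contains(ne));
    }
}
```

With a mismatched charset the stop word set is still populated, so nothing fails loudly; the filter simply never matches, which is consistent with the symptom in the report.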

