Emmanuel Bernard updated HSEARCH-390:
-------------------------------------
Assignee: Emmanuel Bernard
Fix Version/s: 3.3.0.CR1 (was: 3.3.0)
Issue Type: New Feature (was: Bug)
Summary: Allow customization of the charset used by analyzer components (was:
HibernateSearchResourceLoader uses default charset for reading resources)
Fixed with an ad-hoc parameter:
{code}@Parameter(name="resource_charset", value="UTF-8"){code}
Allow customization of the charset used by analyzer components
--------------------------------------------------------------
Key: HSEARCH-390
URL:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-390
Project: Hibernate Search
Issue Type: New Feature
Components: analyzer
Affects Versions: 3.1.1.GA
Reporter: Ivan Holub
Assignee: Emmanuel Bernard
Fix For: 3.3.0.CR1
HibernateSearchResourceLoader uses the platform default charset for reading resources.
As a result, stop words do not work for languages whose stop word files use a different encoding.
@AnalyzerDef(name="ru",
    tokenizer=@TokenizerDef(factory=StandardTokenizerFactory.class),
    filters={
        @TokenFilterDef(factory=StandardFilterFactory.class),
        @TokenFilterDef(factory=LowerCaseFilterFactory.class),
        @TokenFilterDef(factory=StopFilterFactory.class,
            params=@Parameter(name="words",
                value="stopwords/stopwords_ru.txt")),
        @TokenFilterDef(factory=SnowballPorterFilterFactory.class,
            params=@Parameter(name="language",
                value="Russian"))
    })
stopwords/stopwords_ru.txt is a UTF-8 file.
To work around the problem, I constructed the Analyzer in a separate class, without using
@AnalyzerDef.
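The failure mode can be reproduced with the JDK alone. The sketch below is an illustration only, not the HibernateSearchResourceLoader code: the class StopWordCharsetDemo and the method readWords are hypothetical names, and ISO-8859-1 stands in for a platform default charset that is not UTF-8. It decodes the same UTF-8 bytes twice and shows that the stop word only matches when the reader is given the right charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo class; not part of Hibernate Search.
public class StopWordCharsetDemo {

    // Reads one stop word per line, decoding the stream with the given charset.
    public static Set<String> readWords(InputStream in, Charset charset) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Three Russian stop words, written as Unicode escapes so that the
        // example does not depend on the encoding of this source file.
        String stopWords = "\u0438\n\u043d\u0435\n\u0447\u0442\u043e\n";
        byte[] utf8File = stopWords.getBytes(StandardCharsets.UTF_8);
        String probe = "\u043d\u0435";

        // Explicit UTF-8: the bytes are decoded back to the original words.
        Set<String> correct =
            readWords(new ByteArrayInputStream(utf8File), StandardCharsets.UTF_8);

        // Wrong charset (stand-in for a non-UTF-8 platform default):
        // every UTF-8 byte becomes its own character, so nothing matches.
        Set<String> garbled =
            readWords(new ByteArrayInputStream(utf8File), StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 match: " + correct.contains(probe));
        System.out.println("ISO-8859-1 match: " + garbled.contains(probe));
    }
}
```

With the wrong charset the file still loads without an error, which is why the symptom is a stop filter that silently never matches rather than an exception.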