Sanne Grinovero commented on HSEARCH-390:
-----------------------------------------
Should we add a parameter to define the charset? That way we can "default to the
default" charset when the new parameter is missing.
Do you have a patch for this? BTW, I believe you can select the default charset as a JVM
parameter.
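A minimal sketch of the proposed charset parameter. It assumes a hypothetical parameter name, resource_charset, which is not an existing Hibernate Search option, and a loader that reads word lists line by line. When the parameter is absent, the loader falls back to Charset.defaultCharset(), so current behaviour is preserved. Skipping blank lines and '#' comment lines is also an assumption about the word-list format. Map.of requires Java 9 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CharsetAwareLoader {

    // Hypothetical parameter name, used only for this sketch.
    static final String CHARSET_PARAM = "resource_charset";

    // Use the configured charset if present, otherwise "default to the default".
    static Charset resolveCharset(Map<String, String> params) {
        String name = params.get(CHARSET_PARAM);
        return name == null ? Charset.defaultCharset() : Charset.forName(name);
    }

    // Read a word list, skipping blank lines and lines starting with '#' (assumed format).
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty() && !line.startsWith("#")) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Simulated stopwords_ru.txt saved as UTF-8 (Russian words written as Unicode escapes).
        byte[] file = "# Russian stop words\n\u0438\n\u0432\n\u043d\u0435\n"
                .getBytes(StandardCharsets.UTF_8);
        Charset cs = resolveCharset(Map.of(CHARSET_PARAM, "UTF-8"));
        List<String> words = readLines(new ByteArrayInputStream(file), cs);
        System.out.println(words.size()); // 3
    }
}
```

With this design, existing deployments are unaffected. Charset.forName also throws on an unknown or unsupported name, so a misspelled charset fails fast instead of silently mis-decoding the file.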
HibernateSearchResourceLoader uses default charset for reading resources
------------------------------------------------------------------------
Key: HSEARCH-390
URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-390
Project: Hibernate Search
Issue Type: Bug
Components: analyzer
Affects Versions: 3.1.1.GA
Reporter: Ivan Holub
HibernateSearchResourceLoader uses the default charset for reading resources,
so stop words do not work for languages whose stop-word files use a different encoding.
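The following sketch illustrates why stop words silently stop working. It assumes the stop-word file is saved as UTF-8 and that the platform default charset is something else. ISO-8859-1 is used here as a stand-in for such a default, instead of relying on Charset.defaultCharset(). The decoded entry then no longer equals the token the analyzer produces, so the stop-set lookup misses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;

public class DefaultCharsetMismatch {

    // Decode the raw bytes of a word-list file with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String stopWord = "\u043d\u0435"; // Russian "ne" ("not")
        byte[] fileBytes = stopWord.getBytes(StandardCharsets.UTF_8); // file saved as UTF-8

        // Decoded as UTF-8: the entry in the stop set matches the token.
        Set<String> goodSet = Collections.singleton(decode(fileBytes, StandardCharsets.UTF_8));
        // Decoded as ISO-8859-1 (standing in for a non-UTF-8 platform default):
        // each of the 4 UTF-8 bytes becomes its own character, so nothing matches.
        Set<String> badSet = Collections.singleton(decode(fileBytes, StandardCharsets.ISO_8859_1));

        System.out.println(goodSet.contains(stopWord)); // true
        System.out.println(badSet.contains(stopWord));  // false
    }
}
```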
@AnalyzerDef(name = "ru",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class,
            params = @Parameter(name = "words",
                                value = "stopwords/stopwords_ru.txt")),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = @Parameter(name = "language",
                                value = "Russian"))
    })
stopwords/stopwords_ru.txt is a UTF-8 file.
To work around the problem, I constructed the Analyzer in a separate class, without using
@AnalyzerDef.
--
This message is automatically generated by JIRA.