[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-477) Support for the new Solr's character filters

Thu Apr 1 16:33:31 EDT 2010

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=36074#action_36074 ] 

G Fernandes commented on HSEARCH-477:
-------------------------------------

CharFilters sit between the Reader and the Tokenizer [1], so they filter the character stream produced by the reader before tokenization takes place.
For an illustration of how CharFilters are used in Solr, please refer to [2].

[1] http://issues.apache.org/jira/browse/LUCENE-1466 
[2] http://issues.apache.org/jira/browse/SOLR-822
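
To make the position of a char filter concrete, here is a minimal sketch using only java.io. It does not use the Lucene CharFilter/CharStream API; UnderscoreToSpaceReader and tokenize are hypothetical stand-ins for a char filter and a whitespace tokenizer. Because the filter rewrites characters before the tokenizer sees them, it can change where the token boundaries fall.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CharFilterPositionSketch {

    // Hypothetical stand-in for a char filter: maps '_' to ' ' as characters are read.
    static class UnderscoreToSpaceReader extends FilterReader {
        UnderscoreToSpaceReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == '_' ? ' ' : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // n is -1 at end of stream, in which case the loop body never runs.
            for (int i = off; i < off + n; i++) {
                if (buf[i] == '_') {
                    buf[i] = ' ';
                }
            }
            return n;
        }
    }

    // Hypothetical stand-in for a tokenizer: splits the character stream on whitespace.
    static List<String> tokenize(Reader reader) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "solr_analysis char_filter";
        // Reader -> tokenizer, no char filter
        System.out.println(tokenize(new StringReader(text)));
        // prints [solr_analysis, char_filter]
        // Reader -> char filter -> tokenizer
        System.out.println(tokenize(new UnderscoreToSpaceReader(new StringReader(text))));
        // prints [solr, analysis, char, filter]
    }
}
```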

The order of application would be: first the charFilters in their declaration order, then the tokenizer, and then all the tokenFilters, also in their declaration order. @AnalyzerDef is probably better represented this way:

{code}
public @interface AnalyzerDef {
    String name();
    CharFilterDef[] charFilters() default { };  // new member: applied before the tokenizer
    TokenizerDef tokenizer();
    TokenFilterDef[] filters() default { };
}
{code}
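
As a rough sketch of how this might look from the user's side, the example below declares simplified stand-in annotations and reads them back in the proposed order. The assumptions: CharFilterDef does not exist yet, its shape here is a guess, and the factories are named by plain strings (rather than the Class<? extends ...> members the real annotations use) so that the sketch compiles without Solr on the classpath. The Article class and the applicationOrder helper are hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

class AnalyzerDefSketch {

    // Simplified stand-ins for the annotations; factory is a String here only
    // to keep the sketch free of Solr dependencies.
    @Retention(RetentionPolicy.RUNTIME)
    @interface CharFilterDef { String factory(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface TokenizerDef { String factory(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface TokenFilterDef { String factory(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface AnalyzerDef {
        String name();
        CharFilterDef[] charFilters() default { };
        TokenizerDef tokenizer();
        TokenFilterDef[] filters() default { };
    }

    // Hypothetical entity using the proposed charFilters member.
    @AnalyzerDef(
        name = "htmlText",
        charFilters = {
            @CharFilterDef(factory = "HTMLStripCharFilterFactory"),
            @CharFilterDef(factory = "MappingCharFilterFactory")
        },
        tokenizer = @TokenizerDef(factory = "StandardTokenizerFactory"),
        filters = {
            @TokenFilterDef(factory = "LowerCaseFilterFactory"),
            @TokenFilterDef(factory = "StopFilterFactory")
        })
    static class Article { }

    // Lists the factories in the order they would be applied:
    // char filters (declaration order), tokenizer, token filters (declaration order).
    static List<String> applicationOrder(Class<?> type) {
        AnalyzerDef def = type.getAnnotation(AnalyzerDef.class);
        List<String> order = new ArrayList<>();
        for (CharFilterDef charFilter : def.charFilters()) {
            order.add(charFilter.factory());
        }
        order.add(def.tokenizer().factory());
        for (TokenFilterDef filter : def.filters()) {
            order.add(filter.factory());
        }
        return order;
    }

    public static void main(String[] args) {
        for (String factory : applicationOrder(Article.class)) {
            System.out.println(factory);
        }
    }
}
```

Keeping charFilters as a separate member, rather than mixing both kinds into filters, means the declaration itself documents which stage each factory belongs to.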

Thoughts?

> Support for the new Solr's character filters
> --------------------------------------------
>
>                 Key: HSEARCH-477
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-477
>             Project: Hibernate Search
>          Issue Type: Improvement
>          Components: analyzer
>            Reporter: Gustavo Fernandes
>            Priority: Minor
>         Attachments: solr14-2.patch
>
>
> Solr 1.4 introduced CharacterFilters [1], which are based on Lucene's CharStream. Those filters are currently incompatible with the @TokenFilterDef annotation, which accepts only TokenFilterFactories:
> {code}
> public @interface TokenFilterDef {
> 	public abstract Class<? extends TokenFilterFactory> factory();
> 	public abstract Parameter[] params() default { };
> }
> {code}
> One idea is to keep the same annotation, "generalize" the token filter factory type in the annotation, and have SolrAnalyzerBuilder construct a TokenizerChain that accepts both types of filters [2].
> [1] http://lucene.apache.org/solr/api/org/apache/solr/analysis/CharFilterFactory.html
> [2] http://lucene.apache.org/solr/api/org/apache/solr/analysis/TokenizerChain.html
