[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-867) input stream support

Wed Aug 24 04:05:06 EDT 2011

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=43330#comment-43330 ] 

Sanne Grinovero commented on HSEARCH-867:
-----------------------------------------

Hi Adam,
there's an old wiki article about lazy field extraction. I'm not sure how much of it still applies to recent versions of Hibernate Search, so it would be great if you could try it out and let us know whether we need to change something. You're also welcome to fix the article yourself (it's a wiki).

http://community.jboss.org/wiki/HibernateSearchAndOfflineTextExtraction

> input stream support
> --------------------
>
>                 Key: HSEARCH-867
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867
>             Project: Hibernate Search
>          Issue Type: Improvement
>          Components: analyzer, integration
>    Affects Versions: 3.4.0.Final
>            Reporter: adam
>            Priority: Minor
>
> Hibernate Search is currently not optimized for dealing with large text content. Two use cases:
> 1. indexing an external 100MB PDF where an @Field is set on a getter
> 2. indexing a @Lob field
> In both cases the method must return a String (or a base class), which might mean that a 50MB InputStream gets concatenated into a String, which is then wrapped in a Reader object and passed to an analyzer. I'm unclear what Hibernate Search does when the getter for the @Field annotation is called, but it would be ideal if it could use a Reader instead of a String.
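
To make the request above concrete, here is a minimal sketch using only the JDK. It contrasts reading a whole InputStream into a String with handing out a Reader that a consumer pulls from in fixed-size chunks. The class name StreamingTextSketch and the methods readFully, asReader and countTokens are hypothetical names made up for illustration, and countTokens is only a stand-in for an analyzer. None of this is Hibernate Search or Lucene API, and it says nothing about how Hibernate Search handles @Field values internally.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Hibernate Search or Lucene.
class StreamingTextSketch {

    // The costly path from the report: the entire stream is copied into one String.
    static String readFully(InputStream in) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // The requested path: expose the content as a Reader, nothing is read yet.
    // UTF-8 is an assumption; real content would need its actual encoding.
    static Reader asReader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Stand-in for an analyzer: counts whitespace-separated tokens while
    // holding only one 8K buffer in memory at a time.
    static long countTokens(Reader reader) {
        char[] buf = new char[8192];
        long tokens = 0;
        boolean inToken = false;
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    boolean ws = Character.isWhitespace(buf[i]);
                    if (!ws && !inToken) {
                        tokens++; // first character of a new token
                    }
                    inToken = !ws;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        byte[] data = "some extracted document text".getBytes(StandardCharsets.UTF_8);
        // Streaming: the consumer never sees the full text as one object.
        System.out.println(countTokens(asReader(new ByteArrayInputStream(data))));
        // Materializing: the full text exists as a String first.
        System.out.println(readFully(new ByteArrayInputStream(data)).length());
    }
}
```

With the Reader variant the peak memory use is bounded by the buffer size rather than by the document size, which is the point of the improvement requested here.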

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira