[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-867) input stream support

adam (JIRA) noreply at atlassian.com
Wed Aug 24 18:11:06 EDT 2011


    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=43347#comment-43347 ] 

adam commented on HSEARCH-867:
------------------------------

Sanne,
  thanks for your comments. I'm trying to optimize a case locally where we pull in multiple files (sometimes large), so I want to avoid string concatenation because of the memory cost. What I've found is that the Fieldable class (according to the documentation) should happily work with a Reader if it's given one instead of a String. Documenting what I did (as this gets indexed by Google):

# changing the stringValue to return null and implementing a reader works well
# using a SequenceInputStream allows me to wrap the FileInputStreams into a single reader
# changing the FieldBridge to process a reader and pass it to the LazyField
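
For anyone finding this later, the SequenceInputStream step can be sketched roughly as below. This is only a minimal illustration using the JDK, not the actual bridge code: the class name ChainedReaderSketch and the helper chain(...) are made up for the example, the in-memory streams stand in for the FileInputStreams, and UTF-8 is an assumed encoding. The resulting Reader is what would then be handed to the field instead of a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical name; not part of Hibernate Search or Lucene.
public class ChainedReaderSketch {

    // Presents several streams as one Reader without first copying
    // their contents into a single String.
    public static Reader chain(List<InputStream> parts, Charset charset) {
        // SequenceInputStream reads the streams in order and closes each
        // one once it is exhausted.
        InputStream all = new SequenceInputStream(Collections.enumeration(parts));
        return new InputStreamReader(all, charset);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the FileInputStreams (assumption for the demo).
        List<InputStream> parts = Arrays.<InputStream>asList(
                new ByteArrayInputStream("first file ".getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream("second file".getBytes(StandardCharsets.UTF_8)));

        StringBuilder seen = new StringBuilder();
        try (Reader reader = chain(parts, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            // Consume the content in chunks, as an analyzer would.
            while ((n = reader.read(buffer)) != -1) {
                seen.append(buffer, 0, n);
            }
        }
        System.out.println(seen); // prints "first file second file"
    }
}
```

Note that the streams are joined with nothing in between, so the last token of one file and the first token of the next can run together unless a separator (for example a stream holding a single newline) is added between them.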


> input stream support
> --------------------
>
>                 Key: HSEARCH-867
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867
>             Project: Hibernate Search
>          Issue Type: Improvement
>          Components: analyzer, integration
>    Affects Versions: 3.4.0.Final
>            Reporter: adam
>            Priority: Minor
>
> The current hibernate search functionality is not optimized for dealing with large text contents.  Two use cases:
> 1. indexing an external PDF that's 100MB where an @Field is set on a getter
> 2. indexing a @Lob field
> In both cases, the method must return a String (or a base class). That might mean you have a 50MB InputStream whose contents get concatenated into a String, which is then wrapped in a Reader object and passed to an analyzer. I'm unclear what Hibernate Search is doing when the getter for the @Field annotation is called, but it would be ideal if it could use a Reader instead of a String.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
