[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-867) input stream support

Sanne Grinovero (JIRA) noreply at atlassian.com
Sun Aug 28 12:42:02 EDT 2011


    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=43366#comment-43366 ] 

Sanne Grinovero commented on HSEARCH-867:
-----------------------------------------

Lucene generally uses a lot of file handles, and so do web servers. Have you already raised the limits? Server/enterprise Linux distributions come preconfigured with a generous limit, but desktop/developer-oriented Linux distributions usually ship with an insufficient one.

Even if you already have a generous kernel limit, you make a good point that this design does not let you control the number of open streams. I think you should not open the stream up front when creating the LazyField; instead, pass enough information (a file path?) to the LazyField implementation so that it opens the reader only when it is needed, and closes it too. That way you avoid opening a resource in one thread and closing it in another, which is generally a bad idea, and there will never be more readers open than there are worker threads in the thread pool.
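
To illustrate the idea, here is a minimal sketch of a reader that keeps only the file path and opens the underlying file on the first read. The class name LazyFileReader and its isOpen() helper are hypothetical; they are not part of Hibernate Search or Lucene. UTF-8 encoding and single-threaded use (one worker opens, reads and closes) are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: holds a path and opens the file only on first read. */
final class LazyFileReader extends Reader {
    private final Path path;
    private Reader delegate;   // stays null until the first read() call
    private boolean closed;

    LazyFileReader(Path path) {
        this.path = path;      // no file handle is acquired here
    }

    /** True only while an underlying file handle is actually held. */
    boolean isOpen() {
        return delegate != null;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (closed) {
            throw new IOException("reader already closed");
        }
        if (delegate == null) {
            // Opened lazily, in whichever thread consumes the content.
            // UTF-8 is an assumption; use the real encoding of the source.
            delegate = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        }
        return delegate.read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (delegate != null) {
            delegate.close();  // released by the same thread that read it
            delegate = null;
        }
    }
}
```

Because the handle is acquired in read() and released in close(), the number of simultaneously open files is bounded by the number of threads consuming such readers, not by the number of fields queued for indexing.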

> input stream support
> --------------------
>
>                 Key: HSEARCH-867
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867
>             Project: Hibernate Search
>          Issue Type: Improvement
>          Components: analyzer, integration
>    Affects Versions: 3.4.0.Final
>            Reporter: adam
>            Priority: Minor
>
> The current Hibernate Search functionality is not optimized for dealing with large text contents.  Two use cases:
> 1. indexing an external PDF that's 100MB where an @Field is set on a getter
> 2. indexing a @Lob field
> In both cases, the method must return a string or a base class. That might mean you have an InputStream that's 50MB, which gets concatenated into a string and then passed to an analyzer wrapped in a Reader object.  I'm unclear what Hibernate Search is doing when the getter for the @Field annotation is called, but it would be ideal if it could use a reader instead of a string.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

