[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-867) input stream support

Sunday, 28 August 2011

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867?pag... ]

Sanne Grinovero commented on HSEARCH-867:
-----------------------------------------

In general, Lucene uses a lot of file handles, and so do web servers. Have you already
raised the limits? Server/enterprise Linux distributions come preconfigured with a
generous limit, but desktop/developer-oriented Linux distributions usually have an
insufficient one.

Even if you already have a generous kernel limit, you make a good point that this design
does not allow you to control the number of open streams. I think you should not open the
stream when creating the LazyField; instead, pass enough information (file path?) to the
LazyField implementation so that it opens the reader only when it's needed, and closes it
too. That way you avoid opening a resource in one thread and closing it in another, which
is generally a bad idea, and there won't be more readers open than the number of workers
in the thread pool.
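
A minimal sketch of that idea, using only the JDK. The class name LazyTextSource and
its methods are hypothetical and are not part of Hibernate Search or Lucene; the point
is only that the object keeps the path, and the reader is opened and closed inside a
single call on whichever thread does the work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder: stores only the path and opens the reader on demand. */
final class LazyTextSource {
    private final Path path;
    private final AtomicInteger opens = new AtomicInteger();

    LazyTextSource(Path path) {
        this.path = path; // no file handle is taken at construction time
    }

    /** Callback that consumes the reader while it is open. */
    interface ReaderConsumer<T> {
        T apply(Reader reader) throws IOException;
    }

    /** Opens the reader, hands it to the consumer, and closes it on the same thread. */
    <T> T withReader(ReaderConsumer<T> consumer) throws IOException {
        opens.incrementAndGet();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return consumer.apply(reader);
        } // try-with-resources closes the reader here, even on exceptions
    }

    /** Number of times the file has been opened so far (for illustration only). */
    int openCount() {
        return opens.get();
    }
}
```

If each worker calls withReader only while it is processing a document, the number of
simultaneously open readers cannot exceed the number of worker threads.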

...
 input stream support
 --------------------

                 Key: HSEARCH-867
                 URL:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-867
             Project: Hibernate Search
          Issue Type: Improvement
          Components: analyzer, integration
    Affects Versions: 3.4.0.Final
            Reporter: adam
            Priority: Minor

 The current Hibernate Search functionality is not optimized for dealing with large text
content.  Two use cases:
 1. indexing an external PDF that's 100MB where an @Field is set on a getter
 2. indexing a @Lob field
 In both cases, the method must return a String, or a base class, which might mean that
you have an InputStream that's 50MB, which gets concatenated into a String and then
passed to an analyzer wrapped in a Reader object.  I'm unclear what Hibernate Search
is doing when the getter for the @Field annotation is called, but it would be ideal if it
could use a Reader instead of a String.
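
 To illustrate the difference, here is a rough sketch in plain JDK code. The class
StreamingWordCount is hypothetical and merely stands in for an analyzer; it is not
Hibernate Search or Lucene API. It consumes a Reader through a fixed-size buffer, so
memory use stays constant regardless of content size, whereas building a String first
requires holding the whole content in memory.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical stand-in for an analyzer that consumes text incrementally. */
final class StreamingWordCount {
    /** Counts runs of letters/digits while holding at most 8192 chars at a time. */
    static long countWords(Reader reader) throws IOException {
        char[] buffer = new char[8192];
        long words = 0;
        boolean inWord = false; // carried across buffer boundaries
        int read;
        while ((read = reader.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                boolean wordChar = Character.isLetterOrDigit(buffer[i]);
                if (wordChar && !inWord) {
                    words++; // a new word starts here
                }
                inWord = wordChar;
            }
        }
        return words;
    }
}
```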
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
