[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-469) Filter caching using causes excessive memory use

Mon Apr 12 14:22:59 EDT 2010

    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=36396#action_36396 ] 

Dobes Vandermeer commented on HSEARCH-469:
------------------------------------------

Hi Sanne,

It's off-topic, yes.  I think most of the needed improvements are already in the system but not yet implemented or released.  At the same time, our use of Lucene is simple enough that it's easier to reimplement search using Lucene directly than to patch hibernate-search.  Hope that makes sense.  In particular, I am changing the system to store the indexes in the database, with one set of indexes per tenant in our multi-tenant system.  Each tenant has a relatively small set of records to search, and searches never span tenants, so this considerably reduces the memory footprint of a search.  Storing the indexes in the DB reduces IT complexity for backups, database clones, transactions, and unit tests; although performance may suffer slightly, that seems a price worth paying to simplify a bunch of other things.

> Filter caching using causes excessive memory use
> ------------------------------------------------
>
>                 Key: HSEARCH-469
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-469
>             Project: Hibernate Search
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 3.1.1.GA
>         Environment: hibernate-core-3.3.2.GA
>            Reporter: Dobes Vandermeer
>            Assignee: Sanne Grinovero
>             Fix For: 3.2.0
>
>
> The CachingWrapperFilter uses the reader instance (CacheableMultiReader) as a key for the caching.
> However, the reader instance keeps pointers to byte arrays in its "normsCache" and in the "normsCache" of its sub-readers; each array has one byte for each document in the index, and in some cases there will be several of these arrays, one for each of several different fields.
> For an index with millions of records this can result in an apparent "leak" of hundreds of megabytes of memory, as those readers are not re-used and the MRU cache used by default will keep up to 128 hard references to the readers.
> The search system must either re-use or delete the normsCache, OR the cache key for these filters should be tied to something else that doesn't keep references to potentially huge data arrays.  Otherwise the scalability of the search subsystem is significantly impacted when using filters, as you must have enough heap to accommodate up to 128 copies of the norms arrays.
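
To put a rough number on the description above, here is a back-of-the-envelope sketch in Java. It only multiplies out the figures from the report (one norms byte per document per field, up to 128 cached readers). The document count and field count are hypothetical values chosen for illustration, the class and method names are made up, and the calculation assumes the worst case in which every cached reader has loaded its own norms arrays and none are shared.

```java
public class NormsRetentionEstimate {

    // One norms byte per document, for each field that has a norms array
    // (as described in the report). Ignores array headers and other overhead.
    static long normsBytesPerReader(long documents, int fieldsWithNorms) {
        return documents * fieldsWithNorms;
    }

    // Worst case: every reader still referenced by the filter cache holds
    // its own fully loaded set of norms arrays.
    static long retainedNormsBytes(long documents, int fieldsWithNorms, int cachedReaders) {
        return normsBytesPerReader(documents, fieldsWithNorms) * cachedReaders;
    }

    public static void main(String[] args) {
        long documents = 1_000_000L; // hypothetical index size
        int fieldsWithNorms = 3;     // hypothetical number of fields with norms
        int cachedReaders = 128;     // default cache size quoted in the report

        long bytes = retainedNormsBytes(documents, fieldsWithNorms, cachedReaders);
        System.out.printf("Worst-case retained norms: %d bytes (~%d MiB)%n",
                bytes, bytes / (1024 * 1024));
    }
}
```

With these assumed inputs the estimate comes to a few hundred megabytes, which is in line with the "hundreds of megabytes" mentioned in the report. Real usage depends on which norms have actually been loaded and on whether sub-readers are shared between reader instances.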
