Sanne Grinovero updated HSEARCH-473:
------------------------------------
Assignee: Sanne Grinovero
Fix Version/s: 3.2.0.Beta2
Fields for _hibernate_class and the document ID are hard-coded to be
analyzed and have "norms" enabled
------------------------------------------------------------------------------------------------------
Key: HSEARCH-473
URL:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-473
Project: Hibernate Search
Issue Type: Bug
Components: engine
Affects Versions: 3.1.1.GA
Reporter: Dobes Vandermeer
Assignee: Sanne Grinovero
Fix For: 3.2.0.Beta2
A little-known problem with Lucene is that it allocates one byte of memory per document in
the index for each field that has "norms" enabled, as soon as one or more such fields are
involved in a query. These norms are used for ranking results but are not necessarily
needed for search.
Currently Hibernate Search creates its special fields "id" and
"_hibernate_class" with norms *enabled*, which means these norms arrays are
created whenever Hibernate Search deletes something from the index, as it uses those fields
to find the related document. This results in unnecessary memory use (many megabytes for
an index with millions of records in it).
Although this can be worked around by simply allocating a larger heap to the JVM, it can
become quite a significant issue if you plan to support hundreds of simultaneous users on
a DB with millions of records: any user action that triggers an entity deletion may cause
the norms array to be created, so you may have to allocate hundreds of megabytes of heap
just to allow for the creation of these unnecessary norms arrays. This might be
mitigated by using asynchronous search index updates, so that a fixed number of
threads processes the deletions, but I haven't confirmed whether that is the case.
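To put rough numbers on the paragraphs above, here is a minimal back-of-the-envelope sketch. It assumes one byte per document for each field with norms enabled, and, for the worst case, that every concurrent deletion ends up loading its own copy of the norms arrays (which the report itself has not confirmed). The class name, method names and figures are hypothetical and are not part of Hibernate Search or Lucene.

```java
/**
 * Rough estimate of the heap taken by Lucene norms arrays.
 * Assumption: one byte per document for each field with norms enabled.
 * This class is illustrative only; it is not part of Hibernate Search or Lucene.
 */
final class NormsMemoryEstimate {

    private NormsMemoryEstimate() {
        // static helpers only
    }

    /** Bytes needed for one copy of the norms arrays of the given fields. */
    static long bytesPerCopy(long documentCount, int fieldsWithNorms) {
        // one byte per document, per field with norms enabled
        return documentCount * fieldsWithNorms;
    }

    /**
     * Worst case if each concurrent deletion loads its own copy.
     * Whether that really happens is an unconfirmed assumption.
     */
    static long worstCaseBytes(long documentCount, int fieldsWithNorms, int concurrentDeletions) {
        return bytesPerCopy(documentCount, fieldsWithNorms) * concurrentDeletions;
    }

    /** Converts a byte count to megabytes (1 MB = 1024 * 1024 bytes). */
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

For example, with 2,000,000 documents and the two fields "id" and "_hibernate_class", bytesPerCopy(2_000_000L, 2) gives 4,000,000 bytes for one copy, and worstCaseBytes(2_000_000L, 2, 100) gives 400,000,000 bytes if 100 deletions each held a copy at the same time.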
The definition of these fields is hard-coded and they do not pay any attention to any
@Field or @FieldBridge annotation on the id of the entity:
{code:java|title=org.hibernate.search.engine.DocumentBuilderIndexedEntity.getDocument(T, Serializable, Map<String, String>)}
Field classField =
        new Field(
                CLASS_FIELDNAME,
                entityType.getName(),
                Field.Store.YES,
                Field.Index.NOT_ANALYZED,
                Field.TermVector.NO
        );
doc.add( classField );

// now add the entity id to the document
LuceneOptions luceneOptions = new LuceneOptionsImpl(
        Field.Store.YES,
        Field.Index.NOT_ANALYZED, Field.TermVector.NO, idBoost
);
idBridge.set( idKeywordName, id, doc, luceneOptions );
{code}
The fix is relatively straightforward: use Field.Index.NOT_ANALYZED_NO_NORMS (the
Lucene constant for an unanalyzed field without norms) instead of
Field.Index.NOT_ANALYZED for both of these fields.
Related to
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-469