[hibernate-issues] [Hibernate-JIRA] Commented: (HSEARCH-361) Only index an entity if an indexed property has changed

Erel Magnus (JIRA) noreply at atlassian.com
Fri Apr 17 11:51:17 EDT 2009


    [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=32913#action_32913 ] 

Erel Magnus commented on HSEARCH-361:
-------------------------------------

I think the solution should accommodate more situations; otherwise some people will get bitten, and it might turn out to be very expensive to fix, or even to detect, as it was for me. For example, with your solution, if a developer adds a class bridge, all updates immediately become very expensive (the dirty check can no longer tell which properties the class bridge reads, so every update has to reindex) and the developer has no idea why.

There are three options that I can recommend:

First of all, we should consider the possibility that field bridges/class bridges produce stateful index terms, e.g. indexing at 1 PM produces A and at 3 PM produces B, or some random value. In that case your solution would not work, since the entity's properties have not changed even though the indexed terms have. The only thing that would work is what is done in HSEARCH-5 (http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-5), which is to store a hash of the fields in the Lucene document.
So the first mode would be to build all the fields as usual and compare their hash with the stored one. This solution consumes some resources during indexing, but at least the IndexReader remains "clean" and the index remains optimized, because unchanged documents are not deleted and re-added. Reading the stored hash back should be quick thanks to caches.
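
A minimal sketch of the hash mode, in plain Java. It assumes the bridges' output for one entity can be collected as a map of field name to string value (one value per field, for simplicity); DocumentHash, hashOf and needsReindex are hypothetical names, and where the digest would live in the Lucene document is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not a Hibernate Search API): digests the field
 * name/value pairs produced by the bridges so the digest can be stored in an
 * extra field of the Lucene document and compared on the next update.
 */
final class DocumentHash {
    private DocumentHash() {}

    /** Values are assumed to be non-null strings, as a bridge would emit. */
    static String hashOf(Map<String, String> fields) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(ex);
        }
        // Iterate in field-name order so the digest does not depend on map ordering.
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between name and value
            md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between entries
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Reindex when there is no stored digest yet or when the bridge output changed. */
    static boolean needsReindex(String storedHash, Map<String, String> freshFields) {
        return storedHash == null || !storedHash.equals(hashOf(freshFields));
    }
}
```

Because the comparison is made on the bridges' output rather than on the entity's properties, a bridge that emits A at 1 PM and B at 3 PM is still detected, and a null stored hash covers documents indexed before the hash field existed.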

Secondly, if all fields are stored in the index then we can check that they are the same.
So the second mode would be to simply compare the values. This solution is the most robust.
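
A minimal sketch of the stored-value comparison, assuming the stored values of the current document and the freshly built values are both available as maps from field name to the list of values under that name (a Lucene document can hold several values per field). Reading them from the index is omitted, and StoredFieldComparison is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the "compare stored values" mode: when every indexed
 * field is also stored, the values read back from the current document can be
 * compared with the values the bridges produce for the updated entity.
 */
final class StoredFieldComparison {
    private StoredFieldComparison() {}

    /**
     * Map.equals checks that both sides have the same field names and that each
     * value list is equal; List.equals is order-sensitive.
     */
    static boolean needsReindex(Map<String, List<String>> storedValues,
                                Map<String, List<String>> freshValues) {
        return !storedValues.equals(freshValues);
    }
}
```

No extra hash field is needed here, which is why this mode needs no one-off reindexing, but it only applies when everything the bridges emit is stored.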

Thirdly, assuming there are no stateful index terms (which you can't verify) or no custom field bridges (which you can verify), you can use what you suggested above.
So the third mode would be to check for dirty fields.
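
A minimal sketch of the dirty-check mode. The arrays mirror what the issue mentions (PostUpdateEvent's state and oldState plus the persister's property names, all in the same order), but they are passed in directly so the sketch runs without Hibernate. Objects.equals stands in for Hibernate's own comparison via EntityPersister.findDirty(), so this is a simplification, and DirtyCheck is a hypothetical name.

```java
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical sketch of the dirty-check mode: reindex only if one of the
 * properties feeding the index differs between the old and new state.
 */
final class DirtyCheck {
    private DirtyCheck() {}

    static boolean indexedPropertyChanged(String[] propertyNames, Object[] state,
                                          Object[] oldState, Set<String> indexedProperties) {
        if (oldState == null) {
            // No previous state to compare with: be safe and reindex.
            return true;
        }
        for (int i = 0; i < propertyNames.length; i++) {
            // Only properties that feed the index matter; other changes are ignored.
            if (indexedProperties.contains(propertyNames[i])
                    && !Objects.equals(state[i], oldState[i])) {
                return true;
            }
        }
        return false;
    }
}
```

This only sees entity properties, which is exactly why it misses a class bridge or a stateful field bridge whose output changes while the properties stay the same.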

It should also be possible to configure this at a finer granularity, so that some fields use the hash/stored-value comparison and others use the dirty check.

I think the setting should be controlled in @FieldBridge and @ClassBridge and then passed to the FieldBridge implementation through LuceneOptions, to give the developer full control.

E.g. @FieldBridge(updateMode = UpdateMode.DIRTY, impl = ...)
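
The proposed updateMode attribute does not exist on @FieldBridge, so a runnable sketch can only approximate it. The one below uses a standalone @IndexUpdate annotation, an UpdateMode enum and a Book entity, all hypothetical, and shows how a per-field mode could be read, falling back to a configured default. Since each field carries its own mode, this also illustrates the granular behaviour.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical update modes from the proposal; not part of Hibernate Search. */
enum UpdateMode { AUTO, HASH, STORED, DIRTY }

/** Hypothetical stand-in for the proposed updateMode attribute on @FieldBridge. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexUpdate {
    UpdateMode updateMode() default UpdateMode.AUTO;
}

/** Hypothetical entity mixing strategies per field. */
class Book {
    @IndexUpdate(updateMode = UpdateMode.DIRTY)
    String title;       // plain value: a dirty check is enough

    @IndexUpdate(updateMode = UpdateMode.HASH)
    String popularity;  // imagine a bridge whose output depends on the time of day

    String isbn;        // no annotation: uses the configured default
}

final class UpdateModes {
    private UpdateModes() {}

    /** Returns the per-field mode, or the configured default when none is declared. */
    static UpdateMode modeFor(Class<?> type, String fieldName, UpdateMode configuredDefault) {
        try {
            Field field = type.getDeclaredField(fieldName);
            IndexUpdate annotation = field.getAnnotation(IndexUpdate.class);
            return annotation == null ? configuredDefault : annotation.updateMode();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
    }
}
```

In the actual proposal the resolved mode would then reach the FieldBridge implementation through LuceneOptions; that hand-off is not shown here.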

As for class bridges, there is no simple way for you to know which fields they add, and there is no 1:1 relationship between the annotation and the implementation, so I recommend that if there are any class bridges we automatically use the hash (rather than reindexing on every update).

The default for updateMode must preserve backward compatibility.
I suggest a configuration property for the default, set to "auto" or to any of the other options mentioned above.
If set to "auto":
1. If no custom field bridges/class bridges are used, go for the dirty check strategy.
2. If there are some custom field bridges, go for the hash. Note that this would require reindexing the existing index once, because the hash field has to be added the first time, but at least it works correctly, and once the hash is there it keeps working.
3. If all fields are stored, go for the comparison.
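
The "auto" rules above, together with the class bridge rule, could be sketched as a small decision function. The flags and names are hypothetical, and the order in which overlapping rules are checked is an assumption: an entity with all fields stored and no custom bridges matches both rules 1 and 3, and the sketch prefers the comparison because it is the most robust mode.

```java
/**
 * Hypothetical resolution of the proposed "auto" default. Class bridges are
 * checked first because the fields they add are unknown, so "all fields
 * stored" cannot be established for them.
 */
final class AutoStrategy {
    enum Strategy { DIRTY_CHECK, HASH, STORED_COMPARISON }

    private AutoStrategy() {}

    static Strategy resolve(boolean hasClassBridges, boolean hasCustomFieldBridges,
                            boolean allFieldsStored) {
        if (hasClassBridges) {
            return Strategy.HASH;              // class bridges: always use the hash
        }
        if (allFieldsStored) {
            return Strategy.STORED_COMPARISON; // rule 3: compare stored values directly
        }
        if (hasCustomFieldBridges) {
            return Strategy.HASH;              // rule 2: bridge output may be stateful
        }
        return Strategy.DIRTY_CHECK;           // rule 1: only built-in bridges
    }
}
```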

So I would then install the new Hibernate Search and, without any coding or configuration, get far fewer index updates. So the solution has good value out of the box.
After that I would explicitly configure dirty checking by changing all my @FieldBridge annotations, since I don't have any stateful fields, and get a bit more CPU performance. I don't use class bridges, so that would be it.

Regards,

Erel Magnus




> Only index an entity if an indexed property has changed
> -------------------------------------------------------
>
>                 Key: HSEARCH-361
>                 URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-361
>             Project: Hibernate Search
>          Issue Type: New Feature
>            Reporter: Emmanuel Bernard
>             Fix For: 3.2.0
>
>
> PostUpdateEvent has state and oldState
> and EntityPersister has getPropertyNames() and findDirty()
