[
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-361?pag...
]
Erel Magnus commented on HSEARCH-361:
-------------------------------------
I think the solution should accommodate more situations; otherwise some people will be caught
out, and it may turn out to be very expensive to detect or fix, as it was for me.
For example, with your solution, if a developer adds a class bridge, all updates immediately
become very expensive and they have no idea why.
There are three options that I can recommend:
First of all, we should consider the possibility that field bridges/class bridges produce
stateful index terms, e.g. indexing at 1 PM produces A and at 3 PM produces B, or some
random value. In that case your solution would not work. The only thing that would work
is what is done in HSEARCH-5
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-5, which is to store a
hash of the fields in the Lucene document.
So the first mode would be to index all the fields and compare the hash. This solution
consumes some resources during indexing, but at least the IndexReader remains
"clean" and the index stays optimized. Reading the data should be quick thanks
to caches.
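A minimal sketch of the hash comparison in Java, assuming the bridges' output for an entity can be collected into a map of field name to (non-null) string value; multi-valued and binary fields are ignored here. The class name IndexHash, the field name _indexHash and the choice of SHA-256 are illustrative assumptions, not part of Hibernate Search or of HSEARCH-5.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class IndexHash {
    // Hypothetical name of the extra stored field that would hold the hash.
    static final String HASH_FIELD = "_indexHash";

    /** Deterministic hash over the field name/value pairs produced by the bridges. */
    static String hashOf(Map<String, String> fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Sort by field name so the result does not depend on map iteration order.
            for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator, so ("ab","c") and ("a","bc") differ
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(ex);
        }
    }

    /** Reindex if no hash has been stored yet, or if the freshly computed hash differs. */
    static boolean needsReindex(String storedHash, Map<String, String> newFields) {
        return storedHash == null || !storedHash.equals(hashOf(newFields));
    }
}
```

Because the hash covers whatever the bridges emit, a stateful bridge that produces a different term on each run is detected as a change, which is exactly the case a dirty check on entity properties would miss.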
Secondly, if all fields are stored in the index, we can check whether the stored values match
the new ones. So the second mode would be to simply compare the values. This solution is the most
robust.
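A sketch of the stored-value comparison, assuming every indexed field is also stored so its previous values can be read back from the Lucene document as a map of field name to values. StoredFieldComparison is a hypothetical name, and treating multi-valued fields as unordered is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class StoredFieldComparison {
    /** True if the values the bridges would produce now differ from what is stored. */
    static boolean hasChanged(Map<String, List<String>> stored,
                              Map<String, List<String>> fresh) {
        if (!stored.keySet().equals(fresh.keySet())) {
            return true; // a field appeared or disappeared
        }
        for (Map.Entry<String, List<String>> e : stored.entrySet()) {
            // Compare multi-valued fields regardless of value order (an assumption).
            if (!sorted(e.getValue()).equals(sorted(fresh.get(e.getKey())))) {
                return true;
            }
        }
        return false;
    }

    private static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }
}
```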
Thirdly, assuming there are no stateful index terms (you don't know that) or there are no
custom field bridges (you do know that), you can use what you suggested above.
So the third mode would be to check for dirty fields.
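A sketch of the dirty check built on the parallel arrays mentioned in the issue description (PostUpdateEvent's state and oldState, and EntityPersister.getPropertyNames()). It is plain Java so it runs without Hibernate; DirtyCheck and indexedProperties are hypothetical names, and it uses Objects.equals instead of Hibernate's type-aware findDirty(), so mutable collections or components may need more care.

```java
import java.util.Objects;
import java.util.Set;

final class DirtyCheck {
    /**
     * True if any indexed property differs between oldState and state.
     * The three arrays are assumed to be parallel, in property-name order.
     */
    static boolean indexedPropertyChanged(String[] propertyNames, Object[] oldState,
                                          Object[] state, Set<String> indexedProperties) {
        if (oldState == null) {
            return true; // no previous snapshot available: reindex to be safe
        }
        for (int i = 0; i < propertyNames.length; i++) {
            if (indexedProperties.contains(propertyNames[i])
                    && !Objects.equals(oldState[i], state[i])) {
                return true;
            }
        }
        return false; // only non-indexed properties changed
    }
}
```

Note that this only sees entity properties, which is why a class bridge or a stateful field bridge defeats it.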
It should also be possible to configure this per field, so that some fields use the
hash/stored-value comparison and others use dirty checks.
I think the setting should be controlled in the @FieldBridge and @ClassBridge annotations and
then passed to the FieldBridge implementation through LuceneOptions, giving the developer full
control.
E.g. @FieldBridge(updateMode = UpdateMode.DIRTY, impl = ...)
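The proposed updateMode attribute does not exist on the real @FieldBridge, so this sketch uses a hypothetical stand-in annotation, BridgeUpdateMode, and a hypothetical UpdateMode enum to show how per-field modes could be read reflectively, with a configured default for unannotated fields. The Article entity is likewise only an illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical update strategies from the proposal. */
enum UpdateMode { AUTO, HASH, STORED, DIRTY }

/** Hypothetical stand-in for an updateMode attribute on @FieldBridge. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BridgeUpdateMode {
    UpdateMode value();
}

/** Illustrative entity: title opts into dirty checking, summary uses the default. */
class Article {
    @BridgeUpdateMode(UpdateMode.DIRTY)
    String title;
    String summary;
}

final class UpdateModeResolver {
    /** Maps each declared field to its annotated mode, or to defaultMode if unannotated. */
    static Map<String, UpdateMode> modesFor(Class<?> entity, UpdateMode defaultMode) {
        Map<String, UpdateMode> modes = new LinkedHashMap<>();
        for (Field f : entity.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            BridgeUpdateMode ann = f.getAnnotation(BridgeUpdateMode.class);
            modes.put(f.getName(), ann != null ? ann.value() : defaultMode);
        }
        return modes;
    }
}
```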
As for class bridges, there is no simple way for you to know which fields they add, and there
is no 1:1 relationship between the annotation and the implementation, so if there are any
class bridges I recommend using the hash automatically (rather than always reindexing).
The default for updateMode must preserve backward compatibility.
I suggest a configuration property for the default; it should be set to "auto" but also
accept any of the other options mentioned above.
If set to "auto":
1. If no custom field bridges/class bridges are used, use the dirty-check strategy.
2. If there are custom field bridges, use the hash. Note that this requires reindexing the
existing index once, because the hash field has to be added the first time; but it behaves
correctly, and once the hash is there it keeps working.
3. If all fields are stored, use the value comparison.
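One way to express the "auto" rules, as a sketch. The list does not say which rule wins when several apply, so the precedence here is an assumption: class bridges force the hash (their fields cannot be known up front), no custom bridges means dirty checking (rule 1), and with custom field bridges the stored-value comparison is preferred when all fields are stored (rule 3), otherwise the hash (rule 2). AutoModeResolver and UpdateMode are hypothetical names.

```java
/** Hypothetical update strategies from the proposal. */
enum UpdateMode { AUTO, HASH, STORED, DIRTY }

final class AutoModeResolver {
    /** Returns the configured mode unless it is AUTO, in which case the rules decide. */
    static UpdateMode resolve(UpdateMode configured, boolean hasCustomFieldBridges,
                              boolean hasClassBridges, boolean allFieldsStored) {
        if (configured != UpdateMode.AUTO) {
            return configured; // explicit configuration always wins
        }
        if (hasClassBridges) {
            return UpdateMode.HASH; // fields added by class bridges are unknown
        }
        if (!hasCustomFieldBridges) {
            return UpdateMode.DIRTY; // rule 1: entity properties fully describe the index
        }
        // Rules 3 and 2: compare stored values if possible, else fall back to the hash.
        return allFieldsStored ? UpdateMode.STORED : UpdateMode.HASH;
    }
}
```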
So in my case, I would install the new Hibernate Search release and, without any
coding or configuration, get far fewer index updates. Out of the box, the
solution already adds value.
After that, I would configure dirty checking explicitly by changing all my
@FieldBridge annotations, since I don't have any stateful fields, and gain a bit more
CPU performance. I don't use class bridges, so that would be all.
Regards,
Erel Magnus
Only index an entity if an indexed property has changed
-------------------------------------------------------
Key: HSEARCH-361
URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-361
Project: Hibernate Search
Issue Type: New Feature
Reporter: Emmanuel Bernard
Fix For: 3.2.0
PostUpdateEvent has state and oldState
and EntityPersister has getPropertyNames() and findDirty()