Security Assurance in FOSS: Request
by Andre Haralevi
Dear members of the Hibernate project,
We kindly ask for your participation in our survey on security assurance in
free/open source software.
Security assurances are confidence-building activities carried out through
structured design processes, documentation, and testing.
By participating in our survey you contribute to ongoing research aimed at
making free/open source software more secure.
Answering our 21 questions will take no more than 10 minutes of your valuable time.
Our survey is online for the next two weeks until July 1 at:
http://survey.mi.fu-berlin.de/public/survey.php?name=fosssecurity
The survey is anonymous.
Please find the results of the survey on the project page during July:
https://www.inf.fu-berlin.de/w/SE/FOSSSecuritySurvey
For further information about Open Source research at the Research Group
Software Engineering at
Freie Universitaet Berlin, please visit:
https://www.inf.fu-berlin.de/w/SE/FOSSHome
Thank you in anticipation,
Sascha Rasmussen, Alexander Kunze, and Andre Haralevich
In case you participate in more than one FOSS project, please fill out the
questionnaire for the project in which security is most important, or fill out
one questionnaire per project.
Thank you!
Maven org.hibernate groupId management
by Emmanuel Bernard
When we do an experiment, it might be a good idea to use
org.hibernate.experiment or org.hibernate.sandbox as the groupId, so that
org.hibernate itself does not become unclear or confusing. What do you think?
The fork done by Navin for the Search / Cache integration in particular
is a nice candidate before merging it back.
Emmanuel
Hudson for Annotations and Hibernate Search
by Hardy Ferentschik
Hi there,
I was thinking of setting up an integration build for Annotations and
maybe Hibernate Search. So my question really is: who is responsible? Can
I get access to Hudson myself, or do I have to contact someone?
Any help appreciated ;-)
--Hardy
About save data into table with hibernate
by tengxh
I use getHibernateTemplate().saveOrUpdate(obj) to save the data. After I click the create/update button, I immediately query the table with a SQL analysis tool and the new data exists, but after a while the new data is lost from the table. How can I save the new data into the table permanently?
improving Search
by Sanne Grinovero
Hi all,
I've been thinking about some more design changes for Hibernate Search,
and I hope to start a creative discussion.
A) FSMasterDirectoryProvider and index snapshots.
FSMasterDirectoryProvider currently makes copies of its changing index.
Using Lucene's SnapshotDeletionPolicy you don't need to switch from one index
to another: it is possible to take snapshots of the current index even while
it is in use, and copy them to another destination.
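To illustrate what I mean, here is a rough sketch of the Lucene API I'm
referring to (the exact IndexWriter constructor and the snapshot/release
signatures vary a bit between Lucene versions, and copyFile() is just a
hypothetical helper):

// wrap the default deletion policy in a SnapshotDeletionPolicy and give it
// to the IndexWriter (the constructor form depends on the Lucene version):
SnapshotDeletionPolicy snapshotter =
    new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
IndexWriter writer = new IndexWriter(directory, false, analyzer, snapshotter);

// later, while the writer keeps working, "freeze" the current commit point:
IndexCommitPoint commit = snapshotter.snapshot();
try {
    Iterator files = commit.getFileNames().iterator();
    while (files.hasNext()) {
        String fileName = (String) files.next();
        // copy each file of the frozen commit to the destination directory;
        // Lucene guarantees these files won't be deleted until release().
        copyFile(directory, fileName, destinationDirectory);
    }
} finally {
    snapshotter.release(); // let the wrapped policy clean up old files again
}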
The main problem I see with this is that the SnapshotDeletionPolicy must be
given to the IndexWriter constructor, so the DirectoryProvider is going to
need to interact with the Workspace; also, the DirectoryProvider would need
to know about, and potentially block, the opening of a new IndexWriter.
This would currently require many changes, but it would be easy once solved
together with the next proposal:
B) Stop using the IndexReader in write mode and do all the work in a single IndexWriter.
There are currently many locks in Search to coordinate the three different
operations done on indexes:
1- searching
2- deleting
3- inserting
The last two are currently a bottleneck, as each transaction needs to lock
the index on commit, open an IndexReader (expensive), use it (could be
expensive), close it (flushing files), open an IndexWriter (expensive), and
close it (flushing again). Only at the end of this sequence is the index
unlocked so that the next transaction commit can be processed.
(Emmanuel, did I get this short explanation right? Too catastrophic?)
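Spelled out, the per-commit flow looks roughly like this (a simplified
sketch of my understanding, not the actual Workspace code; the two
collections are just placeholders):

lock.lock();                               // serialize commits on this index
try {
    IndexReader reader = IndexReader.open(directory);       // expensive
    for (Iterator it = termsToDelete.iterator(); it.hasNext();) {
        reader.deleteDocuments((Term) it.next());            // delete phase
    }
    reader.close();                                          // flushes files

    IndexWriter writer = new IndexWriter(directory, analyzer, false); // expensive
    for (Iterator it = documentsToAdd.iterator(); it.hasNext();) {
        writer.addDocument((Document) it.next());            // insert phase
    }
    writer.close();                                          // flushes again
} finally {
    lock.unlock();                // only now can the next commit be processed
}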
We discussed sharing these elements across different committing transactions,
but it isn't easy to manage and the benefits are not clear.
If we could avoid the need for an IndexReader to alter the index, the same
IndexWriter could be reused and we could just flush changes when needed.
The IndexReader appears to be used in two situations.
The first is for purgeAll(class); this can easily be replaced, as the same
functionality is available in IndexWriter.
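For example, purging a whole class could become a single term-based delete
on the writer; recent Lucene versions already provide
IndexWriter.deleteDocuments(Term). CLASS_FIELDNAME stands here for whatever
field Search uses to store the entity class name:

// purgeAll(entityClass) without opening an IndexReader:
indexWriter.deleteDocuments(new Term(CLASS_FIELDNAME, entityClass.getName()));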
The other use is the removal of an entity from the index.
The current remove code has many //TODOs and a good explanation of why an
IndexReader is needed; I hope we can resolve this issue, and I have some
ideas. The problem is that Lucene permits removal only by a single term, but
we uniquely identify an entity by the pair entityName + dbID.
Solutions:
1) The next release of Lucene will provide removal by Query in the
IndexWriter; we make a BooleanQuery containing both the entityName and the
database ID, and the problem is solved.
2) We add an additional field containing a unique identifier for
entityName + id and use this field for removal (a sketch of both approaches
follows after this list). This is probably the fastest way for frequently
changing indexes, but it wastes a lot of space.
3) Always use sharding when we detect that multiple entities use the same
index. We could define separate indexes for each entity type and
transparently bridge them together when a query needs multiple types in
return, using the usual MultiReaders. This may result in faster queries for
single return types but slower polymorphic queries. When using this sort of
sharding, deletion by a single term becomes easy: the database identifier.
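To make solutions 1 and 2 concrete, this is roughly what I have in mind;
the "uid" field name is invented, CLASS_FIELDNAME / ID_FIELDNAME stand for
whatever fields Search uses today, and delete-by-query on the IndexWriter is
the still-unreleased Lucene feature mentioned above:

// Solution 2: at indexing time, add a redundant field combining the two
// values we identify an entity by, so that removal needs only a single Term:
String uid = entityName + "#" + documentIdAsString;
doc.add(new Field("uid", uid, Field.Store.NO, Field.Index.UN_TOKENIZED));
// and at removal time:
indexWriter.deleteDocuments(new Term("uid", uid));

// Solution 1: once IndexWriter gains delete-by-query, combine the two
// existing fields instead of storing a redundant one:
BooleanQuery removal = new BooleanQuery();
removal.add(new TermQuery(new Term(CLASS_FIELDNAME, entityName)),
    BooleanClause.Occur.MUST);
removal.add(new TermQuery(new Term(ID_FIELDNAME, documentIdAsString)),
    BooleanClause.Occur.MUST);
indexWriter.deleteDocuments(removal); // not available in current Lucene releases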
If we could enforce one of these strategies we could remove most of the
locking; also, the IndexWriter could be used concurrently by each thread
needing to commit, and it would never need to be closed or reopened, as it
is always aware of the complete index state.
As the Document analysis is done in the IndexWriter, being able to use it
concurrently is very important; in short, having a single IndexWriter per
index could unlock many more ideas for additional improvements.
Integrating with the "massive batch reindexer" from my previous post would
also be much easier, and it could become the standard indexing process
without having to relax any transaction requirement.
kind regards,
and sorry for being so verbose
Sanne
Hibernate Search: massive batch indexing
by Sanne Grinovero
Hello list,
I've finally finished some performance tests on things I wanted to
double-check before writing stupid ideas to this list, so I feel I can at
last propose some code for (re)building the index for Hibernate Search.
The present API of Hibernate Search provides a nice and safe transactional
index(entity), but even when trying several optimizations it doesn't reach
the speed of an unsafe (out-of-transaction) indexer we use in our current
production environment.
Also, reading the forum it appears that many people have difficulties using
the current API; even with a good example in the reference documentation,
some difficulties arise with Seam's transactions and with huge data sets.
(I'm NOT saying something is broken, just that you need a lot of expertise
to get it going.)
SCENARIO
=======
* Developers change an entity and want to test the effect on the index
structure; they want to do search experiments with the new fields.
* A production system is up(down)graded to a new(old) release involving
index changes (the system is "down for maintenance" but speed is crucial).
* The existing index is corrupted or lost (again, speed of recovery is critical).
* A database backup is restored, or data is changed by other jobs.
* Some crazy developer like me prefers to disable H.Search's event listeners
for some reason. (I wouldn't generally recommend it, but I have met other
people who have a reasonable argument for doing this. Also, in our case it
is a feature, as newly entered books will be available for loans only from
the next day :D)
* A Lucene update breaks the index format (not so irrational, as they just
did on trunk).
PERFORMANCE
=======
In simple use cases, such as fewer than 1000 entities and not too many
relationships, the existing API outperforms my prototype, as I have some
costly setup. In more massive tests the setup costs are easily recovered by
a much faster indexing speed.
I have a lot of data I could send; I'll just show some and keep the details simple:
entity "Operator": standard complexity, involves loading of 4+ objects,
7 fields affect the index
entity "User": moderate complexity, involves loading of about 20 objects,
12 fields affect the index data
entity "Modern": high complexity, loading of 44 entities, many of them
"manyToMany", 25 fields affect the index data
On my laptop (dual core, local MySQL DB):

type           Operator    User       Modern
number         560         100,000    100,000
time-current   0.23 secs   45''       270.3''
time-new       0.43 secs   30''       190''

On a staging server (4-core Xeon with lots of RAM and a dedicated DB server):

type           Operator    User       Modern
number         560         200,000    4,000,000
time-current   0.09 secs   130''      5h20'
time-new       0.25 secs   22''       19'
[benchmark disclaimer:
These timings are meant to be relative to each other for my particular code
version; I'm not an expert in Java benchmarking at all. Also, unfortunately
I couldn't use the same hardware for every test. I used all the tweaks I am
aware of in Hibernate Search, even enabling new needed params, to make the
comparison as fair as possible.]
Examining the numbers:
with the currently recommended H.Search strategy I can index 560 simple
entities in 0.23 seconds; quite fast, and newbie users will be impressed.
At the other extreme, we index 4 million complex items but need more than
5 hours to do it; this is more like a real use case and it could scare
several developers.
Unfortunately I don't have a complete copy of the DB on my laptop, but
looking at the numbers it seems my laptop could finish in about 3 hours,
nearly double the speed of our more-than-twice-as-fast server.
(Yes, I've had several memory leaks :-) but they're solved now.)
The real bottleneck is the round-trip to the database: without multiple
threads, each lazy-loaded collection annotated to be indexed massively slows
down the whole process. If you look at both the DB and AS servers, they show
very low resource usage, confirming this, while my laptop stays at 70% CPU
(and kills my hard drive) because it has the data available locally,
producing a constant feed of strings to my index.
When using the new prototype (about 20 threads in 4 different pools) I get
the 5 hours down to less than 20 minutes; also, I can start indexing all 7
indexable types in parallel and it will stay at around 20 minutes.
The "User" entity is not as complex as Modern (less lazy-loaded data) but
confirms the same numbers.
ISSUES
=======
About the current version I have ready:
it is not a complete substitute for the existing API and is far from perfect.
Currently these limitations apply, but they could be solved fairly easily
(others I am not aware of are not listed :-):
A) I need to "read" some hints for each entity; I tinkered with a new
annotation; configuration properties would also work but are likely to be
quite verbose (HQL). Basically I need some hints about fetch strategies
appropriate for batch indexing, which are often different from those for
normal use cases (a hypothetical sketch of such an annotation follows after
this list).
B) Hibernate Search's indexing of related entities was not available when
I designed it. I think this change will probably not affect my code, but I
still need to verify the functionality with IndexedEmbedded.
C) It is fine-tuned for our entities and DB; many variables are configurable,
but some things should be made more flexible.
D) Index sharding also didn't exist at the time; I'll need to change some
things to send the entities to the correct index and acquire the appropriate locks.
The next limitation is not easy to solve; I have some ideas, but none I liked.
E) It is not completely safe to use during other data modifications.
This is not a problem in our current production system, but it needs strong
warnings in case other people want to use it.
The best solution I could think of is to lock the current work queue of
H.Search, so as to block the execution of work objects in the queue and
resume their execution after batch indexing is complete.
If some entity disappears (removed from the DB but with a reference still in
the queue) it can easily be skipped; if I index an "old version" of some
other data, it will be fixed when the scheduled updates from the H.S. event
listeners are resumed (and the same goes for new entities).
It would be nice to share the same database transaction during the whole
process, but as I use several threads and many separate sessions I think
this is not possible (this is the best place to ask, I think ;-)
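For point A, what I tinkered with looks roughly like the following; the
annotation name and its attributes are completely invented, just to show the
kind of hints I need to read for each entity:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation, not part of Hibernate Search: it only illustrates
// the batch-indexing hints discussed in point A above.
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
public @interface BatchIndexingHints {
    /** Associations to fetch eagerly while batch indexing (HQL-style paths). */
    String[] eagerFetch() default {};
    /** How many entities to load per scroll batch. */
    int batchSize() default 100;
    /** How many threads to use for loading this entity type. */
    int loadingThreads() default 4;
}

An indexed entity would then carry something like
@BatchIndexingHints(eagerFetch = { "authors", "publisher" }, batchSize = 500)
next to its @Indexed annotation.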
GOING PRACTICAL
===============
if (cheater) goto :top
A nice evictAll(class) exists; I would like to add indexAll(class).
It would be nice to provide non-blocking versions, maybe by overloading
indexAll(Class clazz, boolean block), or by returning a Future, so people
could wait on one or more indexAll requests if they want to.
There are many parameters to tweak the indexing process, so I'm not sure
whether we should put them in the properties, have a parameter-wrapper
object as in indexAll(Class clazz, Properties prop), or something like
makeIndexer(Class clazz) returning a complex object with several setters for
fine-tuning and start() and awaitTermination() methods.
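To make the alternatives concrete, the candidate shapes look roughly like
this; none of these methods exist yet, and all names (indexAll, makeIndexer,
BatchIndexer and its setters) are tentative:

// Option 1: plain overloads, blocking or returning a Future to wait on:
fullTextSession.indexAll(Book.class);                        // blocks until done
Future<?> job = fullTextSession.indexAll(Book.class, false); // returns immediately
job.get();                                                    // wait when convenient

// Option 2: a builder-like object carrying the tuning parameters:
BatchIndexer indexer = fullTextSession.makeIndexer(Book.class);
indexer.setLoadingThreads(8);
indexer.setWritingThreads(2);
indexer.setQueueSize(1000);
indexer.start();
indexer.awaitTermination();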
the easy part
--------------
This part is easy, as I already have it working well; it is a pattern
involving several executors. The size of each thread pool and of the queues
linking them gives the balance needed to achieve high throughput.
First the entities are counted and divided into blocks; these id ranges are
fed to N scrollable results opened in N threads, and each thread iterates
over its range and feeds detached entities to the next pool through
BlockingQueues.
In the next pool the entities are re-attached using LockMode.NONE, read-only
mode, etc. (and many other options you may want to tell me about), and we
get an appropriate DocumentBuilder from the SearchFactory to transform each
one into a Lucene Document; this pool is usually the slowest, as it has to
initialize many lazy fields, so there are more threads here.
The produced Documents go to a smaller pool (the best I found was 2-3
threads) where the data is concurrently written to the IndexWriter.
There's an additional thread for resource monitoring that produces some
hints about queue sizing and idle threads, to help with fine-tuning and to
show instant speed reports in the logs when enabled.
For shutdown I use the "poison pill" pattern, and I get rid of all threads
and executors when I'm finished.
It needs some adaptation to take the latest Search features into account,
such as similarity, but it is mostly beta-ready.
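Stripped of the session handling, the monitoring thread and the tuning
knobs, the skeleton looks more or less like this; POISON_ID and
buildDocument() are placeholders rather than real Search code, and the types
come from java.util.concurrent and org.apache.lucene.document:

final Long POISON_ID = Long.valueOf(-1L);
final BlockingQueue<Long> idQueue = new ArrayBlockingQueue<Long>(1000);
final BlockingQueue<Document> docQueue = new ArrayBlockingQueue<Document>(500);

ExecutorService loaders = Executors.newFixedThreadPool(16); // slowest stage: more threads
ExecutorService writers = Executors.newFixedThreadPool(2);  // 2-3 threads worked best for me

for (int i = 0; i < 16; i++) {
    loaders.execute(new Runnable() {
        public void run() {
            try {
                Long id;
                while (!(id = idQueue.take()).equals(POISON_ID)) {
                    // open a read-only Session, load the entity, initialize the lazy
                    // associations needed by the index, build the Lucene Document:
                    docQueue.put(buildDocument(id)); // buildDocument() is hypothetical
                }
                idQueue.put(POISON_ID); // pass the poison pill on to the next loader
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    });
}
// The writer pool is symmetric: it takes Documents from docQueue and calls
// indexWriter.addDocument(doc) until it sees its own poison pill; the id
// producers push the entity ids from the counting/partitioning step into idQueue.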
the difficult part
-------------------
Integrating it with the current locking scheme is not really difficult, also
because the goal is to minimize downtime, so I think some downtime should be
acceptable.
It would be very nice, however, to integrate this pattern as the default
writer for indexes, even "in transaction"; I think it could be possible,
even in synchronous mode, to split the work of a single transaction across
the executors and wait for all the work to be done at commit.
You probably don't want to see the "lots of threads" meant for batch
indexing, but the pools scale quite well to adapt themselves to the load,
and it's easy (as in clean and maintainable code) to enforce resource limits.
When integrating at this level the system wouldn't need to stop regular
Search activity.
Any questions? If someone wants to reproduce my benchmarks I'll be glad to
send my current code and DB.
kind regards,
Sanne