Hibernate Search Metadata API: Numeric and other special Field types in Hibernate Search
by Sanne Grinovero
The new FieldSettingsDescriptor [1] has a couple of methods meant for
Numeric fields:
    /**
     * @return the numeric precision step in case this field is indexed as
     * a numeric value. If the field is not numeric {@code null} is returned.
     */
    Integer precisionStep();

    /**
     * @return {@code true} if this field is indexed as numeric field,
     * {@code false} otherwise
     *
     * @see #precisionStep()
     */
    boolean isNumeric();
Today we have specific support for the
org.apache.lucene.document.NumericField type from Lucene, so these
methods are reasonable (and needed to build queries), but this specific
kind is being replaced by a more general purpose encoding, so that you
don't have "just" NumericField but can have a wide range of special fields.
So for simplicity it would make sense today to expose these methods
directly on the FieldSettingsDescriptor, as that's convenient for our
users; but then #isNumeric() is also needed, since not all fields are
numeric: we're adding these extra methods to accommodate the needs of
some special cases.
Considering that we might get more "special cases" with Lucene 4, and
that they will probably have different options, would we be able to
both decouple from these specific options and still expose the needed
precisionStep?
I won't mention my favorite pattern. I've considered adding subtypes,
but I don't like it as their usage would not be clear from the API.
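Despite that concern, here is a minimal sketch of what an explicit narrowing could look like. All names in it (as(), NumericFieldSettingsDescriptor) are illustrative assumptions, not the actual Hibernate Search API: the idea is that the base descriptor stays free of kind-specific options, and callers opt in to the numeric view explicitly.

```java
// Illustrative sketch only: NOT the real Hibernate Search API,
// just one way the decoupling could look.
public class MetadataSketch {

    interface FieldSettingsDescriptor {
        String name();
        // Narrow to a kind-specific view, or fail if the field is not of that kind
        <T extends FieldSettingsDescriptor> T as(Class<T> kind);
    }

    // Numeric-specific options live on the subtype, not on the base interface
    interface NumericFieldSettingsDescriptor extends FieldSettingsDescriptor {
        int precisionStep();
    }

    static class NumericFieldDescriptor implements NumericFieldSettingsDescriptor {
        private final String name;
        private final int precisionStep;

        NumericFieldDescriptor(String name, int precisionStep) {
            this.name = name;
            this.precisionStep = precisionStep;
        }

        public String name() { return name; }
        public int precisionStep() { return precisionStep; }

        public <T extends FieldSettingsDescriptor> T as(Class<T> kind) {
            if ( !kind.isInstance( this ) ) {
                throw new IllegalArgumentException( name + " is not a " + kind.getSimpleName() );
            }
            return kind.cast( this );
        }
    }

    public static void main(String[] args) {
        FieldSettingsDescriptor field = new NumericFieldDescriptor( "price", 4 );
        // The caller states its expectation explicitly:
        int step = field.as( NumericFieldSettingsDescriptor.class ).precisionStep();
        System.out.println( step ); // prints 4
    }
}
```

The narrowing call makes the caller's expectation explicit, so adding new special field kinds later wouldn't widen the base interface.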
Cheers,
Sanne
1 - as merged two minutes ago
11 years, 4 months
Design: HSEARCH-1032 MassIndexer with a live update mechanism
by Sanne Grinovero
Current priorities on Search are:
- Infinispan IndexManager -> me
- Metadata API -> Hardy
- Multitenancy (aka dynamic Sharding) -> me + Emmanuel + Dimitrios
Those are all important as they represent hard requirements for other
projects, but I'd also like to consider at least the basic design for
how the MassIndexer could operate in "update mode": a highly requested
mode in which it re-synchronizes the index with the database, but
without first wiping out the index - the wipe creates a window in time
in which the application's query results are not complete.
# Reminder on current design:
1- deletes the current index
2- scrolls on all entities and uses ADD index operations to add them all again
There are two basic approaches on the table (other ideas welcome):
- #A Use UPDATE index operations instead, skipping the initial delete
- #B Rebuild the index in a secondary directory, then switch
Let's explore them:
#A Use UPDATE index operations instead, skipping the initial delete
## what
Technically an UPDATE operation is - in Lucene terms - an atomic
(delete+add); the benefit is that each query will see either the
previous document or the updated one: there is no possibility that the
document is missed, since the changes cannot be flushed between the
delete and the add operation.
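In plain Lucene API terms, the UPDATE strategy maps to IndexWriter#updateDocument(Term, Document), while the non-atomic alternative would be separate deleteDocuments(Term) and addDocument(Document) calls. A toy model of the visibility difference (pure Java; the map stands in for the index, this is not Lucene code):

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: the map stands in for the index, keyed by entity id.
public class UpdateSemantics {
    static final ConcurrentHashMap<String, String> index = new ConcurrentHashMap<>();

    // delete + add: between the two steps a concurrent reader finds nothing
    static void deleteThenAdd(String id, String doc) {
        index.remove( id );
        // <-- a query served here misses the document entirely
        index.put( id, doc );
    }

    // update: one atomic replacement per key, analogous to Lucene's
    // IndexWriter#updateDocument(Term, Document) - readers see the old
    // or the new document, never a gap
    static void update(String id, String doc) {
        index.put( id, doc );
    }

    public static void main(String[] args) {
        index.put( "42", "old" );
        update( "42", "new" );
        System.out.println( index.get( "42" ) ); // prints new
    }
}
```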
## performance
The reason the current design deletes all elements at the start of the
process is that this is a very efficient operation: it targets a
single term (the class name field) or, in some cases, the whole index,
so it just needs to delete all segment files.
When doing a delete operation per document instead of per class, each
delete very likely targets multiple terms (which is not efficient at
all, as it needs IO to seek across multiple disk positions), and of
course the worst part is that it triggers a delete operation for each
and every entity. To compare: a single ADD doesn't need any disk seek,
as we can pack multiple operations into one - until the buffer is full
- but every single delete requires N disk seeks (N is not directly the
number of fields, but is proportional to it).
Based on this, and on experience benchmarking the #index() method, I'm
expecting the UPDATE strategy to be approximately a thousand times
slower than the current MassIndexer implementation.. considering that
for some users it already takes a couple of hours, going to 2000 hours
is maybe not an option :-) (that's about 3 months)
## left over entries
Another problem is that if we scroll over all entities from the
database, we fail to delete those documents in the index which no
longer have a matching entity.
So we would need a final phase running the inverse iteration: for each
element in the index, verify whether there is a match in the database;
that sounds like an awful lot of queries, even if we batch it into
verification blocks.
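Batched, that verification phase could look roughly like this (a pure-Java sketch; the lookup callback stands in for an IN(...) query against the database, and the batch size is an arbitrary assumption):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class StalePurgeSketch {

    // Returns the ids present in the index but no longer in the database,
    // issuing one batched database lookup per block of ids.
    static List<String> staleIds(List<String> indexedIds,
                                 Function<List<String>, Set<String>> existingInDb,
                                 int batchSize) {
        List<String> stale = new ArrayList<>();
        for ( int i = 0; i < indexedIds.size(); i += batchSize ) {
            List<String> block = indexedIds.subList( i, Math.min( i + batchSize, indexedIds.size() ) );
            Set<String> found = existingInDb.apply( block ); // stands in for one IN(...) query
            for ( String id : block ) {
                if ( !found.contains( id ) ) {
                    stale.add( id ); // no database match: document must be deleted
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Set<String> database = new HashSet<>( Arrays.asList( "1", "2", "4" ) );
        List<String> index = Arrays.asList( "1", "2", "3", "4", "5" );
        List<String> stale = staleIds( index, block -> {
            Set<String> s = new HashSet<>( block );
            s.retainAll( database );
            return s;
        }, 2 );
        System.out.println( stale ); // prints [3, 5]
    }
}
```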
Bottom line: it looks messy.
#B Rebuild the index in a secondary directory, then switch
## performance
No big concerns, but we assume there is enough disk space for at least
four times the size of the index (compacting an index can temporarily
require twice its size, and we have two indexes to manage).
## design
The good part is that we can reuse most of the existing MassIndexer;
but transactional changes (those applied by the application during a
reindexing) need to be redirected to both indexes: applied to the one
in use until the rebuild is complete, so that queries stay consistent,
and also enqueued for the one being built, so that they don't get lost
in case they apply to documents which have already been indexed. The
queue handling is tricky, because in that case further additions
actually need to be updates, unless we can keep them on hold in a
buffer to be applied to the pristine index: that could take quite some
memory, depending on the amount of changes flying in during the mass
indexing. If the queue grows beyond reason we'll need to either apply
backpressure on the transactions, or offload to disk, or switch to an
update strategy for the remaining mass indexing process.. none of
these is desirable, but I guess people could tune to make this
condition unlikely.
## SPI changes
With this design we need to be able to:
- dynamically instantiate a second Directory in a different path
- switch to delegate writes to both directories / one directory
- control from where Readers are opened
- make sure closed Readers are returned to the pool they came from, as
the active reference source could have changed in the meantime
- be able to switch (permanently) to a different active index
- destroy old index
I'm afraid each of these can affect our SPIs; likely at least
IndexManager. I hope we can keep all the logic in "behind the scenes"
code which drives the same SPIs as today, but I'd need a POC to
verify this.
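A toy sketch of the "write to both" phase (the Backend interface and all names here are illustrative, not SPI proposals): every operation is applied to the live index so queries stay consistent, and to the rebuilding index so it isn't lost.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch only: fans every write out to two delegates.
public class FanOutBackend {
    interface Backend {
        void apply(String operation);
    }

    static class RecordingBackend implements Backend {
        final List<String> applied = new ArrayList<>();
        public void apply(String operation) { applied.add( operation ); }
    }

    static class FanOut implements Backend {
        private final Backend live;
        private final Backend rebuilding;
        FanOut(Backend live, Backend rebuilding) {
            this.live = live;
            this.rebuilding = rebuilding;
        }
        public void apply(String operation) {
            live.apply( operation );       // queries on the old index stay consistent
            rebuilding.apply( operation ); // the change is not lost on the new index
        }
    }

    public static void main(String[] args) {
        RecordingBackend live = new RecordingBackend();
        RecordingBackend rebuilding = new RecordingBackend();
        Backend backend = new FanOut( live, rebuilding );
        backend.apply( "UPDATE Person#1" );
        System.out.println( live.applied.equals( rebuilding.applied ) ); // prints true
    }
}
```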
## Directory index path
If we switch from one Directory to another - thinking about the
FSDirectory - we're either violating the path configuration options
from the user, or we need to move the new index into the configured
position when done. Even if the above sounds a bit complex, I'm
actually more concerned about implementing such an atomic move on the
filesystem.
I guess we could agree that if the user configured an index to be in -
say - "/var/lucene/persons", we could store the indexes in
"/var/lucene/persons/index-a" and "/var/lucene/persons/index-b",
alternating in a similar way to the FSMasterDirectoryProvider; but
that takes away some control over the index position and is not
backwards compatible. Would this be acceptable?
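The alternating layout could be as simple as a generation counter (the directory names are the ones proposed above; keeping a generation number, and how it would be persisted, is an assumption):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the alternating directory layout: even generations live in
// index-a, odd ones in index-b, similar in spirit to how
// FSMasterDirectoryProvider alternates between two copies.
public class AlternatingIndexPath {

    static Path activePath(Path configuredBase, long generation) {
        return configuredBase.resolve( generation % 2 == 0 ? "index-a" : "index-b" );
    }

    public static void main(String[] args) {
        Path base = Paths.get( "/var/lucene/persons" );
        System.out.println( activePath( base, 0 ) ); // generation 0 -> .../index-a
        System.out.println( activePath( base, 1 ) ); // generation 1 -> .../index-b
    }
}
```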
# Timeline
This might need to be moved to 5.0 because of the various backwards
compatibility concerns - ideally, if some community user feels like
participating, we could share some early code in experimental branches
and work together.
Comments and better ideas welcome :)
Sanne
[OGM] Embedded MongoDB for tests
by Gunnar Morling
Hi all,
I just came across "EmbedMongo" [1], which provides a way to run
MongoDB embedded within an application. This is convenient e.g. for
tests, as it doesn't require a separately installed MongoDB instance.
I've tried it out with a single test and it worked as expected.
Unfortunately MongoDB (the server) can't be retrieved as a Maven
dependency; EmbedMongo thus retrieves the distribution via HTTP and
stores it in ~/.embedmongo/. This only happens once, during the first
usage.
What do you think, would it be helpful to use this for the OGM MongoDB
tests (it might well be that this or similar options have been
discussed before and I just missed that)?
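For reference, wiring EmbedMongo into a test looks roughly like this. This is adapted from memory of the project README; exact class names vary across EmbedMongo versions, so treat the whole snippet as an approximation rather than the definitive API, and the port number is arbitrary:

```java
import de.flapdoodle.embed.mongo.MongodExecutable;
import de.flapdoodle.embed.mongo.MongodProcess;
import de.flapdoodle.embed.mongo.MongodStarter;
import de.flapdoodle.embed.mongo.config.MongodConfigBuilder;
import de.flapdoodle.embed.mongo.config.Net;
import de.flapdoodle.embed.mongo.distribution.Version;

public class EmbeddedMongoTestSupport {
    public static void main(String[] args) throws Exception {
        // Downloads and caches the distribution in ~/.embedmongo/ on first use
        MongodStarter starter = MongodStarter.getDefaultInstance();
        MongodExecutable executable = starter.prepare( new MongodConfigBuilder()
                .version( Version.Main.PRODUCTION )
                .net( new Net( 27117, false ) ) // arbitrary test port, IPv4
                .build() );
        MongodProcess mongod = executable.start();
        try {
            // ... run the OGM tests against localhost:27117 ...
        }
        finally {
            mongod.stop();
            executable.stop();
        }
    }
}
```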
--Gunnar
[1] https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de
Marking API members as incubating
by Gunnar Morling
Hi,
Hardy and I have been musing about how to mark new API members (methods,
classes etc.) which are still incubating or experimental.
Of course we have Alpha and Beta releases etc., but there can be cases
where it makes sense to ship new functionality with a final release
and still leave the door open for refinements in the next release,
based on user feedback.
So basically we're looking for a way to inform the user and say "it's
ok to use this API, but be prepared for changes in the future". One
way to do this is documentation, i.e. prose or a custom JavaDoc tag
such as @experimental. This has been done in HSEARCH before.
Alternatively, a Java 5 annotation could be used, which I'd personally
find advantageous for the following reasons:
* With an annotation, the generated JavaDoc gives you a list of all
incubating members out of the box; see e.g. the Guava docs for an
example [1].
* For an annotation we can provide proper documentation in the form of
JavaDoc, i.e. the user of the API can inspect the docs of @Incubating
from within the IDE and learn about the rules behind it. For a tag, a
user would only see the specific comment of a given instance.
* An annotation is more tool-friendly: e.g. a user could easily find
all references to @Incubating in her IDE, or even write an annotation
processor or a custom CheckStyle rule issuing a build warning when an
incubating member is used.
Such an annotation would have a retention level of SOURCE, similar to
other documenting annotations such as @Generated.
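A minimal sketch of what such an annotation could look like; the name, targets and comments are suggestions, not a settled design (in the real API it would of course be a public type in its own file):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Marks an API member as incubating: it's ok to use it, but be
 * prepared for changes in future releases based on user feedback.
 */
@Documented // listed in the generated JavaDoc of annotated members
@Retention(RetentionPolicy.SOURCE) // documentation only, like @Generated
@Target({ ElementType.TYPE, ElementType.METHOD, ElementType.FIELD,
        ElementType.CONSTRUCTOR, ElementType.PACKAGE })
@interface Incubating {
}
```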
Any thoughts?
--Gunnar
[1]
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/...
Sybase BLOB loading errors
by Sanne Grinovero
We have had the below error reported for quite a while in the
Hibernate Search testsuite, when run on Sybase.
I remember that when I initially noticed it, someone told me it was a
known problem in ORM, but I didn't track down the JIRA issue. Does
someone know which one it is, please?
TiA
Sanne
Error Message
The method com.sybase.jdbc4.jdbc.SybCursorResultSet.getBlob(String) is
not supported and should not be called.
Stacktrace
java.lang.UnsupportedOperationException: The method
com.sybase.jdbc4.jdbc.SybCursorResultSet.getBlob(String) is not
supported and should not be called.
at com.sybase.jdbc4.jdbc.ErrorMessage.raiseRuntimeException(Unknown Source)
at com.sybase.jdbc4.utils.Debug.notSupported(Unknown Source)
at com.sybase.jdbc4.jdbc.SybResultSet.getBlob(Unknown Source)
at org.hibernate.type.descriptor.sql.BlobTypeDescriptor$1.doExtract(BlobTypeDescriptor.java:64)
at org.hibernate.type.descriptor.sql.BasicExtractor.extract(BasicExtractor.java:64)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:261)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:257)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:247)
at org.hibernate.type.AbstractStandardBasicType.hydrate(AbstractStandardBasicType.java:332)
at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2912)
at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1673)
at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1605)
at org.hibernate.loader.Loader.getRow(Loader.java:1505)
at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:713)
at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:683)
at org.hibernate.loader.Loader.loadSingleRow(Loader.java:379)
at org.hibernate.internal.ScrollableResultsImpl.prepareCurrentRow(ScrollableResultsImpl.java:240)
at org.hibernate.internal.ScrollableResultsImpl.next(ScrollableResultsImpl.java:117)
at org.hibernate.search.test.bridge.tika.TikaBridgeBlobSupportTest.indexBook(TikaBridgeBlobSupportTest.java:128)
at org.hibernate.search.test.bridge.tika.TikaBridgeBlobSupportTest.testDefaultTikaBridgeWithBlobData(TikaBridgeBlobSupportTest.java:74)
Re: [hibernate-dev] Deprecating configurability of "hibernate.search.worker.scope" ?
by Sanne Grinovero
On 11 July 2013 15:29, Hardy Ferentschik <hardy(a)hibernate.org> wrote:
> I find them confusing as well and cannot think of an actual use case.
> I assume you are removing the hibernate.search.worker.* settings as well, right?
Partly: we still need the options which apply to our "one and only"
implementation, the TransactionalWorker.
To be clear, those defined in
Table 3.3. Execution configuration
should stay.
Right?
Sanne
>
> if so +1
>
> --Hardy
>
>
> On 11 Jan 2013, at 4:04 PM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
>
>> I'm wondering if this property is really useful. Does someone have a
>> practical example in which they would need it?
>>
>> http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#t...
>>
>> I'm tempted to deprecate it and remove the description from the
>> documentation, as so far I've only seen people asking for
>> clarifications about it, only to then conclude they don't need it (or,
>> more likely, they didn't understand it and decided to stay away from it).
>>
>> We could technically leave the loading code in place, it's just the
>> documentation which is troubling me.
>>
>> Cheers,
>> Sanne
>> _______________________________________________
>> hibernate-dev mailing list
>> hibernate-dev(a)lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>