[Search] The case against searching with Criteria + restrictions
by Guillaume Smet
Hi,
= Context =
So, my patch here [1] broke a test which checks that Criteria +
restrictions mostly work - even if it's documented as not supported
and not working.
"Mostly" as in "you can't get the result size but you might get the
results". See [2] for explanations.
I spent some time yesterday contemplating this issue and, while I'm
sorry for breaking this test, I still think we should apply my patch,
remove this test and make this case not supported for good.
= Why it mostly works =
In the original ObjectLoaderHelper implementation, we use
session.load: it doesn't force the proxy to be initialized. If a proxy
for an entity isn't initialized, it's filtered out from the results.
It's the job of the various implementations of ObjectsInitializer to
initialize the objects in the session so that they are later included
in the results.
In the case of Criteria + restrictions, the restrictions are applied
in the ObjectsInitializer, so the entities which don't satisfy the
restrictions are not initialized there... and thus not included in
the results.
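To make the mechanism concrete, here is a minimal, self-contained Java sketch of that behaviour. The Proxy class, the initializer and the filter are simplified stand-ins for the real Hibernate machinery, not the actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the original ObjectLoaderHelper behaviour:
// session.load() hands back lazy proxies, an ObjectsInitializer decides
// which ones to initialize, and the loader then drops the rest.
public class ProxyFilterSketch {

    // Stand-in for a Hibernate proxy: it knows whether it was initialized.
    static final class Proxy {
        final int id;
        boolean initialized;
        Proxy(int id) { this.id = id; }
    }

    // Stand-in for an ObjectsInitializer applying a Criteria restriction:
    // only entities satisfying the restriction get initialized.
    static void initializeMatching(List<Proxy> proxies, int allowedId) {
        for (Proxy p : proxies) {
            if (p.id == allowedId) {
                p.initialized = true;
            }
        }
    }

    // The filtering the loader used to do: drop uninitialized proxies.
    static List<Proxy> filterUninitialized(List<Proxy> proxies) {
        List<Proxy> results = new ArrayList<>();
        for (Proxy p : proxies) {
            if (p.initialized) {
                results.add(p);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Proxy> proxies = new ArrayList<>();
        proxies.add(new Proxy(1));
        proxies.add(new Proxy(2));
        initializeMatching(proxies, 1); // restriction: only id == 1 matches
        // Entity 2 was never initialized, so it silently drops out.
        System.out.println(filterUninitialized(proxies).size()); // 1
    }
}
```

This is exactly why loading the "excluded" entity in the session by any other means breaks the filtering, as shown in the next section.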
= Why my patch is breaking this behaviour consistently =
In my patch, I use Session.get, which forces the initialization of the
proxy, and I removed the filter that discarded the uninitialized
proxies: it became unnecessary since all proxies are now guaranteed to
be initialized.
This patch has been designed to solve HSEARCH-1448 and to simplify the
ObjectLoaderHelper code which was quite complicated.
Situation after my patch: all the results satisfying the full text
search are returned. The restrictions of the criteria are not taken
into account. In fact, it works as documented.
= Relying on the session state to filter out entities is wrong =
So the fact is that we basically rely on the session state to filter
out the results we don't want.
I had to check that my gut feeling was right so I checked out current
master, opened ResultSizeOnCriteriaTest and just added the following
lines before the search call:
//load in the session the object which shouldn't be returned just for fun
session.get( Tractor.class, 2 );
-> the object is returned and the test is broken too. This is expected
behaviour as this object has been initialized in the session and is
now considered as a valid candidate for the results.
= Conclusion =
I don't think we can have a properly working Criteria + restrictions
feature without refactoring a lot of things in Search: the Initializer +
Loader concept can't work reliably in this case.
Therefore I think we should simply remove this test and make this case
fail clearly, as it can be a potential security flaw if we return
entities the user shouldn't see just because they were initialized in
the session for another purpose.
We might revisit it later but I really think it's a lot of work to get it right.
Thoughts?
References
[1] https://github.com/hibernate/hibernate-search/pull/581
[2] https://hibernate.atlassian.net/browse/HSEARCH-753
Search: changing the way we search
by Guillaume Smet
Hi,
So, it's been a long time since I threw out the first idea of this (see
HSEARCH-917) but, after a lot more thought, and given that I've
basically been stuck on this one for a long time, it's probably better
to agree on a plan before putting together some code.
Note that this plan is based on our usage of Hibernate Search in a lot
of applications over several years, and I think our usage pattern is
quite common. Even so, I'm pretty sure there are other interesting
search patterns out there, and it would be nice to extend this proposal
to cover them if they don't fit.
I. How do we search at my company?
-------------------------------------------------------
We mainly use Search for 2 things:
- autocompletion;
- search engines: search form to filter a list of items. Usually, a
plain text field and several structured fields (drop down choice
mostly).
We usually sort with business rules, not using score. Users usually
like it better as it's more predictable. For example, we sort our
autocompletion results alphabetically. An interesting note here is
probably that we work on structured data, not on CMS content. This
might be considered a detail but you'll see it's important.
We use analyzers to:
- split the words (the WordDelimiterFilter - yeah, I have a Solr background :));
- filter the input (AsciiFoldingFilter, LowercaseFilter...);
- eventually do simple stemming (with our own very minimal stemmers).
We sometimes use Search to find the elements to apply business rules
when it's really hard to use the database to do so. Search provides a
convenient way to denormalize the data.
II. On why we can't use the DSL out of the box
--------------------------------------------------------------------
The Hibernate Search DSL is great and I must admit this is the DSL
which taught me how to build DSLs for our own usage. It's intuitive,
well thought out, definitely a nice piece of code.
So, why don't we use it for our plain text queries? (Disclaimer: we
use it under the hood, we just have to do a lot of things manually
outside of the DSL)
Several reasons:
1/ the aforementioned detail about sorting: we need AND badly in plain
text search;
2/ we often need to add a clause only if the text isn't empty or the
object isn't null, and we then need more logic than the fluent
approach allows (I don't have any ideas/proposals for this one but
I think it's worth mentioning).
And why is it not ideal:
3/ wildcard and analyzers are really a pain with Lucene and you need
to implement your own cleaning stuff to get a working wildcard query.
1/ is definitely our biggest problem.
III. So let's add an AND option...
-----------------------------------------------
Yeah, well, that's not so easy.
Let's take a look at the code, especially our dear friend
ConnectedMultiFieldsTermQueryBuilder.
When I started to look at HSEARCH-917, I thought it would be quite
easy to build lucene queries using a Lucene QueryParser instead of all
the machinery in ConnectedMultiFieldsTermQueryBuilder. It's not.
Here are pointers to the main problems I have:
1/ getAllTermsFromText is cute when you want to OR the terms but
really bad when you need AND, especially when you use analyzers which
return several tokens for a term (this is the case when you use the
SynonymFilter or the WordDelimiterFilter);
2/ the fieldBridge thing is quite painful for plain text search as we
are not sure that all the fields have the same fieldBridge and, thus,
the search terms might be different for each field after applying the
fieldBridge.
These problems are not so easy to solve in an absolute kind of way.
That's why I haven't made any progress on this problem.
Let's illustrate the problem:
- you search for "several words in my content" (without the quotes:
it's not a phrase query, just terms);
- you search in the fields title, summary and content so you expect to
find at least one occurrence of each term in one of these fields;
- for some reason, you have a different fieldBridge on one of the
fields and it's quite hard to define "at least one occurrence of each
term in one of these fields" as the fieldBridge might transform the
text.
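For reference, when no field bridge gets in the way, the expected AND semantics is a MUST clause per term wrapping a SHOULD across the fields. Here is a self-contained sketch that builds such a query as a plain Lucene-syntax string; the field and term names are illustrative, and a real implementation would build a BooleanQuery instead:

```java
// Sketch of "at least one occurrence of each term in one of these fields":
// one MUST (+) group per term, each group OR-ing the term over all fields.
// Field bridges break this because the term text may differ per field.
public class AndAcrossFieldsSketch {

    static String buildQuery(String[] terms, String[] fields) {
        StringBuilder query = new StringBuilder();
        for (String term : terms) {
            query.append("+(");
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) query.append(" ");
                query.append(fields[i]).append(":").append(term);
            }
            query.append(") ");
        }
        return query.toString().trim();
    }

    public static void main(String[] args) {
        String[] terms = { "several", "words" };
        String[] fields = { "title", "summary", "content" };
        System.out.println(buildQuery(terms, fields));
        // +(title:several summary:several content:several) +(title:words summary:words content:words)
    }
}
```

As soon as one field's bridge rewrites the term text, the inner SHOULD group can no longer use a single term string per field, which is the crux of the problem described above.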
My point is that I don't see a way to fix the current DSL without
breaking some cases (note that the current code only works because
only the OR operator is supported) even if we might consider they are
weird.
From my perspective, a plainText branch of the DSL could ignore the
fieldBridge machinery but I'm not sure it's a good idea. That's why I
would like some feedback about this before moving in this direction.
I took a look at the new features of Lucene 4.7 and the new
SimpleQueryParser looks kinda interesting as it's really simple and
could be a good starting point to come up with a QueryParser which
simply does the job for our plain text search queries.
IV. About wildcard queries
--------------------------------------
Let's say it frankly: wildcard queries are a pain in Lucene.
Let's take an example:
- You index "Parking" and you have a LowerCaseFilter so your index
contains "parking";
- You search for Parking without wildcard, it will work;
- You search for Parki* with wildcard: yeah, it won't work.
This is due to the fact that analyzers are ignored for wildcard
queries. The rationale is that if the ? or * characters went through
the analysis chain, they could be altered or removed by the filters
you use in your analyzers.
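The workaround is query-side normalization: reproduce by hand what the index-time filters did (lowercasing, accent folding) before building the wildcard query. A minimal sketch, assuming an index built with a LowerCaseFilter and ASCII folding; the method name is hypothetical:

```java
import java.text.Normalizer;

// Manually normalize user input for a wildcard query, since the
// analyzer won't be applied to it: fold accents and lowercase,
// while leaving the * and ? wildcard characters untouched.
public class WildcardNormalizeSketch {

    static String normalizeForWildcard(String input) {
        String folded = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", ""); // strip combining accent marks
        return folded.toLowerCase();       // mirror the LowerCaseFilter
    }

    public static void main(String[] args) {
        System.out.println(normalizeForWildcard("Parki*")); // parki*
        System.out.println(normalizeForWildcard("Élé*"));   // ele*
    }
}
```

This only covers simple character-level filters; anything token-level (word splitting, synonyms, stemming) can't be reproduced this way, which is why a dedicated wildcard analyzer, as suggested below, would be cleaner.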
While we all understand the Lucene point of view from a technical
perspective, I don't think we can keep this position for Hibernate
Search as a user friendly search framework on top of Hibernate.
At Open Wide, we have a quite complex method which rewrites a search
as a working autocompletion search which might work most of the time
(with a high value of most...). It's kinda ugly, far from perfect and
I'm wondering if we could have something more clever in Search. I once
talked with Emmanuel about having different analyzers for Indexing,
Querying (this is the Solr way) and Wildcards/Fuzzy search (this is
IMHO a good idea as the way you want to normalize your wildcard query
highly depends on the analyzer used to index your data).
V. The "don't add this clause if null/empty" problem
----------------------------------------------------------------------------
Ideas welcome!
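As a starting point for discussion, this is roughly what we do by hand today: collect clauses and only add one when the input is actually present. The clause representation below is just strings for illustration, not a real Lucene Query, and the field names are made up:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of conditional clause building outside the fluent DSL:
// a clause is only added when the corresponding user input exists.
public class ConditionalClausesSketch {

    static List<String> buildClauses(String text, String category) {
        List<String> clauses = new ArrayList<>();
        if (text != null && !text.trim().isEmpty()) {
            clauses.add("content:" + text.trim());
        }
        if (category != null) {
            clauses.add("category:" + category);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Blank text: only the structured clause survives.
        System.out.println(buildClauses("  ", "tractor")); // [category:tractor]
    }
}
```

A fluent chain can't easily express this "maybe add a clause" step, which is why we end up assembling the junction imperatively and only then handing it to the query builder.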
VI. Provide not so bad default analyzers
---------------------------------------------------------
I think it would be nice to provide default analyzers for plain text.
Not necessarily ones including complex/debatable things like stemmers,
but at least something which gives a good taste of Search before going
into more details.
Why would it be interesting? As a French speaking person, I see so
many search engines out there which don't normalize accented
characters; it would be nice to have something working by default.
VII. Conclusion
----------------------
I would really like to make some quick progress on III. I'm pretty
sure we're not the only ones having a lot of MultiFieldQueryParser
instantiations in our Search code to deal with this. And I'm not even
talking about the numerous times one of our developers used the DSL
without realizing it would use the OR operator.
Comments welcome.
--
Guillaume
jsr107
by Alex Snaps
Hey everyone,
I wondered if anyone had considered (even the feasibility of) moving the
Caching SPI of Hibernate to use the (now released!) jcache API of JSR107?
I was contemplating having a look at providing a "jsr107 caching
provider" first, which could then maybe be folded into Hibernate...
Anyways, random thoughts, maybe some of you already have insights.
Also, I'd expect that (some) "cache vendors" might still want to do
some tuning based on the Hibernate use case, so maybe the idea isn't
such a great one (if even feasible, as I haven't looked into that
yet). Anyways... further random thoughts on the subject welcome, even
non-random ones actually.
Alex
--
Alex Snaps <alex.snaps(a)gmail.com>
Principal Software Engineer - Terracotta
http://twitter.com/alexsnaps
http://www.linkedin.com/in/alexsnaps
http://withinthefra.me
Re: [hibernate-dev] Lucene moving to Java7
by Sanne Grinovero
Note this would affect only our upcoming Hibernate Search 5.0: it's a
major release which breaks some backwards compatibility anyway. I
guess that blasts any remaining concern?
For WFK users in maintenance mode, I expect them to stay on the
previous Search version, to which we'll backport fixes as usual.
But I also expect we'll eventually want to provide a "new" version to
deliver the goodies of EE7, JPA 2.1, etc.. which all require JDK7
anyway (in the scope of WFK or anything else coming our way).
Thanks all for the comments
Sanne
On 20 March 2014 16:04, Burr Sutter <bsutter(a)redhat.com> wrote:
> Adding the WFK Mareks :-)
>
> The only potential problem that I see is backward incompatibility with WFK 2.0.0 and its supported frameworks through June 2015.
> We do not require JVM upgrades, in production, for customers, within the "supported time window" - in our WFK case June 2012 to June 2015.
>
>
> On Mar 20, 2014, at 11:21 AM, Sanne Grinovero <sanne(a)hibernate.org> wrote:
>
>> The next minor release of Apache Lucene v. 4.8 will require Java7.
>>
>> The Lucene team has highlighted many good reasons for that, including
>> some excellent improvements in sorting performance and reliability of
>> IO operations: nice things we'd like to take advantage of.
>>
>> Objections against baselining Hibernate Search 5 to *require* Java7 too?
>> We hardly have a choice, so objections better be good ;-)
>>
>> -- Sanne
>
Re: [hibernate-dev] ci.hibernate.org and network port assignment
by Paolo Antinori
hi everyone,
I'll be happy to help with the activity of isolating build jobs in
Docker containers started directly via Jenkins.
The technology should allow concurrent build jobs to run totally
isolated, as anticipated.
I am going to start with OGM, which is the project I am most familiar
with, and I will let you know of any progress.
paolo
I've created WEBSITE-178 [1] as once again we had testsuites failing
because of a network port being used by a different job; bad luck, but
we can do better.
Assuming we want to use the Jenkins plugin "port allocator", this
would however need to be consistently honored by all builds we launch:
it passes variables which we should use, but it can't enforce them
AFAIK.
Is that going to be maintainable in the long run?
An alternative would be to consistently run each and every build in
its own isolated docker container, but
a) that's not something I can setup overnight
b) we'd need to use more complex build scripts, for example the nice
Maven and Gradle integrations with the Jenkins build configuration
would be unusable.
We have quite a list of services; even ignoring the OGM exotic
databases we have at least:
- Byteman
- JGroups
- Arquillian
- WildFly
- ActiveMQ (JMS tests in Search)
- ... ?
To address the urgent need, I'm going to disable parallelism for all
builds. Let's hope ORM's master doesn't get stuck, as everything else
would be blocked. I really hope this measure stays as temporary as
possible :-/
-- Sanne
DefaultLoadEventListener and second level cache
by Guillaume Smet
Hi,
We have a lot of second level cache misses on one of our applications
and I wanted to understand why that was the case. These cache misses
happen even after loading the exact same page twice. They come
from entities which are loaded via DefaultLoadEventListener.
I tried to debug it and was looking for the place where the entity is
put in the cache when the DefaultLoadEventListener path is used.
Could someone point me to where we put the entity in the cache so that
I can try debugging further?
Thanks in advance.
--
Guillaume