April 2016 - hibernate-dev - Jboss List Archives

"Service" in Hibernate Search: history, lessons learned and rewrite

by Sanne Grinovero

The "Service" and "ServiceManager" concepts in Hibernate Search have a specific meaning which is often misunderstood and/or abused, causing trouble. They also changed over time: they were the victim of two major refactorings which evolved their purpose and stretched their intent. So I'll change the definition again :-) But hopefully I'll clarify it. Here is a draft of the rules which I plan to both implement and document carefully in the javadoc, with some comments to highlight what is changing:

# A service type is identified by a Class: a Service interface

Nothing new here. Yet it's worth stressing that this implies that only one implementation will be around.

# There can only be ONE IMPLEMENTATION of a service type used by the whole Hibernate Search instance

In other words: it was not a good idea to have the LuceneWorkSerializer modelled as a Service back when we supposedly could use a different serialization strategy for a different index. Yet it is a good idea nowadays to have LuceneWorkSerializer extend Service, as we dropped that level of flexibility. This implies that there's a single type of serializer (at most) and it's totally fine to expose this as:

SearchIntegrator#getLuceneWorkSerializer()

[this method doesn't exist yet, but I'm thinking of adding it for our convenience and for the reasons in the following points]

P.S. We're only maintaining (and bundling) a single Serializer implementation, so it's no surprise that we can handle only one... yet this implies people wanting to override it have to either hack our bootstrap or physically remove our implementation.

# A Service implementation can be provided by having it injected at bootstrap (i.e. org.hibernate.search.cfg.spi.SearchConfiguration.getProvidedServices())

Not a new rule either: repeating for clarity. We call these "provided" services.
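To make the first three rules concrete, here is a minimal Java sketch of a registry that keeps at most one implementation per service type, keyed by the service interface's Class, where a "provided" implementation injected at bootstrap takes precedence over a bundled default. All names (`Service`, `WorkSerializer`, `ServiceRegistrySketch`, `getWorkSerializer()`) are hypothetical simplifications for illustration, not the actual Hibernate Search SPI.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical marker interface standing in for the Service contract discussed above.
interface Service {}

// Hypothetical service type, used only for illustration.
interface WorkSerializer extends Service {
    String name();
}

// Minimal sketch: at most one implementation per service type, keyed by the service interface.
final class ServiceRegistrySketch {
    private final Map<Class<? extends Service>, Service> services = new HashMap<>();
    private final Set<Class<? extends Service>> providedTypes = new HashSet<>();

    // A "provided" implementation, injected at bootstrap, replaces any bundled default.
    <S extends Service> void provide(Class<S> type, S implementation) {
        if (!providedTypes.add(type)) {
            throw new IllegalStateException("Only one implementation allowed for " + type.getName());
        }
        services.put(type, implementation);
    }

    // A bundled default is ignored if something was provided, and may not be duplicated.
    <S extends Service> void registerDefault(Class<S> type, S implementation) {
        if (providedTypes.contains(type)) {
            return;
        }
        if (services.putIfAbsent(type, implementation) != null) {
            throw new IllegalStateException("More than one implementation of " + type.getName());
        }
    }

    <S extends Service> S get(Class<S> type) {
        Service implementation = services.get(type);
        if (implementation == null) {
            throw new IllegalStateException("No implementation of " + type.getName());
        }
        return type.cast(implementation);
    }

    boolean isProvided(Class<? extends Service> type) {
        return providedTypes.contains(type);
    }

    // Safe as a plain getter precisely because there is a single implementation per type.
    WorkSerializer getWorkSerializer() {
        return get(WorkSerializer.class);
    }
}
```

Because the map key is the service interface itself, a type-specific convenience getter (in the spirit of the proposed SearchIntegrator#getLuceneWorkSerializer()) can simply delegate to the registry.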
# If a service isn't "provided", then we attempt to create one using java.util.ServiceLoader

Currently this expects a single implementation to be available: there's no way to pick which one if there are multiple implementations on the classpath. I think we'll need to be able to pass a "hint" or similar to requestService to allow expressing preferences, handling short names, etc. A proposal for that will follow when the need arises: at this point it's important to clarify the limitation, as it expresses what a Service is not able to model today.

Currently implementations are looked up "on demand". I plan to allow pre-initializing services as it removes some trouble; these components could have convenience getters, not least to remove the concurrency overhead. Remember that since there's only one implementation for a given type around, there's no reason not to do this: the intent of the Service contract is to allow people to inject a customized implementation.

# If a Service implementation also implements Startable or Stoppable, we'll invoke the respective methods once at start and/or at stop of the Search instance, unless it is provided, in which case they are ignored.

The current javadoc suggests that it's illegal for a provided implementation to also implement Startable and/or Stoppable; I don't remember why that was, but today it seems unfitting: people might want to extend one of our implementations, or take some of the implementations which are normally auto-started and reuse them "by instance", providing them to multiple Search instances to save memory (we actually have a need for this for Index Affinity in Infinispan). The important concept which will survive is that we don't start or stop stuff which is provided, as that's clearly the responsibility of another component.

# All non-provided Services will be stopped once, and only once, as the final step when the SearchIntegrator is stopped.
This is a significant difference from today's code: we expect the Service consumers to "open / close" services, hopefully in a finally block, to the point that Gunnar enhanced it to at least allow AutoCloseable semantics. Yet I don't want runtime code to open and close these frequently, as it has been a bottleneck in the past. It also led to the creation of issues like HSEARCH-1589: we might start/stop the same service frequently, and need to improve with reference counters.

I suspect that historically the reasoning was to make sure that the order of teardown would follow the inverse order of bootstrap, as components would clean up after themselves; but having clarified that Service instances are globally unique should also imply that their state doesn't depend on other Services. So the teardown order doesn't matter anymore: we'll start each one once, but only close it at shutdown.

# Hierarchy?

We've talked about global components so far. It's clear that the IndexManager has a central role in the overall architecture, as we tend to allow per-index customisations. Or per-family customisations, as suggested in my previous email. An example which affects Service: the "JestClient client" [the Service we use to connect to Elasticsearch] could be considered a good fit for being a "Service", as this allows people to override the client implementation and/or inject a pre-configured instance... yet it's not a good fit if we want to allow people to connect to different hosts for different indexes. I don't plan to implement the hierarchical ServiceManager right now, but I'm proposing it already so that we can agree on the above cleanups in the contract, with the perspective that there are cleaner solutions for the scoped use case too.

Implementing these changes resolves or obsoletes at least 10 JIRA issues in one shot.

Thanks,
Sanne
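The lookup and lifecycle rules in the Service proposal above (ServiceLoader fallback, a single implementation per type, Startable/Stoppable invoked once, provided services left alone, everything stopped once at shutdown) could be sketched roughly as follows. This is an illustrative assumption, not the actual ServiceManager: `ServiceManagerSketch` and the simplified `Service`, `Startable` and `Stoppable` interfaces are hypothetical, and the pluggable `discovery` function exists only so the sketch can be exercised without `META-INF/services` files; `ServiceManagerSketch::serviceLoaderDiscovery` is the `java.util.ServiceLoader`-based default.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.function.Function;

// Hypothetical, simplified stand-ins for the contracts discussed above.
interface Service {}
interface Startable { void start(); }
interface Stoppable { void stop(); }

final class ServiceManagerSketch {
    private final Map<Class<?>, Service> provided;
    private final Function<Class<?>, Iterator<?>> discovery;
    private final Map<Class<?>, Service> created = new LinkedHashMap<>();
    private boolean stopped;

    ServiceManagerSketch(Map<Class<?>, Service> provided, Function<Class<?>, Iterator<?>> discovery) {
        this.provided = new HashMap<>(provided);
        this.discovery = discovery;
    }

    // Default discovery strategy: java.util.ServiceLoader for the requested service type.
    static Iterator<?> serviceLoaderDiscovery(Class<?> type) {
        return ServiceLoader.load(type).iterator();
    }

    synchronized <S extends Service> S requestService(Class<S> type) {
        if (stopped) {
            throw new IllegalStateException("The Search instance has been stopped");
        }
        Service service = provided.get(type); // provided: never started nor stopped here
        if (service == null) {
            service = created.get(type);
        }
        if (service == null) {
            service = create(type);
        }
        return type.cast(service);
    }

    private <S extends Service> S create(Class<S> type) {
        Iterator<?> candidates = discovery.apply(type);
        if (!candidates.hasNext()) {
            throw new IllegalStateException("No implementation of " + type.getName() + " found");
        }
        S service = type.cast(candidates.next());
        if (candidates.hasNext()) {
            // The current limitation: there is no way to express which implementation to pick.
            throw new IllegalStateException("Multiple implementations of " + type.getName() + " found");
        }
        if (service instanceof Startable) {
            ((Startable) service).start(); // started exactly once, on first request
        }
        created.put(type, service);
        return service;
    }

    // Called once when the Search instance shuts down; further calls are no-ops.
    synchronized void stop() {
        if (stopped) {
            return;
        }
        stopped = true;
        // Order is irrelevant: services are assumed not to depend on each other's state.
        for (Service service : created.values()) {
            if (service instanceof Stoppable) {
                ((Stoppable) service).stop();
            }
        }
        created.clear();
    }
}
```

In this sketch consumers never release a service: there is no reference counting, and only the final `stop()` tears down what the manager itself created, e.g. `new ServiceManagerSketch(providedServices, ServiceManagerSketch::serviceLoaderDiscovery)`.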

Re: [hibernate-dev] GSoC 2016: Congratulations, your proposal with JBoss Community has been accepted!

by Mincong Huang

Hi everybody,

Thanks for accepting my GSoC application!! I'm really excited to have the chance to work with the Hibernate team. I'm so happy to see this email. It's just like a dream, I can't believe it's true!! Thanks for choosing me. I'll try my best to accomplish this mission!!

Happy coding and good night :-)

Cheers,
Mincong

On Fri, Apr 22, 2016 at 9:25 PM, Google Summer of Code <summerofcode-noreply(a)google.com> wrote:

> [image: Google Summer of Code]
>
> Hi mincongh,
>
> Welcome to GSoC 2016!
>
> Your proposal Hibernate Search: JSR 352 batch job for re-indexing entities
> <https://summerofcode.withgoogle.com/dashboard/student/proposal/5244068401...>
> has been accepted!
>
> We look forward to seeing the great things you will accomplish this summer
> with JBoss Community.
>
> This email contains important information about your participation in GSoC
> this year. Please read it carefully.
>
> Over the next month you will take part in the Community Bonding period
> with your organization. This period is for you to become familiar with the
> organization's code base, version control and other infrastructure. You
> will be getting to know the community and its practices, as well as working
> with your mentor on milestones for the summer.
>
> Complete all of these steps as soon as you can:
>
> 1. Read the Accepted Student Information
> <https://developers.google.com/open-source/gsoc/help/accepted-students>
> 2. Upload <https://summerofcode.withgoogle.com/dashboard/> your tax
> form *before May 16, 2016 at 19:00 UTC*
> 3. Read the Student Payment Information
> <https://developers.google.com/open-source/gsoc/help/payoneer>
> 4. Set up your Payoneer account before May 16, 2016 at 19:00 UTC
> 5. Verify your shipping address, promotional materials, and t-shirt
> information on your profile
> <https://summerofcode.withgoogle.com/dashboard/profile/>.
>
> If you have questions about anything in this email, please email the
> Google GSoC support team at gsoc-support(a)google.com. Don’t email the
> student list with tax or payment issues.
>
> Have a great summer!
>
> Google Open Source Programs team
>
> This email was sent to mincong.h(a)gmail.com.
>
> You are receiving this email because of your participation in Google
> Summer of Code 2016.
> https://summerofcode.withgoogle.com
>
> To leave the program and stop receiving all emails, you can go to your
> profile <https://summerofcode.withgoogle.com/dashboard/profile/> and
> request deletion of your program profile.
>
> For any questions, please contact gsoc-support(a)google.com. Replies to
> this message go to an unmonitored mailbox.
>
> © 2016 Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043,
> USA

NoORM IRC meeting transcripts

by Guillaume Smet

Hi everyone,

Here are the transcripts of this week's NoORM IRC meeting.

15:47 < jbott> Minutes: http://transcripts.jboss.org/meeting/irc.freenode.org/hibernate-dev/2016/...
15:47 < jbott> Minutes (text): http://transcripts.jboss.org/meeting/irc.freenode.org/hibernate-dev/2016/...
15:47 < jbott> Log: http://transcripts.jboss.org/meeting/irc.freenode.org/hibernate-dev/2016/...

First meeting as chair so there are a couple of adjustments to make.

See you next time!

--
Guillaume

Released: Hibernate Search 5.5.3.Final

by Sanne Grinovero

Hello all,

A maintenance release of our latest stable branch, version 5.5.3.Final of Hibernate Search, is now available and is the suggested stable version for everyone to use.

More details about the improvements can be found on our blog:
- http://in.relation.to/2016/04/26/Polishing-A-Great-Release-Hibernate-Sear...

Regards,
Sanne

[HSEARCH] Usage of ShardIdentifierProvider

by Gunnar Morling

Hi,

As IndexShardingStrategy is deprecated, I thought I'd use ShardIdentifierProvider and friends in new code I write. It's not clear to me, though, how it's meant to be used. Some questions:

* Is it correct that EntityIndexBinding#getShardIdentifierProvider() returns null if sharding is not used for this entity? I suppose in that case I can simply use EntityIndexBinding.getIndexManagers()[0]?
* What's the envisioned way to get the IM for a given shard once I know the shard id? I found IndexManagerHolder.getOrCreateIndexManager(), but this expects a *DynamicSharding*EntityIndexBinding, so how would it work for non-dynamic sharding?

It's tough to see how the pieces are meant to fit together, now that both IndexShardingStrategy and ShardIdentifierProvider are there. I hope we can get rid of the former soon, simplifying the code a bit?

Thanks,

--Gunnar

HipChat history is limited

by Gunnar Morling

Hey all,

I was looking for a discussion I had with Emmanuel a few months ago on HipChat. But navigating back in time, I could not go before Feb 1st because I hit "You've reached the end of your viewable chat history. Switch to HipChat Plus for unlimited access".

Does anyone know whether we can get free HipChat Plus licenses as an OSS project? If not, I personally see no other way than going back to IRC completely. Not sure whether that only affects 1:1 chats (I could go back farther in the history of project rooms), but having access to less than three months of history is a deal breaker for me.

Thanks,

--Gunnar

Hibernate.org and Roadmap nav link

by Steve Ebersole

I have started maintaining[1] the ORM Roadmap externally to hibernate.org itself. When under orm/, I'd like to adjust the Roadmap nav link to point to this external URL rather than the parameterized {project}/roadmap target. Is that possible? And if so, how?

[1] https://github.com/hibernate/hibernate-orm/wiki/Roadmap

[HSEARCH] Scope of the first version with ES support

by Gunnar Morling

Hey,

I'd like to achieve clarity and agreement on the scope of HSEARCH 5.6, the first release with support for the Elasticsearch indexing backend.

I suggest we limit ourselves to the essential things making the backend actually usable and release it as a "technology preview" as of 5.6.0.Final. Everything not needed for that goal I'd move to subsequent releases (5.7, 6.0), the motivation being that we should not kill the vibe, and should deliver something real soon. Some candidates for moving over I see:

* "Define analyzers via the REST API" (HSEARCH-2219 <https://hibernate.atlassian.net/browse/HSEARCH-2219>): users can create the needed analyzers themselves
* "Consider using the fields feature of Elasticsearch for properties mapped on several fields" (HSEARCH-2215 <https://hibernate.atlassian.net/browse/HSEARCH-2215>): seems scheduled as a "reminder" only anyway?
* "Use the Elasticsearch Scroll API when fetching large result sets" (HSEARCH-2128 <https://hibernate.atlassian.net/browse/HSEARCH-2128>): seems not strictly needed
* "Map the optimize() operation to Elasticsearch 'force merge' requests" (HSEARCH-2092 <https://hibernate.atlassian.net/browse/HSEARCH-2092>): manual requests possible as a work-around
* Likely some others

Things we *should* do are most mapping-related issues, documentation and apparent perf issues (mass indexing, avoiding too frequent refreshing).

The public interest in the subject seemed good, so I'd prefer if we can ship a "Final" version soon, MVP-style. As it seems, a "final" tech preview is less scary to people than an Alpha/Beta. Let's hone the bits in subsequent releases, rather than working on the first Final for a long time.

Any thoughts?

--Gunnar

HSEARCH: Coexisting of Lucene and Elasticsearch backends vs polymorphism & co

by Sanne Grinovero

In the context of implementing Elasticsearch support for Hibernate Search, there's a recurring need to transform the domain model to the "Document" representation using a strategy which depends on the storage choice, i.e. Lucene vs Elasticsearch.

For example, Guillaume, working on HSEARCH-2067, needs to associate the entity's document builder with a FieldBridge choice which needs to know whether the output document will be indexed in ES rather than Lucene. The choice of FieldBridge implementation affects the DocumentBuilder bound to each type; this implies that we're "tainting" the DocumentBuilder for all instances of a type.

The abstraction of "IndexManager" is meant to initialize and manage an *index*, but remember that there's no guarantee that a single type is bound to a single index (and so to a single IndexManager):

- We have the case of a single type being spread out over multiple indexes, using Sharding.
- We also have the opposite: multiple different types sharing an index.
- Subtypes of indexed types can opt to be indexed in a different index.
- All of the above can be mixed freely, as there's a clear distinction between type (identified by a Class) and index (identified by a String).

[I'm not stating that the above facts are necessarily all required, just that they are currently supported... so we could in theory discuss taking away some of this flexibility now, but implementing such restrictions would need to wait for version 6.0.]

When a Query is run on a type A, we're transparently running the query on all indexes (or shards) containing A, and also on the indexes of its indexed subtypes. We're also transparently filtering out incompatible types, if any of these indexes are shared with other types. We also allow running a FullTextQuery on multiple, unrelated types, and the same rules apply.
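As a rough illustration of these rules, the following Java sketch resolves which indexes a query on one or more types must open (all indexes or shards of each target, plus those of indexed subtypes) and which other types must be filtered out because they share one of those indexes. `QueryTargetResolverSketch` and its string-keyed bindings are a hypothetical simplification, not Hibernate Search's internal model.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class QueryTargetResolverSketch {
    // Hypothetical bindings: each indexed type maps to the index names it is stored in
    // (several names when sharded); the same index name may appear under several types.
    private final Map<Class<?>, List<String>> indexesByType = new LinkedHashMap<>();

    void bind(Class<?> type, String... indexNames) {
        indexesByType.put(type, Arrays.asList(indexNames));
    }

    // Every index (or shard) of the targets, plus the indexes of their indexed subtypes.
    Set<String> indexesFor(Class<?>... targets) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<Class<?>, List<String>> binding : indexesByType.entrySet()) {
            if (isTargeted(binding.getKey(), targets)) {
                result.addAll(binding.getValue());
            }
        }
        return result;
    }

    // Types which were not requested but share at least one of the queried indexes:
    // their documents must be filtered out of the results.
    Set<Class<?>> typesToFilterOut(Class<?>... targets) {
        Set<String> queried = indexesFor(targets);
        Set<Class<?>> toFilter = new LinkedHashSet<>();
        for (Map.Entry<Class<?>, List<String>> binding : indexesByType.entrySet()) {
            if (!isTargeted(binding.getKey(), targets)
                    && !Collections.disjoint(queried, binding.getValue())) {
                toFilter.add(binding.getKey());
            }
        }
        return toFilter;
    }

    // A bound type is targeted if it is one of the requested types or an indexed subtype of one.
    private static boolean isTargeted(Class<?> boundType, Class<?>[] targets) {
        for (Class<?> target : targets) {
            if (target.isAssignableFrom(boundType)) {
                return true;
            }
        }
        return false;
    }
}
```

With Lucene, the resulting set of indexes is what would be wrapped into a single MultiReader; the filtering set is what keeps a shared index from leaking documents of unrelated types.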
To perform such a Query on multiple indexes, the trick currently used with Lucene-based backends is the usage of MultiReaders: we wrap multiple indexes and present them as one index reader to the query engine; it's a "unified view" on which the query is performed. For obvious reasons we cannot wrap a MultiReader across both Lucene indexes and Elasticsearch's query capabilities (or maybe we could eventually, but that's a whole lot of R&D to be done for questionable usefulness).

So, we need to introduce a new concept: something like "index families", to properly abstract the boundaries, as clearly some indexes work together better with indexes of the same kind than with indexes of another kind. Stuff indexed in embedded Lucene would belong to a family A, stuff in the Elasticsearch cluster would be family B, and I guess one might have a secondary independent Elasticsearch cluster which would need to be in a different family C, or eventually a Solr cluster in yet another separate family.

Such an "index family" would give us:

- a place where the connection settings and connection pools are handled for Elasticsearch
- clear boundaries about which types can be queried "as one": only the types in the same family; subtypes might be allowed a different index, but it must live in the same family. Same for Sharding.
- a reasonable place to query for which "kind of storage" is being used for a specific type
- an Analyzer might exist only within a family (defined on one ES cluster, not on the other)
- we have a long-standing issue with Similarity: you can only have one in a group of indexes, but the group concept is undefined (and only loosely validatable)
- an "index family" could have a type, and therefore define what kind of FieldBridge(s) need to be generated

I'm not saying that this is all blocking for 5.6. My proposal is to see if we agree on such a design as a longer-term objective (set some foundation in 5.7, finalize for 6).
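A minimal sketch of the proposed boundary check, assuming hypothetical `IndexFamily`, `StorageKind` and `IndexFamilyRegistrySketch` types (nothing like this exists in Hibernate Search; it only illustrates the proposal): each index belongs to exactly one family, and a set of indexes can only be queried "as one" when they all belong to the same family.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: the "kind of storage" a family represents.
enum StorageKind { LUCENE_EMBEDDED, ELASTICSEARCH }

// Hypothetical: a family groups indexes which can be queried together.
final class IndexFamily {
    final String name;
    final StorageKind kind;

    IndexFamily(String name, StorageKind kind) {
        this.name = name;
        this.kind = kind;
    }
}

final class IndexFamilyRegistrySketch {
    private final Map<String, IndexFamily> familyByIndex = new HashMap<>();

    void assign(String indexName, IndexFamily family) {
        familyByIndex.put(indexName, family);
    }

    // Indexes can be queried "as one" (e.g. one MultiReader, or one ES cluster request)
    // only if they all belong to the same family; otherwise the query is rejected.
    IndexFamily familyForQuery(Collection<String> indexNames) {
        IndexFamily family = null;
        for (String indexName : indexNames) {
            IndexFamily candidate = familyByIndex.get(indexName);
            if (candidate == null) {
                throw new IllegalArgumentException("Unknown index: " + indexName);
            }
            if (family == null) {
                family = candidate;
            } else if (family != candidate) {
                throw new IllegalStateException("Index '" + indexName + "' belongs to family "
                        + candidate.name + ", not " + family.name);
            }
        }
        if (family == null) {
            throw new IllegalArgumentException("No index to query");
        }
        return family;
    }
}
```

Under this proposal, the family's `kind` is also where the choice of FieldBridge flavour (Lucene vs Elasticsearch) could hang, instead of tainting each DocumentBuilder.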
For 5.6 I'd be happy enough to essentially document that there's only one family allowed, which allows us to cut some corners, like:

- a single set of Analyzers to validate
- knowing that the Search instance is using ES exclusively, or Lucene exclusively
- knowing that all IndexManagers are connected to the same set of ES nodes (if using ES)

So not much changing... I just hope this helps in shaping our internals with an eye on the next step, and makes sure that the listed limitations, which we've been accepting already, can be clearly documented.

It would be great to already have the basics for index families in place, for example to define the proper API to read metadata for a type (like Guillaume needs), and to clean up some things, such as making the Similarity definition clearly associated with such a thing.

Naming: index family? index groups?

Not sure if there's a need to add anything to the configuration properties; for now it could simply reflect our interpretation of the existing configuration, yet expose useful and clean metadata to the internal components which need this.

Thanks for any comments!

Sanne

HipChat integrations

by Davide D'Alto

Hi,

I've enabled the preview of JIRA issues when there is a JIRA ID in the chat; let me know if it causes any problems.

There are also some other add-ons that might be helpful and that I would like to add if there are no objections:

Standup: https://hibernate.hipchat.com/addons/hc-standup?room=-1
Google hangout: https://hibernate.hipchat.com/addons/com.atlassian.hipchat.hangouts?room=...
Guest history: https://hibernate.hipchat.com/addons/hipchat-guest-history?room=-1
Pingmonit (might be useful to monitor hibernate.org, in.relation.to, and ci.hibernate.org): https://hibernate.hipchat.com/addons/com.pingmonit.addons.hipchat?room=22...

If you guys disagree or have previous experience with these add-ons, let me know. I haven't tested them yet.

Cheers,
Davide
