HSEARCH: Coexistence of Lucene and Elasticsearch backends vs polymorphism & co
by Sanne Grinovero
In the context of implementing Elasticsearch support for Hibernate
Search, there's a recurring need to transform the domain model to the
"Document" representation using a strategy which depends on the
storage choice, i.e. Lucene vs Elasticsearch.
For example, Guillaume, working on HSEARCH-2067, needs to associate the
entity's document builder with a FieldBridge choice that depends on
whether the output document will be indexed in ES rather than Lucene.
The choice of FieldBridge implementation affects the DocumentBuilder
bound to each type; this implies that we're "tainting" the
DocumentBuilder for all instances of a type.
The abstraction of "IndexManager" is meant to initialize and manage an
*index* - but remember that there's no guarantee that a single type is
bound to a single index (and so to a single IndexManager).
- We have the case of a single type being spread out on multiple
indexes, using Sharding.
- We also have the opposite, of multiple different types sharing an index
- Subtypes of indexed types can opt to be indexed in a different index
- All of the above can be mixed freely, as there's a clear
distinction between type (identified by a Class) and index (identified
by a String)
[I'm not stating that the above facts are necessarily all required,
just that they are currently supported.. so we could in theory discuss
taking away some of this flexibility now, but implementing such
restrictions would need to wait for version 6.0.]
When a Query is run on a type A, we're transparently running the query
on all indexes or shards containing A, and also its indexed subtypes
on different indexes. We're also filtering out incompatible types
transparently, if any of these sub-indexes are shared with other
types.
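To make these rules concrete, here is a minimal sketch in plain Java. It is not Hibernate Search code: IndexBindings, indexesFor and needsTypeFilter are hypothetical names, and Animal, Dog and Invoice are made-up domain types. It only models "type identified by a Class, index identified by a String" and how a query on a type fans out.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry: NOT Hibernate Search API, just a model of the rules above.
final class IndexBindings {
    // type -> names of the indexes (or shards) holding documents of that type
    private final Map<Class<?>, Set<String>> indexesByType = new LinkedHashMap<>();

    void bind(Class<?> type, String... indexNames) {
        Set<String> names = indexesByType.computeIfAbsent(type, t -> new LinkedHashSet<>());
        for (String name : indexNames) {
            names.add(name);
        }
    }

    // Indexes to read for a query on queriedType: its own plus those of its indexed subtypes.
    Set<String> indexesFor(Class<?> queriedType) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<Class<?>, Set<String>> e : indexesByType.entrySet()) {
            if (queriedType.isAssignableFrom(e.getKey())) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    // True if a targeted index also holds a type that is neither queriedType nor a subtype,
    // in which case the results have to be filtered by type.
    boolean needsTypeFilter(Class<?> queriedType) {
        Set<String> targeted = indexesFor(queriedType);
        for (Map.Entry<Class<?>, Set<String>> e : indexesByType.entrySet()) {
            if (!queriedType.isAssignableFrom(e.getKey())) {
                for (String name : e.getValue()) {
                    if (targeted.contains(name)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }
}

// Hypothetical domain types used only for the example.
class Animal {}
class Dog extends Animal {}
class Invoice {}
```

With Animal sharded over two indexes and Dog sharing a third index with Invoice, a query on Animal would target all three indexes and would need the type filter.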
We also allow running a FullTextQuery on multiple, unrelated types and
the same rules apply.
To perform such a Query on multiple indexes, the trick currently used
with Lucene-based backends is the usage of MultiReaders: we wrap
multiple indexes and present them as one index reader to the query
engine. It's a "unified view" on which the query is performed.
For obvious reasons we cannot wrap a MultiReader across both Lucene
indexes and Elasticsearch's query capabilities (or maybe we could
eventually, but that's a whole lot of R&D to be done for questionable
usefulness).
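As an illustration of the "unified view" idea, here is a toy model in plain Java. It is not Lucene's MultiReader and does not use the Lucene API: ToyReader, ListReader and CompositeToyReader are hypothetical names. The sketch only shows the general technique of giving each wrapped reader a contiguous range of global document ids.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for an index reader; hypothetical and far simpler than Lucene's IndexReader.
interface ToyReader {
    int maxDoc();
    String document(int docId);
}

final class ListReader implements ToyReader {
    private final List<String> docs;

    ListReader(String... docs) {
        this.docs = Arrays.asList(docs);
    }

    public int maxDoc() {
        return docs.size();
    }

    public String document(int docId) {
        return docs.get(docId);
    }
}

// Presents several readers as one by assigning each a contiguous range of global doc ids.
final class CompositeToyReader implements ToyReader {
    private final ToyReader[] subReaders;
    private final int[] docBase; // first global doc id of each sub-reader
    private final int maxDoc;

    CompositeToyReader(ToyReader... subReaders) {
        this.subReaders = subReaders;
        this.docBase = new int[subReaders.length];
        int next = 0;
        for (int i = 0; i < subReaders.length; i++) {
            docBase[i] = next;
            next += subReaders[i].maxDoc();
        }
        this.maxDoc = next;
    }

    public int maxDoc() {
        return maxDoc;
    }

    public String document(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        // Walk back to the last sub-reader whose range starts at or before docId.
        int i = subReaders.length - 1;
        while (docBase[i] > docId) {
            i--;
        }
        return subReaders[i].document(docId - docBase[i]);
    }
}
```

A wrapper like this works only when every member exposes the same low-level reader contract, which is presumably why mixing embedded Lucene indexes with a remote Elasticsearch cluster does not fit.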
So, we need to introduce a new concept: something like "index
families" to properly abstract the boundaries, as clearly some indexes
can work together better with indexes of the same kind than with
indexes of another kind.
Stuff indexed in Lucene embedded would belong to a family A, stuff in
the Elasticsearch cluster would be family B, and I guess one might
have a secondary independent Elasticsearch cluster which would need to
be in a different family C, or eventually a Solr cluster in yet
another separate family.
Such an "index family" would give us:
- a place where the connection settings and connection pools are handled
for Elasticsearch
- clear boundaries about which types can be queried "as one": only
the types in the same family, and subtypes might be allowed a
different index but it must live in the same family. Same for
Sharding.
- a reasonable place to query for which "kind of storage" is being
used for a specific type
- An Analyzer might exist only within a family (Defined on one ES
cluster, not on the other)
- We have a long standing issue with Similarity: you can only have
one in a group of indexes, but the group concept is undefined (and
only loosely validatable)
- An "index family" could have a type, and therefore define what kind of
FieldBridge(s) need to be generated
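To give the discussion something concrete to poke at, here is a rough sketch of what such a contract could look like. Everything in it is an assumption for illustration: IndexFamily, IndexFamilyType, IndexFamilyRegistry and their methods do not exist in Hibernate Search. It covers only two of the points above: asking which kind of storage backs a type, and rejecting queries that span families.

```java
import java.util.HashMap;
import java.util.Map;

// All names here are hypothetical; none of this is existing Hibernate Search API.
enum IndexFamilyType { LUCENE, ELASTICSEARCH }

final class IndexFamily {
    private final String name;
    private final IndexFamilyType type;

    IndexFamily(String name, IndexFamilyType type) {
        this.name = name;
        this.type = type;
    }

    String getName() {
        return name;
    }

    // The family type could drive which kind of FieldBridge gets generated.
    IndexFamilyType getType() {
        return type;
    }
}

final class IndexFamilyRegistry {
    private final Map<Class<?>, IndexFamily> familyByType = new HashMap<>();

    void register(Class<?> entityType, IndexFamily family) {
        familyByType.put(entityType, family);
    }

    // Answers "which kind of storage is used for this type?"
    IndexFamily familyOf(Class<?> entityType) {
        IndexFamily family = familyByType.get(entityType);
        if (family == null) {
            throw new IllegalArgumentException("Not an indexed type: " + entityType.getName());
        }
        return family;
    }

    // Types can be queried "as one" only if they all live in the same family.
    IndexFamily requireSingleFamily(Class<?>... queriedTypes) {
        IndexFamily common = null;
        for (Class<?> queriedType : queriedTypes) {
            IndexFamily family = familyOf(queriedType);
            if (common == null) {
                common = family;
            } else if (common != family) {
                throw new IllegalArgumentException("Cannot query across index families '"
                        + common.getName() + "' and '" + family.getName() + "'");
            }
        }
        if (common == null) {
            throw new IllegalArgumentException("No types given");
        }
        return common;
    }
}
```

Comparing families by identity keeps two Elasticsearch clusters apart even though both have the type ELASTICSEARCH, which matches the "family B vs family C" case.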
I'm not saying that this is all blocking for 5.6. My proposal is to
see if we agree on such a design as a longer term objective (set some
foundation in 5.7, finalize for 6).
For 5.6 I'd be happy enough to essentially document that there's only
one family allowed, which allows us to cut some corners like:
- single set of Analyzers to validate
- know that the Search instance is fully using ES exclusively, or
Lucene exclusively
- know that all IndexManagers are connected to the same set of ES
nodes (if using ES)
So not much changing.. just hope this helps in shaping our internals
with an eye on the next step, and make sure that the listed
limitations which we've been accepting already can be clearly
documented.
It would be great to already have the basics for index families in
place, for example to define the proper API to read metadata for a
type (like Guillaume is needing), and to clean up some things, such as
making the Similarity definition clearly associated to such a thing.
Naming: index family? index group?
Not sure if there's need to add anything to the configuration
properties; for now it could simply reflect our interpretation of the
existing configuration, yet expose useful and clean metadata to the
internal components which need this.
Thanks for any comments!
Sanne
9 years, 10 months
Regarding new Dialects
by Vlad Mihalcea
Hi,
We have another Dialect coming as a pull request:
https://github.com/hibernate/hibernate-orm/pull/1330
In my opinion, we should integrate new Dialects and just document which
ones are maintained by us, and which ones are under development by other
parties.
For the Altibase database, the Dialect was sent by an Altibase employee, so
it is in their interest to maintain it as well.
Vlad
9 years, 10 months
http://www.hibernatespatial.org/
by Steve Ebersole
Karel, a user on IRC was asking about problems using the hibernate-spatial
mailing list, which precipitated a discussion about how we want to deal with
these things moving forward. For example, the
http://www.hibernatespatial.org/ site is still up and running and really
has no indication that the move to integrate Spatial into Hibernate proper
was completed. What do you want to have happen with that website/URL?
As for other infrastructure, what would you like to have happen? It seems
like hibernate-spatial is a more user-focused mailing list, as opposed to a
dev mailing list? If so, Hibernate does not really do user mailing lists.
We prefer the forums or StackOverflow for user questions, so there is not a
straight "migration". You can obviously keep the hibernate-spatial mailing
list running too, but we should have some idea how to help users who are
having trouble with it on the website (which website depends on what you
decide to do with http://www.hibernatespatial.org/).
Any other things we should discuss in terms of infrastructure?
Davide, what was the exact problem the user on IRC was complaining about
wrt the hibernate-spatial mailing list?
9 years, 10 months
Re: [hibernate-dev] A warmup task: HSEARCH-2207
by Mincong Huang
Hi Sanne and Gunnar,
I've made my first pull request to Hibernate Search:
https://github.com/hibernate/hibernate-search/pull/1065
Cheers,
Mincong
On Sun, Apr 17, 2016 at 12:35 AM, Mincong Huang <mincong.h(a)gmail.com> wrote:
> Hi Sanne and Gunnar,
>
> I've read some docs and understand Apache Lucene and
> Hibernate Search better now.
>
> Regarding HSEARCH-2207, I'm working on it. You can see my blog post
> - Starting Hibernate Search
> <http://mincong-h.github.io/hibernate-search/2016/04/16/starting-hibernate...>
>
> for progress. I'll try to commit tomorrow or maybe Monday. This email is
> just to keep you informed :)
>
> By the way, is there any way to speed up maven install? It takes me 7 min to
> build the project.
> I use the following command line (1 thread by processor)
>
> $ mvn clean install -s settings-example.xml -T 1C
>
> But "-T 1C" does not really help much.
>
> Good night,
> Mincong
>
> On Wed, Apr 13, 2016 at 11:36 PM, Mincong Huang <mincong.h(a)gmail.com>
> wrote:
>
>> Thanks for your encouragement, I'll keep working on it :)
>>
>> Mincong
>>
>> On Wed, Apr 13, 2016 at 9:01 AM, Gunnar Morling <gunnar(a)hibernate.org>
>> wrote:
>>
>>> Hey Mincong,
>>>
>>> Nice! Looking forward to your pull request for this issue very much.
>>>
>>> Thanks,
>>>
>>> --Gunnar
9 years, 10 months
Master
by Steve Ebersole
Obviously consolidating hibernate-entitymanager into hibernate-core is a
fairly big effort. And I am getting concerned about the continuing pushes
to master in the meantime, many of which I know touch on code I have
changed. My concern is finishing all this refactoring work and then
having to sift through all of the changes pushed in the interim to
work out the proper integration strategy.
Long story short... I am contemplating pushing to master sooner rather than
later even though my refactoring may not be completely finished, especially
as we get towards the end of the week.
9 years, 10 months
GitHub PR management
by Vlad Mihalcea
Hi,
Both Andrea and I have been going through the current list of PRs, and I
think it is a good idea to start labeling them.
Some issues are trivial and can be integrated as soon as we can commit to
master.
I think we should add the following labels:
- Needs CLA
- Needs Jira
- Needs Test Case
- Envers
- 4.2
- 4.3
- 5.0
- 5.1
This way we can easily navigate through them.
Vlad
9 years, 10 months
Re: [hibernate-dev] A warmup task: HSEARCH-2207
by Sanne Grinovero
Hi Mincong,
great start!
I've added you to the right groups on JIRA, so now HSEARCH-2207 is
assigned to you.
To be allowed to send emails to the hibernate-dev mailing list you
have to register first:
- https://lists.jboss.org/mailman/listinfo/hibernate-dev
Thanks,
Sanne
On 12 April 2016 at 22:55, Mincong Huang <mincong.h(a)gmail.com> wrote:
> Hi Sanne and Gunnar,
>
> Regarding the issue HSEARCH-2207, I'm studying it.
>
> I created a demo and wrote a blog post about boolean queries in Apache Lucene.
> As for Hibernate Search, I haven't started learning it yet. I'll probably start
> this weekend.
> For now, my priority is to understand the index, query and analyzer in
> Lucene.
>
> My JIRA account is mincong.h(a)gmail.com
>
> I'm not allowed to send email to hibernate dev list, so it isn't in cc.
>
> Cheers,
> Mincong
>
>
>
>
> On Sat, Apr 9, 2016 at 8:27 PM, Mincong Huang <mincong.h(a)gmail.com> wrote:
>>
>> Hi Sanne,
>>
>> Thanks for your email. No, it won't be boring at all. On the contrary,
>> this will be a great beginning!
>> I'll take a look at the issue and keep you informed. As for the JIRA
>> account, here's mine:
>> mincong.h(a)gmail.com
>>
>> Cheers,
>> Mincong
>>
>>
>> On Fri, Apr 8, 2016 at 2:35 PM, Sanne Grinovero <sanne(a)hibernate.org>
>> wrote:
>>>
>>> Hi Mincong,
>>>
>>> Gunnar suggested that I find a first task for you to familiarize yourself with
>>> the project, and to have you practice the process of creating patches
>>> and proposing them for integration over GitHub.
>>>
>>> I think this issue could be suitable:
>>> - https://hibernate.atlassian.net/browse/HSEARCH-2207
>>>
>>> I hope it's not too boring! It's not just an exercise, we need that
>>> done so it would be a valuable contribution already.
>>>
>>> Do you have an account on our JIRA server? If not create one, then let
>>> me know your user id so I can assign the issue to you.
>>>
>>> Thanks,
>>> Sanne
>>
>>
>
9 years, 10 months