[hibernate-dev] [Hibernate Search] Database back end worker

Tue Aug 11 10:39:27 EDT 2015

+1 for the idea of the database-based backend. Seems very useful.

One idea for improvement may be to group on (entity-type,id) and only
select the latest LuceneDatabaseWork per id. That way you'd avoid
propagation of potentially outdated index updates.
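
To make that concrete, here is a minimal Java sketch of the de-duplication
step. It is not code from the proposed backend: the Work class, its field
names and the "sequence" ordering column are hypothetical stand-ins for
whatever LuceneDatabaseWork actually stores, and it assumes that a later work
item for the same (entity-type,id) fully supersedes the earlier ones. It uses
Java 8 (Map.merge and a lambda), so it would need adapting for Java 7.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestWorkSelector {

    /** Hypothetical stand-in for one persisted LuceneDatabaseWork row. */
    public static final class Work {
        public final String entityType;
        public final String entityId;
        public final long sequence;   // assumed to grow with insertion order
        public final String payload;  // placeholder for the serialized work

        public Work(String entityType, String entityId, long sequence, String payload) {
            this.entityType = entityType;
            this.entityId = entityId;
            this.sequence = sequence;
            this.payload = payload;
        }
    }

    /**
     * Keeps only the work item with the highest sequence for each
     * (entityType, entityId) pair. The result is ordered by the first
     * appearance of each pair in the input.
     */
    public static List<Work> latestPerEntity(List<Work> pending) {
        Map<List<String>, Work> latest = new LinkedHashMap<>();
        for (Work work : pending) {
            // A two-element list gives a composite key with value-based equals/hashCode.
            List<String> key = Arrays.asList(work.entityType, work.entityId);
            latest.merge(key, work,
                    (kept, candidate) -> candidate.sequence > kept.sequence ? candidate : kept);
        }
        return new ArrayList<>(latest.values());
    }
}
```

A processing job could call LatestWorkSelector.latestPerEntity(...) on the
batch it has just read and apply only the returned items to the index. The
same selection could probably also be pushed into the query itself; the
in-memory version is just the easiest way to show the idea.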

2015-08-09 17:51 GMT+02:00 Sanne Grinovero <sanne at hibernate.org>:
> Hi,
> yes creating issues on JIRA is what we normally do, feel free to create them!
>
> They don't have to be sub-tasks; we use sub-tasks when there are
> several "blocker" steps which need to be accomplished to get a feature
> and it's so large that it needs to be split.
> In this case I'd make a single JIRA for the new feature - which gets
> resolved when we merge it in its most essential form - and further
> improvement ideas can be created now or as needed as independent
> improvements.
>
> Thanks!
> Sanne
>
> On 9 August 2015 at 06:56, Flemming Harms <flemming.harms at gmail.com> wrote:
>>
>>
>> 2015-08-05 23:38 GMT+02:00 Sanne Grinovero <sanne at hibernate.org>:
>>>
>>> Hi Flemming,
>>> welcome to this list!
>>>
>>> I waited a bit to reply myself, as you already know I like the
>>> proposal. Unfortunately many others are on holidays, so other feedback
>>> might be slow.
>>>
>>> Still, I wouldn't let that slow you down: go ahead and start the work of
>>> merging it. I already mentioned over chat that this was coming, and
>>> we all agree that the concept is useful!
>>> I don't think others looked at the details yet, but if it comes to
>>> concerns at that level, we can address smaller issues incrementally.
>>> (I also didn't look at micro-details, as it's easier to comment on
>>> those on a pull request).
>>>
>>> I had the same question as Martin regarding clustering: with the
>>> current implementation you expect something like the master/slave
>>> configuration, or Infinispan to be used as storage, correct?
>>> I also think it would be interesting to explore the approach further
>>> to also - optionally - serve as a replacement for these, but that's
>>> another feature which is easier to experiment with after the core
>>> concept is merged.
>>
>> Yes, that's correct. To start with, it was written very specifically for
>> the use of Infinispan as the directory.
>>
>> And I agree that we should explore other cluster configurations, and I have
>> some ideas for how we can implement them.
>>>
>>>
>>> In short, I would simply merge your backend as a new module in
>>> Hibernate Search! Fork our repository, and send a pull request.
>>>
>>> # Code layout / Modules
>>>
>>> In terms of code structure, you might have noticed that the module
>>> 'hibernate-search-engine' (/engine in the source code) does not depend
>>> on JPA nor Hibernate ORM; the reason is that other projects reuse the
>>> core indexing strategy and the backends. Since it would be nice to
>>> allow them to optionally use your backend, while not mandating a
>>> dependency on ORM for those who don't, I think this should be a new
>>> Maven module.
>>>
>>> We already have
>>>  /backends/jgroups
>>>  /backends/jms
>>>
>>> So we could add (name to be refined?) :
>>>  /backends/relationaldb
>>
>> sure, no problem :)
>>>
>>>
>>> Also, your integration tests probably should be moved together with
>>> our other integration tests. They are currently running WildFly
>>> 10.0.0.Alpha6, but that shouldn't be a problem.
>>>
>>> # Code Style
>>>
>>> We use tabs ;-)
>>> And also have various other "exotic" conventions regarding white-space
>>> usage, the right file headers, etc.
>>> We use CheckStyle to keep it tidy; it will give you lots of errors, and
>>> when there are many it's not very helpful. I would suggest taking the
>>> formatting templates attached at the following link and using your IDE's
>>> formatting capabilities, resorting to CheckStyle just for the final
>>> validation:
>>>  - http://hibernate.org/search/contribute/
>>>
>>> # JDK
>>>
>>> It looks like your extension requires Java 8; if you could convert it
>>> to Java 7 that would be nice.
>>
>> I don't think it will be an issue. As far as I remember, we don't use any
>> Java 8 specific functionality.
>>>
>>>
>>> # Rebasing to latest
>>>
>>> I'm afraid we're now aiming at Hibernate ORM 5, so some details might
>>> need to be updated; probably just in the configuration area. We're
>>> also in the process of upgrading to Apache Lucene 5, but that
>>> shouldn't affect you at all.
>>>
>>> # Some improvement ideas
>>>
>>> While we should support the case in which Hibernate Search is not
>>> being run as an extension of Hibernate ORM, that's likely the most
>>> common one.
>>> In that scenario I think it would be nice to be able to look up the
>>> existing ORM services so that users don't need to repeat for example
>>> the datasource configuration.
>>>
>>> We might also be able to reuse all of the SessionFactory, but I'm not
>>> sure how to include your model without it potentially interfering with
>>> the end user's model; I'd say let's start by sharing some services
>>> from ORM and then see what kind of improvements we can build into ORM
>>> for this use case; for example this might simplify some of the
>>> TransactionManager configuration code I'm seeing in your repository.
>>>
>>> Of course your existing configuration properties are useful too,
>>> especially for the non-ORM case, as we won't be able to reuse the ORM
>>> services there.
>>>
>>> Also, you might have noticed we are now able to optionally include the
>>> backend operations in the same transaction. That's not the default, as
>>> commonly people don't want that, but it would be very interesting to
>>> evolve this backend to support that option too; you wouldn't even
>>> require XA when storing the entity in the same database!
>>>  - http://in.relation.to/2015/07/09/hibernate-search-jms-transaction/
>>>
>> Yes, it's a very nice feature and fits perfectly with relationaldb.
>>>
>>> I'd be happy to help with this, feel free to share non-working and/or
>>> intermediate experimental branches when you have questions or are just
>>> stuck.
>>> Please start by creating a JIRA, you can leave the target version
>>> undefined: we'll merge it when it's ready.
>>>
>> For all the tasks you have listed, can I create sub-tasks on the JIRA, or
>> how do you track tasks?
>>
>>>
>>> Thanks,
>>> Sanne
>>>
>>>
>>> On 5 August 2015 at 20:05, Flemming Harms <flemming.harms at gmail.com> wrote:
>>> > Hi Martin
>>> >
>>> > For this version the AbstractDatabaseHibernateSearchController is not
>>> > able to process Lucene workers simultaneously. When we built it, our
>>> > initial requirement was that only one node should process the workers at
>>> > a time, but the “master” was floating. We use Quartz to get this type of
>>> > functionality, and it synchronizes the execution between the nodes. But
>>> > you could also use an HA-singleton to dedicate a specific node to
>>> > processing the workers.
>>> >
>>> > We had been playing with an idea where we stamp the LuceneDatabaseWork
>>> > with the known cluster nodes, and then the last node removes it from the
>>> > database, or a scheduled job can take care of it. The advantage of this
>>> > solution is that it makes Infinispan optional, and the indexes can be
>>> > stored on each node instead of in a shared cache.
>>> >
>>> > Your idea and work look very nice. Pretty awesome feature to support
>>> > different JPA providers.
>>> >
>>> > --
>>> > cheers
>>> > Flemming
>>> >
>>> >
>>> > 2015-08-05 11:57 GMT+02:00 Martin Braun <martinbraun123 at aol.com>:
>>> >
>>> >> Hi,
>>> >>
>>> >>
>>> >> Note: I am no core developer of Hibernate Search, but I am currently
>>> >> working on something that looks quite similar to what you are doing :).
>>> >> One part of it is an updating mechanism based on triggers that uses the
>>> >> database as an event store as well. It's not the exact same thing, but
>>> >> related.
>>> >>
>>> >>
>>> >> https://github.com/Hotware/Hibernate-Search-JPA
>>> >>
>>> >>
>>> >>
>>> >> The idea is quite nice, but after looking at the source code I am
>>> >> wondering how the different nodes are able to work together, because in
>>> >> AbstractDatabaseHibernateSearchController you remove the entity
>>> >> from the persistence context and I wasn't able to find code that would
>>> >> make up for that.
>>> >>
>>> >>
>>> >> Doesn't that mean that the other workers will not be able to read that
>>> >> entity? Or will users of this need to implement their own
>>> >> synchronization mechanism between the different nodes?
>>> >>
>>> >>
>>> >> Martin Braun
>>> >> martinbraun123 at aol.com
>>> >> www.github.com/s4ke
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> -----Original Message-----
>>> >> From: Flemming Harms <flemming.harms at gmail.com>
>>> >> To: Hibernate.org <hibernate-dev at lists.jboss.org>
>>> >> Sent: Tue, Aug 4, 2015 6:40 pm
>>> >> Subject: [hibernate-dev] [Hibernate Search] Database back end worker
>>> >>
>>> >>
>>> >> Hey guys
>>> >>
>>> >> I want to introduce myself and a new database back-end worker that
>>> >> another guy and I have built for Hibernate Search. I have already had
>>> >> some initial talks with Sanne about whether this could be of interest
>>> >> to the Hibernate Search project.
>>> >>
>>> >> I have been working with Hibernate Search for some time and have
>>> >> actually done various small custom modifications to Search since 3.x,
>>> >> especially around running in a cluster and indexing. To make a long
>>> >> story short, when we upgraded Hibernate Search we thought it would be
>>> >> ideal to use a SQL database as storage for Lucene workers, for 3 main
>>> >> reasons.
>>> >>
>>> >> - The database was shared between the nodes.
>>> >> - The workers were persistent in case of a node crash.
>>> >> - No master/slave.
>>> >>
>>> >> In some ways it’s very similar to the JMS back-end worker, where the
>>> >> user also has to implement an MDB that processes the workers. In our
>>> >> case they will have to implement a job using something like Quartz or a
>>> >> timer service.
>>> >>
>>> >> We are using JPA as the persistence layer for the database. Even though
>>> >> it’s a fairly simple entity that we persist, it makes sense for
>>> >> supporting various databases and schema updates out of the box. We have
>>> >> tried to make it as easy as possible to set up by minimizing the number
>>> >> of properties, and it’s all configurable from the persistence.xml.
>>> >>
>>> >> The actual work can be found here:
>>> >> https://github.com/umbrew/org.umbrew.hibernate.database.worker.backend
>>> >>
>>> >>
>>> >>
>>> >> So based on this introduction and the code, is this something you could
>>> >> use? (Of course with the modifications required to follow the design,
>>> >> style, docs etc. for Search.)
>>> >>
>>> >> --
>>> >>
>>> >> Kind regards
>>> >> Flemming Harms
>>> >> _______________________________________________
>>> >> hibernate-dev mailing list
>>> >> hibernate-dev at lists.jboss.org
>>> >> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Kind regards / Med Venlig Hilsen
>>> > Flemming Harms
>>> >
>>> > https://twitter.com/fnharms
>>> > https://dk.linkedin.com/in/fharms
>>
>>
>>
>>
>> --
>>
>> Kind regards / Med Venlig Hilsen
>> Flemming Harms
>>
>> https://twitter.com/fnharms
>> https://dk.linkedin.com/in/fharms
>>
>