[hibernate-dev] [HSEARCH] Implementing mass indexer with JSR 352 batch application

Gunnar Morling gunnar at hibernate.org
Wed Feb 24 05:57:33 EST 2016


Hi,

2016-02-24 11:45 GMT+01:00 Sanne Grinovero <sanne at hibernate.org>:
> It sounds interesting, but bear with me: I'm not familiar with it, so
> I'll throw out some doubts.
>
> Will it work without a JEE container?
> I guess some implementations might be embeddable, but still how simple
> would that be for the user?

Yes, JSR 352 is designed to work in both Java EE and Java SE.

>
> Note that we already allow rebuilding the index via JMX commands,
> that's quite standard too and it's hooked up by other integrators such
> as Infinispan or the WildFly admin console, and both expose a CLI too.
> I guess other stacks will hook up their favourite approach easily too.

It's not so much the mere means of starting the job that I'm
interested in, but rather the other features typically provided by
batch infrastructures, e.g. retrying failed items n times, eventually
excluding them, restarting jobs at defined points after failures etc.
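
To make those three behaviours concrete, here is a minimal plain-Java
sketch. It deliberately does not use the JSR 352 API or Hibernate
Search; the class, the ItemIndexer interface and all parameter names
are hypothetical and only illustrate the semantics (retry an item up to
n times, exclude it once retries are exhausted, and remember a
checkpoint from which a failed run can be restarted). In a real JSR 352
job these would be configured on the chunk step rather than written by
hand.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: none of these types belong to JSR 352 or Hibernate Search.
class RetrySkipCheckpointSketch {

    /** Hypothetical stand-in for "index one entity"; may throw on failure. */
    interface ItemIndexer {
        void index(int entityId) throws Exception;
    }

    /** Outcome of one run: indexed ids, excluded ids, and the restart position. */
    static final class Result {
        final List<Integer> indexed = new ArrayList<>();
        final List<Integer> skipped = new ArrayList<>();
        int checkpoint; // list position a restarted run should begin at
    }

    static Result run(List<Integer> ids, int startAt, int retryLimit,
                      int checkpointEvery, ItemIndexer indexer) {
        Result result = new Result();
        result.checkpoint = startAt;
        for (int pos = startAt; pos < ids.size(); pos++) {
            int id = ids.get(pos);
            boolean done = false;
            // One initial attempt plus up to retryLimit retries.
            for (int attempt = 0; attempt <= retryLimit && !done; attempt++) {
                try {
                    indexer.index(id);
                    done = true;
                } catch (IllegalStateException fatal) {
                    // Treated as non-retryable in this sketch: abort the run.
                    // The caller can restart from result.checkpoint.
                    return result;
                } catch (Exception retryable) {
                    // Fall through and try again; excluded below if never successful.
                }
            }
            if (done) {
                result.indexed.add(id);
            } else {
                result.skipped.add(id);
            }
            // Record a checkpoint after every checkpointEvery processed items.
            if ((pos + 1 - startAt) % checkpointEvery == 0) {
                result.checkpoint = pos + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6);
        boolean[] failedOnce = {false};
        ItemIndexer indexer = id -> {
            if (id == 3) {
                throw new IOException("entity 3 always fails");
            }
            if (id == 5 && !failedOnce[0]) {
                failedOnce[0] = true;
                throw new IllegalStateException("fatal failure on first run");
            }
        };
        Result first = run(ids, 0, 2, 2, indexer);
        System.out.println("first run: indexed=" + first.indexed
                + " skipped=" + first.skipped + " checkpoint=" + first.checkpoint);
        // Restart from the last checkpoint instead of from the beginning.
        Result second = run(ids, first.checkpoint, 2, 2, indexer);
        System.out.println("restart:   indexed=" + second.indexed
                + " skipped=" + second.skipped + " checkpoint=" + second.checkpoint);
    }
}
```

The point of using a batch runtime would be to get exactly this kind
of bookkeeping from the infrastructure instead of maintaining it in
the mass indexer itself.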

>
> The idea sounds good, but I'm not convinced that it has a compelling
> benefit/effort ratio, especially not if it turns out to be slower
> than our current implementation (heck, it's not as fast as I'd want
> it yet, but it's OK, and I know how to make it much better as long as
> we control the details).
>
> Performance of this component is important. On the same database I've
> had MassIndexer POCs which would take 6 months to complete, a week to
> complete, or 3 minutes. The current implementation compared to those
> tests is the one which takes approximately 4 hours as I could never
> make the last optimisations generic enough for the general purpose but
> it would be just a matter of fixing open JIRAs for a couple of very
> concrete points.
> My point is that people often need to be able to reindex in a couple
> of hours, no matter the size. 2 hours is of course a very human,
> arbitrary figure, but it seems to be the generally accepted threshold
> for such an operation: beyond being a developer's tool, it's also a
> tool for recovery from critical failures, so it's absolutely
> unacceptable that a system potentially needs weeks to recover from an
> issue.

It would have surprised me had you not put forward a perf-related
argument ;)

The answer is: I don't know. JSR 352 does provide some options for
controlling parallelism etc. Part of the task would be to do some perf
comparisons with the existing implementation.

>
>
> On 24 February 2016 at 09:56, Vlad Mihalcea <mihalcea.vlad at gmail.com> wrote:
>> +1
>>
>> Sounds like a good idea.
>>
>> On Wed, Feb 24, 2016 at 11:39 AM, Gunnar Morling <gunnar at hibernate.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been contemplating the idea of creating a JSR-352-style batch
>>> application for re-indexing one or more entity types in Hibernate
>>> Search.
>>>
>>> Functionally, it'd be the same as the current mass indexer, but using
>>> JSR 352 would provide some nice benefits:
>>>
>>> * Operation through standard batch interfaces (e.g. CLI, web console
>>> or whatever servers provide)
>>> * Standardized monitoring and logging
>>> * Standardized error handling, restartability after failures
>>>
>>> I thought this might be an interesting GSoC idea.
>>>
>>> It's very isolated and also should not be too complex to do. If the
>>> student is quick, one further idea could be to provide some UI
>>> functionality for controlling this (I'm not sure what's already
>>> available in the WF web console).
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>>
>>> --Gunnar
>>> _______________________________________________
>>> hibernate-dev mailing list
>>> hibernate-dev at lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>

