[hibernate-dev] "Service" in Hibernate Search: history, lessons learned and rewrite

Wed Apr 27 09:48:47 EDT 2016

On 27 April 2016 at 11:35, Gunnar Morling <gunnar at hibernate.org> wrote:
> Hey Sanne,
>
> great write-up!
>
> 2016-04-27 0:04 GMT+02:00 Sanne Grinovero <sanne at hibernate.org>:
>>
>> The "Service" and "ServiceManager" concepts in Hibernate Search have a
>> specific meaning which is often misunderstood and/or abused, causing
>> trouble.
>>
>> They also changed over time: victim of two major refactorings which
>> evolved the purpose and stretching its intent
>>
>> So I'll change the definition again :-)
>> But hopefully clarifying it, so here is a draft of the rules which I
>> plan to both implement and document carefully on the javadoc, with
>> some comments to highlight what is changing:
>>
>>
>>
>> # A service type is identified by a Class: a Service interface
>>
>> Nothing new here. Yet: worth stressing that this implies that only one
>> implementation will be around.
>>
>> # There can only be ONE IMPLEMENTATION of a service type used by the
>> whole Hibernate Search instance
>>
>> In other words: it was not a good idea to have the
>> LuceneWorkSerializer to be modelled as a Service, back when we
>> supposedly could use a different serialization strategy for a
>> different index.
>>
>> Yet it is a good idea nowadays to have LuceneWorkSerializer extend
>> Service, as we dropped that level of flexibility. This implies that
>> there's a single type of serializer (at most) and it's totally fine to
>> expose this as:
>>
>>    SearchIntegrator#getLuceneWorkSerializer()
>>
>> [this method doesn't exist yet, but I'm thinking of adding it for our
>> convenience and the following other points]
>
>
> It's not clear to me when you'd provide such convenience getter and when
> not. What's the criteria for it? It appears you consider different
> categories of services?

Indeed this bothers me: "Service" is a general purpose abstraction,
and in practice several components could be refactored to become a
Service, while also many Service(s) could be refactored to not be one.

Drawing a well defined line is not practical.

Take a look at the ServiceIntegrator: why do we have a
getErrorHandler(), and why is that not a Service? (hint: historical
reasons)

Even better, look at ImmutableSearchFactory: it's a placeholder for
many things which could be services.
Why is TimeoutExceptionFactory not a Service?

The problem is twofold:
 - it's unclear
 - we keep discussing these stuff, while it has very low value to the end users

So my take is going to be a pragmatic one: having getters on
`ServiceIntegrator` is an approach which worked very well so far and I
don't feel the need to be moving everything into a "Services" model.

I do like the Services abstraction as a way to allow SPIs (think
Infinispan Query, but also beyond) to customize and override internal
components.

Another benefit of Services is that one custom component might need to
lookup another custom component, and this code living outside of the
`engine` module you can't add a getter for all your custom stuff.
Example: see the services defined by the JGroups backend:
`NodeSelectorService` and `MessageSenderService` are very specific to
the design of this backend, and depend on JGroups.

Clearly something like a "ClassLoaderService" is useful to a wide
scope and having a simple getter is just simpler.

What I think was a mistake, is to allow exposing the ServiceManager to
SPI as it is prone to abuse: while we know that a FieldBridgeFactory
will need the ability to instantiate classes by name, that means it
needs access to a ClassLoaderService but today we're passing it the
whole ServiceManager ; this opens the door for lookup of other
services, and we get quickly entangled in complex dependency graphs.

There are some cases of valid inter-service dependencies, but these
are very limited (e.g. the above JGroups services).
So I'm not proposing to scrap this capability from the ServiceManager
as it is a very useful thing, yet I don't see why e.g. the
FieldBridgeFactory needs more than one trivial line of code to load
the classes it needs.
Alternatively, we could see to refactor those few Services which need
additional services to not need this anymore - yet I don't see why
since we have a beautiful ServiceManager which can do such things :)

> Personally, I'd like if each service would obtain the other services it
> depends on through its constructor. Then it can store the dependencies in
> (final) fields and reference them as needed without any sort of look-up.
> Services would not have to deal with the service manager anymore, they'd
> just get those services they need. One thing I dislike about the current
> approach is that you cannot make services fully immutable via final fields
> if you depend on other services.

It's a great idea. We should do that, but again since we don't have
much time I'm forced to keep it practical :-(
If you have time to propose a KISS patch (we're not in the business of
DI frameworks) I'd love to apply it but remember our priority is
Elasticsearch at this point.. I merely had to start this thread to
clarify were I'd like to go with e.g. the JestClient and the
SerializerService.

>> P.S. we're only maintaining - and bundling - a single Serializer
>> implementation so it's no surprise that we can handle only one.. yet
>> this implies people wanting to override it have to either hack our
>> bootstrap or physically remove our implementation.
>>
>> # A Service implementation can be provided by having it injected at
>> bootstrap (i.e.
>> org.hibernate.search.cfg.spi.SearchConfiguration.getProvidedServices()
>> )
>>
>> Not a new rule either: repeating for clarity. We call these "provided"
>> services.
>>
>> # If a service isn't "provided", then we attempt to create one using
>> java.util.ServiceLoader
>>
>> Currently this expects a single implementation to be available:
>> there's no way to pick which one if there are multiple implementations
>> on the classpath.
>> I think we'll need to be able to pass a "hint" or similar to the
>> requestService to allow expression of preferences, handle shortnames,
>> etc.. a proposal for that will follow when there will be need: at this
>> point it's important to clarify the limitation, as this expresses what
>> a Service is not able to model today.
>
>
> +1 for shaping requirements around this once it's needed. Atm. it appears a
> bit blurry to me.

There was no request for this, but to unblur the problem: say a power
user wants to plug in its own SerializerService implementation, and
bundles it with its app to deploy on WildFly. He'll get an exception
as now there are two providers for the same service, and he can't
easily "remove" the other one as it's provided by the app server.
He can't override the provided services either, as the JPA context is
booted by the app server.

Would be nicer to have a property to allow picking which
implementation to use, rather than throw exceptions.

>>
>>
>> Currently implementations are looked up "on demand". I plan to allow
>> "pre-initialize" services as it removes some trouble; these components
>> could have convenience getters, not least to remove the concurrency
>> overhead.
>> Remember that since there's only one implementation for a given type
>> around, there's no reason to not do this: the intent of the Service
>> contract is to allow people to inject a customized implementation.
>
>
> Sounds good.
>
>> # If a Service implementation also implements Startable, or Stoppable,
>> we'll invoke the respective methods once at start and/or at stop of
>> the Search instance - unless they are provided in which case they are
>> ignored.
>>
>> The current javadoc suggests that it's illegal for a provided
>> implementation to also implement Startable and/or Stoppable; I don't
>> remember why that was, but today it seems unfitting: people might want
>> to extend one of our implementations, or reuse some of the
>> implementations normally auto-started but reuse them "by instance" by
>> providing them to multiple Search instances to save memory (we
>> actually have a need for this for Index Affinity in Infinispan).
>> The important concept which will survive, is that we don't start or
>> stop stuff which is provided as that's clearly responsibility of
>> another component.
>
>
> Why is this? Above you say it's unfitting? I'm not quite clear on the
> proposed direction.

It's very simple: say I want to extend JestClient and use my custom
implementation, e.g. I am experimenting with some http pooling
capabilities which Search doesn't have.

I can ignore the start, and initialize it my own way.. but Search will
refuse this Service instance as it's also inheriting @Startable.

I'd prefer to allow people to inject their own stuff without being
overly strict; BTW the JestClient service isn't using an interface to
identify itself so users will need to extend it .. it's a limitation I
think we need to fix, we've always been very friendly to allow people
to override components.

>
>> # All non-provided Services will be stopped once, and only once as
>> final step when the SearchIntegrator is stopped.
>>
>> This is a significant difference with today's code: we expect the
>> Service consumers to "open / close", hopefully in a finally block, to
>> the point that Gunnar enhanced it to at least allow AutoClose
>> semantics.
>> Yet, I don't want runtime code to open and close these frequently as
>> it has been a bottleneck in the past.
>
>
> +1 for setting up the graph once and avoiding all the request/release
> hassle. Ideally with "constructor injection" as described above.

+1

>>
>> It also led to the creations of issues like HSEARCH-1589 : we might
>> start/stop the same service frequently, and need to improve with
>> reference counters.
>>
>> I suspect that historically the reasoning was to make sure that the
>> order of teardown would follow the inverse order of bootstrap as
>> components would cleanup after themselves, but having clarified that
>> Service instances are unique globally, should also imply that their
>> state doesn't depend on other Services. So the teardown order doesn't
>> matter anymore.. we'll start one for each, but only close it at
>> shutdown.
>
>
> I'm not following on that one. Provided one service uses another and stores
> it in a field, tear down order may matter indeed (if a service is using
> another one in its close() logic). It'd be resolved easily if circles are
> forbidden.

Right those services depending on others might still need it. We might
still need reference counters, or at least differentiate for those
services which are frequently looked up and could bypass them.

>
>> # Hierarchy?
>>
>> We've talked about global components so far. It's clear that the
>> IndexManager has a central role in the overall architecture, as we
>> tend to allow per-index customisations. Or per-family customisations,
>> as suggested in my previous email.
>>
>> An example which affects Service:
>> The "JestClient client" [the Service we use to connect to
>> Elasticsearch] could be considered a good fit for being a "Service" as
>> this allows people to override the client implementation and/or inject
>> a pre-configured instance.. yet it's not a good fit if we want to
>> allow people to connect to different hosts for different indexes.
>>
>> I don't plan to implement the hierarchical ServiceManager right now,
>> but proposing it already so that we can agree on the above cleanups in
>> contract, with the perspective that there are cleaner solutions also
>> for the scoped use case.
>
>
> It's an interesting idea. The granularity of the hierarchy would have to be
> discussed. Would you e.g. have one service manager per index which inherits
> the global one?
>
>> Implementing these changes resolves or obsoletes at least 10 JIRA
>> issues in one shot..
>
>
> What is your plan for this? Which ones do you plan to address as of 5.6,
> (5.7), 6 or farther in the future?

Implementing this plan has no visible benefit to end users, other than
say be able to override the JestClient but that can be achieved by
implementing just one step.

So, purely practical oriented:
 1) make you all aware of the limitations of Service to get the new
code in the right direction
 2) simplify things which give us benefit in new coding
 3) move to the above plan iteratively, prioritizing what we need e.g.
I need https://github.com/hibernate/hibernate-search/pull/1082 for
various other reasons so I did it already.

Thanks,
Sanne