On 05/09/2012 02:01 PM, David M. Lloyd wrote:
OK I admit I LOL'ed.
On 05/09/2012 11:50 AM, Emmanuel Bernard wrote:
> Now that I have your attention, I'd like to discuss issues we are experiencing
> when trying to modularize the Hibernate portfolio and make it work in AS 7.1.
>
> ## Disclaimer
>
> I fully understand all the coolness of modularity (speed, easier dependency
> isolation, etc.). I have also carefully read:
>
> - https://community.jboss.org/wiki/ModuleCompatibleClassloadingGuide
> - https://community.jboss.org/wiki/ModularSerialization
>
> But these tend to avoid the more complex case of portable libraries that must
> also run outside AS 7 and have a wide variety of class and resource loading needs.
> I am not a complete modularity bozo, but I am definitely not familiar with JBoss
> Modules or similar solutions.
>
> ## Requirements / Landscape
>
> Hibernate ORM uses the notion of a service registry and of Integrator objects that
> help third-party frameworks integrate with or customize the engine's behavior.
> Enlistment of Integrators is done via the service locator pattern (a service file in
> META-INF/services/ that is looked up and lists the implementation class(es) at stake).
>
> Hibernate Envers is one of those customizers that depend on Hibernate ORM. Note that
> the core of Hibernate ORM does not depend on Hibernate Envers. The service locator file is
> contained in the Hibernate Envers JAR.
> Hibernate OGM likewise heavily customizes ORM and depends on Hibernate ORM classes;
> the reverse is not true. The service locator file is contained in the Hibernate OGM JAR.
> Hibernate Search optionally depends on Hibernate ORM and JPA. The core of Hibernate
> Search is independent, but a Hibernate Search ORM module has an integrator implementation.
> On top of that, Hibernate Search optionally depends on some JPA classes and behaves
> differently if they are present; we look them up in the classpath by reflection.
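A minimal sketch of the kind of reflective probe described above, assuming it is a plain `Class.forName` check. `OptionalDependencyProbe` is a hypothetical helper, not Hibernate Search code, and `javax.persistence.Entity` is only an example of an optional JPA class. What it illustrates is that the answer depends on which class loader is asked, which is exactly where moving JARs into modules starts to bite.

```java
/** Sketch: probe for an optional dependency without hard-linking to it. */
final class OptionalDependencyProbe {

    private OptionalDependencyProbe() {}

    /**
     * Returns true if {@code className} can be loaded from {@code loader}.
     * initialize=false avoids running static initializers just to probe.
     */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not visible from this loader, or visible but not linkable.
            return false;
        }
    }

    public static void main(String[] args) {
        // The same probe can give different answers depending on the loader asked.
        ClassLoader own = OptionalDependencyProbe.class.getClassLoader();
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        String jpa = "javax.persistence.Entity"; // example optional JPA class
        System.out.println("JPA via own loader: " + isPresent(jpa, own));
        System.out.println("JPA via TCCL:       " + isPresent(jpa, tccl));
    }
}
```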
>
> In addition, these projects load resources (config files, classes):
>
> - from what Jason calls a Deployment classloader (really, the user application's classes
> and resources): entities, custom analyzer implementations, resource files, etc. A user
> could even write a custom Integrator and use the service locator pattern from his
> application.
> - from direct dependencies (Lucene is a declared dependency of Hibernate Search)
> - from dependencies of the deployment: for example, an app developer adds the phonetic
> analyzer as a dependency of his application and asks Hibernate Search to use it
> - from modules that use these projects. Modeshape and Capedwarf are being modularized
> and use Hibernate Search as a module. Loading, from Hibernate Search's engine, the
> classes located in the Modeshape or Capedwarf module proves to be very hard with our
> current approach.
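One common way to cover several of these sources from a single place is to consult an ordered list of candidate class loaders and take the first hit. A sketch under that assumption; `ClassLoaderCandidates` is a hypothetical name, not a Hibernate or JBoss Modules API, and the caller is assumed to be able to collect the candidate loaders (application, library, and any module using the library).

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: resolve names against several candidate loaders in priority order. */
final class ClassLoaderCandidates {
    private final List<ClassLoader> loaders;

    ClassLoaderCandidates(List<ClassLoader> candidates) {
        // Drop nulls and duplicates, keeping the caller's priority order.
        Set<ClassLoader> unique = new LinkedHashSet<>();
        for (ClassLoader cl : candidates) {
            if (cl != null) {
                unique.add(cl);
            }
        }
        this.loaders = new ArrayList<>(unique);
    }

    /** The first loader that can see the class wins. */
    Class<?> loadClass(String name) throws ClassNotFoundException {
        for (ClassLoader cl : loaders) {
            try {
                return Class.forName(name, false, cl);
            } catch (ClassNotFoundException e) {
                // Not visible from this candidate; try the next one.
            }
        }
        throw new ClassNotFoundException(name + " is not visible from any candidate loader");
    }

    /** Same policy for resources; returns null if no candidate has it. */
    URL getResource(String name) {
        for (ClassLoader cl : loaders) {
            URL url = cl.getResource(name);
            if (url != null) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoaderCandidates candidates = new ClassLoaderCandidates(Arrays.asList(
                Thread.currentThread().getContextClassLoader(), // the application (may be null)
                ClassLoaderCandidates.class.getClassLoader()));  // the library itself
        System.out.println(candidates.loadClass("java.util.List").getName());
    }
}
```

The order is the whole policy here: if two candidates can each see a different class with the same name, the first one silently wins, which is the kind of ambiguity that name-only lookup cannot resolve.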
>
> All of these projects should be able to run outside JBoss AS 7, so a modular-friendly
> solution has to be abstract and generic enough to work in both places.
>
> ## What solution?
>
> More and more projects are being modularized, including ones with complex resource
> loading dependencies like the ones I have described. AFAIK Infinispan is in an even worse
> situation, as clustering and late class binding are at stake, but let's put that one
> aside.
> I'd love to get a reference design out of this thread that could be copied
> across all these projects and future ones like Bean Validation.
>
> Today, we mostly use the good old and simple TCCL model, which works fine if the JARs
> are directly embedded in the app but fails the minute we start moving these dependencies
> into modules. Sanne, Strong, Scott Marlow and I are using a dangerous amount of Advil to try
> and make everything work as expected. Some help would be awesome.
>
> To sum up:
>
> - can the Hibernate portfolio be supported within JBoss Modules, and how?
> - what kind of ClassloaderService contract should we use within these projects to be
> modular-friendly (JBoss Modules and others)?
> - would such a contract be generic enough to be extrapolated to JSRs in need of modular
> friendliness?
> - how do we solve the chicken-and-egg bootstrapping issue if we need to pass a
> ClassloaderService impl? How do we best do that in a modular environment without
> forcing the application developer to implement such a godforsaken ClassloaderService
> contract or, even worse, to pass us the right classloader directly for each call?
I'll just start at the beginning and you can skip over the background if
you like.
The key starting concept is that a class' (or package's) identity is not
just its name but also its class loader. This is the underlying
(existing) truth that modularity brings to the fore. Corollaries of this
are that a single VM may have more than one class or package
with the same name, and that not all classes/packages
are always directly visible from a given class loader.
This problem (as you've seen) manifests itself primarily when you're
locating a class or a resource by name. You basically have two options.
You can search *relative* to a class loader (most commonly the TCCL,
though using a single explicit class loader or the caller's class loader
also falls into this category). Or, you can use the *absolute* identity
of a class.
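To make the two options concrete, here is a sketch in which relative resolution asks a given loader for a name, while absolute resolution carries a (loader identifier, class name) pair and needs some way to map the identifier back to a loader. The `LoaderRegistry` map and its string identifiers are hypothetical stand-ins for whatever the runtime provides; under JBoss Modules that would presumably be a module identifier. The sketch uses a record, so it assumes Java 16 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: relative vs. absolute class resolution. */
final class Resolution {

    private Resolution() {}

    /** Relative: the answer depends on which loader you ask. */
    static Class<?> resolveRelative(String className, ClassLoader relativeTo)
            throws ClassNotFoundException {
        return Class.forName(className, false, relativeTo);
    }

    /** Absolute identity: the class name plus an identifier for its class loader. */
    record ClassId(String loaderId, String className) {}

    /** Hypothetical registry mapping loader identifiers back to live loaders. */
    static final class LoaderRegistry {
        private final Map<String, ClassLoader> byId = new ConcurrentHashMap<>();

        void register(String loaderId, ClassLoader loader) {
            byId.put(loaderId, loader);
        }

        Class<?> resolveAbsolute(ClassId id) throws ClassNotFoundException {
            ClassLoader loader = byId.get(id.loaderId());
            if (loader == null) {
                // The identity is useless without a way back to a loader.
                throw new ClassNotFoundException("Unknown loader id " + id.loaderId());
            }
            return Class.forName(id.className(), false, loader);
        }
    }
}
```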
Using relative resolution is often a perfectly adequate solution when
you're loading a single class or resource; in fact for some cases (like
ServiceLoader for example) it's a perfect fit in conjunction with TCCL
(in its capacity as an identifier for the "current" application). You
want the user to be able to specify their implementation of something,
and you want it to be reasonably transparent; ServiceLoader+TCCL does
this fairly well.
ServiceLoader also does well from the perspective of APIs with a static,
fixed number of implementations. In this case, it is appropriate for a
framework to have ServiceLoader use the class loader of the framework
itself. The framework would then be sure to import the implementations
in question (including their service descriptors); in our modular
environment, we call this a "service import". Note that this often
means there is a circular dependency between API and implementation:
that's OK!
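In code, the two ServiceLoader usages above differ only in which class loader is passed in. A sketch using a hypothetical `Integrator` interface (a stand-in, not Hibernate's actual SPI type); with no `META-INF/services` descriptor on the path, both lookups simply return empty lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical SPI, standing in for something like an integrator contract. */
interface Integrator {
    void integrate();
}

final class IntegratorLookup {

    private IntegratorLookup() {}

    /** User-supplied implementations: resolve relative to the "current" application. */
    static List<Integrator> fromCurrentApplication() {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        return collect(ServiceLoader.load(Integrator.class, tccl));
    }

    /**
     * A fixed set of implementations: resolve relative to the framework's own
     * loader, which must then be able to see (import) the implementations.
     */
    static List<Integrator> fromFramework() {
        return collect(ServiceLoader.load(Integrator.class, Integrator.class.getClassLoader()));
    }

    private static List<Integrator> collect(ServiceLoader<Integrator> services) {
        List<Integrator> found = new ArrayList<>();
        for (Integrator integrator : services) {
            found.add(integrator);
        }
        return found;
    }
}
```

Note that the single-argument `ServiceLoader.load(Class)` already uses the TCCL, so the first method just spells out the default behavior.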
We currently use this for Envers, but that doesn't seem as desirable for
other members of the Hibernate portfolio that may be on a separate
lifecycle. For example, Hibernate OGM is a persistence provider
that depends on Hibernate ORM. If we have Hibernate ORM depend on OGM,
that limits the number of OGM versions that can be in use on AS7.
Would it be possible to add an MSC enhancement that allows an inverse
service loader dependency to be expressed, such that it
would be enough to only have OGM depend on ORM (with an inverse service
dependency specified)? I'm thinking that the OGM module would need to
exchange the service dependency information with the ORM module and
clear it when OGM goes away.
If this is possible, it would make a nice future enhancement IMO.
A third ServiceLoader option is of course to simply accept a class
loader argument when looking up an implementation. This grants the most
flexibility and tends to work well in just about any environment, though
it may be somewhat lacking aesthetically, if you care about such things.
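The third option could look like the following sketch, in which the caller decides which loader the lookup is relative to. `ExtensionLocator` and the fallback to the TCCL when the caller passes `null` are illustrative choices, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Sketch: the caller chooses which loader the service lookup is relative to. */
final class ExtensionLocator {

    private ExtensionLocator() {}

    static <S> List<S> locate(Class<S> service, ClassLoader loader) {
        // Fall back to the TCCL only when the caller has no better information.
        ClassLoader effective = (loader != null)
                ? loader
                : Thread.currentThread().getContextClassLoader();
        List<S> found = new ArrayList<>();
        for (S impl : ServiceLoader.load(service, effective)) {
            found.add(impl);
        }
        return found;
    }
}
```

A container or module-aware bootstrap that knows the right loader passes it explicitly; code with no better information passes `null` and gets the TCCL behavior.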
I'm not sure how OGM would make its presence known to ORM currently.
Probably via a custom SPI that allows the inverse service loader
dependency to be expressed (so that
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-core/src...
can know about Hibernate Search/Envers/OGM/...).
If we cannot have an MSC way to express the inverse service loader
dependency, this sounds like the next best option.
In any case, ServiceLoader already fits in very well to modularity; it's
just a question of understanding your use case to know the appropriate
way to apply it. The key characteristic of such a fit, however, is that
it is trying to load a single resource of some kind. Once you move into
object graph territory, things become a hell of a lot more complex when
it comes to the relative resolution game.
A good example is serialization. Having a single class loader for all
resolution needs often simply doesn't cut it. It works to an extent,
iff the object graph in question never "escapes" what the application
(or current class loader) is cognizant of. However it may well be the
case that an implementation class isn't "visible" to the single class
loader. In this case, especially if more than one class with a given
name exists in the system, there's no unambiguous way to load
a class relative to an application, unless you explicitly add the
desired class loaders to the resolution path of the application's class
loader.
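The single-class-loader approach corresponds to the familiar technique of overriding `ObjectInputStream.resolveClass` so that every class descriptor in the stream is resolved relative to one loader. A sketch (the class and helper names are hypothetical); it works exactly as long as every class in the graph is visible from that loader, and reading fails with `ClassNotFoundException` otherwise.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Sketch: deserialize an object graph relative to one fixed class loader. */
final class SingleLoaderObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    SingleLoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        // Every class descriptor is looked up relative to the same loader.
        // (Unlike the default implementation, this does not special-case
        // primitive Class objects such as int.class written into the stream.)
        return Class.forName(desc.getName(), false, loader);
    }

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new SingleLoaderObjectInputStream(new ByteArrayInputStream(data), loader)) {
            return in.readObject();
        }
    }
}
```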
The two solutions to this problem are to either enforce (at serialize
time) a policy which prevents serializing objects of non-visible
classes, or to give up relative resolution and go to "absolute" identities.
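The first solution, a write-time visibility policy, could be enforced through `ObjectOutputStream.annotateClass`, which the stream calls for each class descriptor it writes. A sketch assuming the reader will resolve relative to a known `readerLoader`: a class passes only if that loader resolves its name to the identical `Class` object, not merely to a class with the same name.

```java
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

/** Sketch: refuse to write classes the reading side could not see. */
final class VisibilityCheckingObjectOutputStream extends ObjectOutputStream {
    private final ClassLoader readerLoader;

    VisibilityCheckingObjectOutputStream(OutputStream out, ClassLoader readerLoader)
            throws IOException {
        super(out);
        this.readerLoader = readerLoader;
    }

    /** Visible means the name resolves to this exact Class, not just a namesake. */
    static boolean isVisible(Class<?> cl, ClassLoader loader) {
        try {
            return Class.forName(cl.getName(), false, loader) == cl;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    @Override
    protected void annotateClass(Class<?> cl) throws IOException {
        if (!isVisible(cl, readerLoader)) {
            throw new NotSerializableException(
                    cl.getName() + " is not visible from the reader's class loader");
        }
    }
}
```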
When you use an absolute identity, you're persisting not only the class'
name but also the identity of its initiating class loader. Back in the
RMI and applet days, this would have been done (rather clumsily) by
extension name or perhaps code source URL. In our container environment
we tend to use module identifier. But either way, this identity is
useless unless you have a mechanism to resolve it back to a class
loader. This mechanism (at least as of today) is going to vary
substantially from one runtime environment to another though. Thus
being able to plug in to the process is critical [1].
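An absolute identity can be carried in the stream by writing a loader identifier alongside each class descriptor in `annotateClass` and reading it back in `resolveClass`, which is how that pair of hooks is designed to be used (RMI's codebase annotations work along the same lines). A sketch; the identifier strings, the `identify` function and the `loadersById` map are hypothetical placeholders for the pluggable, environment-specific mechanism.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.util.Map;
import java.util.function.Function;

/** Writer: record which loader each class came from, not just its name. */
final class AbsoluteObjectOutputStream extends ObjectOutputStream {
    // Environment-specific; must also handle null (the bootstrap loader).
    private final Function<ClassLoader, String> identify;

    AbsoluteObjectOutputStream(OutputStream out, Function<ClassLoader, String> identify)
            throws IOException {
        super(out);
        this.identify = identify;
    }

    @Override
    protected void annotateClass(Class<?> cl) throws IOException {
        // Called once per class descriptor; the matching read is in resolveClass.
        writeUTF(identify.apply(cl.getClassLoader()));
    }
}

/** Reader: map each recorded identity back to a concrete loader. */
final class AbsoluteObjectInputStream extends ObjectInputStream {
    // Environment-specific; a null value stands for the bootstrap loader.
    private final Map<String, ClassLoader> loadersById;

    AbsoluteObjectInputStream(InputStream in, Map<String, ClassLoader> loadersById)
            throws IOException {
        super(in);
        this.loadersById = loadersById;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        String loaderId = readUTF();
        if (!loadersById.containsKey(loaderId)) {
            throw new ClassNotFoundException(desc.getName() + " (unknown loader " + loaderId + ")");
        }
        return Class.forName(desc.getName(), false, loadersById.get(loaderId));
    }
}
```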
There's another tricky dimension to this problem though. Say you want
everything - the ability to use the "current" application to resolve a
class (i.e. to disambiguate a class of a given name from a neighbor
application which might want to execute the exact same code but get its
own relatively visible class), but also the ability to absolutely
resolve "invisible" classes. This would be common in the case where you
have two EAR deployments in an app server, each bundling their own copy
of an EJB JAR (for whatever reason) but which both link against common
modules.
There's no silver bullet here, but this problem can be solved to a
significant extent if you know which class loaders are candidates for
relative resolution and mark them as such in the target externalized
class data. In AS for example, we could use information about the
deployment class-path graph to distinguish what was bundled with the app
and what is external to it, because we know that deployment classes are
very likely to be visible from the TCCL (and if not, we would not have a
very large class loader landscape over which to search).
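The hybrid scheme boils down to a per-class decision at write time: classes from the deployment are tagged "resolve relative to the current application", and everything else gets an absolute identity. A sketch of just that decision (it assumes Java 17, for the sealed interface); `deploymentLoaders` and `moduleIdOf` are hypothetical inputs that, in AS, would come from the deployment class-path graph and the module system respectively.

```java
import java.util.Set;
import java.util.function.Function;

/** Sketch: decide per class whether the reader resolves it relatively or absolutely. */
final class HybridClassTagger {

    /** What the writer records next to the class name. */
    sealed interface Tag permits Relative, Absolute {}

    /** Deployment class: the reader resolves it against its own current application (TCCL). */
    record Relative() implements Tag {}

    /** Anything else: the reader resolves it through an absolute identifier. */
    record Absolute(String moduleId) implements Tag {}

    private final Set<ClassLoader> deploymentLoaders;
    private final Function<ClassLoader, String> moduleIdOf;

    HybridClassTagger(Set<ClassLoader> deploymentLoaders, Function<ClassLoader, String> moduleIdOf) {
        this.deploymentLoaders = Set.copyOf(deploymentLoaders); // must not contain null
        this.moduleIdOf = moduleIdOf;
    }

    /** Writer side: tag a class according to where its loader sits. */
    Tag tagFor(Class<?> type) {
        ClassLoader cl = type.getClassLoader(); // null for bootstrap classes
        boolean fromDeployment = cl != null && deploymentLoaders.contains(cl);
        return fromDeployment ? new Relative() : new Absolute(moduleIdOf.apply(cl));
    }

    /** Reader side: relative tags go to the TCCL, absolute ones through a resolver. */
    static Class<?> resolve(String className, Tag tag, ClassLoader tccl,
                            Function<String, ClassLoader> loaderFor) throws ClassNotFoundException {
        ClassLoader loader = (tag instanceof Absolute a) ? loaderFor.apply(a.moduleId()) : tccl;
        return Class.forName(className, false, loader);
    }
}
```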
Of course as an end user you don't really want to think about any of
this crap, you just want it to work. The user can't avoid having some
knowledge of this though - they have to know what policy they apply to
the data they're accessing. Sometimes it's simple - if they have a flat
class loader, and they're reading modularized serial data, they can just
discard the class loader identification because if a class isn't found
in their class loader, it's not going to be found anywhere else either.
Sometimes it's more complex though. Writing from one environment and
reading from a completely differently structured environment can get
extremely hairy when using absolute resolution, requiring various
degrees of translation. The best advice would be to have users strive
to use the same kind of environment when dealing with serialized data.
For example, modular writers should be consumed by modular readers.
Note that though I use serialization as my main example, these concepts
should apply as well (at a certain level) to Hibernate and friends,
Infinispan, etc.
[1] As an example, see
https://github.com/dmlloyd/jboss-marshalling/blob/master/api/src/main/jav...