[jboss-as7-dev] Modularity is the spawn of Lucifer and a stinking donkey

David M. Lloyd david.lloyd at redhat.com
Wed May 9 14:01:05 EDT 2012


OK I admit I LOL'ed.

On 05/09/2012 11:50 AM, Emmanuel Bernard wrote:
> Now that I have your attention, I'd like to discuss issues we are experiencing when trying to modularize the Hibernate portfolio and make it work in AS 7.1.
>
> ## Disclaimer
>
> I perfectly understand all the coolness about modularity (speed, easier dependency isolation etc). I have also carefully read :
>
> - https://community.jboss.org/wiki/ModuleCompatibleClassloadingGuide
> - https://community.jboss.org/wiki/ModularSerialization
>
> But these tend to avoid the more complex cases of portable libraries that ought to run even outside AS 7 but have a wide variety of class and resource loading needs.
> I am not a complete modularity bozo but I am definitely not familiar with JBoss Modules nor similar solution.
>
> ## Requirements / Landscape
>
> Hibernate ORM uses the notion of service registry and integrator object that help during the integration or customization of the engine behavior by third-party frameworks.
> Enlistment of Integrators is done via the service locator pattern (a service file in META-INF/services/ that is looked up and contains the implementation class(es) at stake).
>
> Hibernate Envers is one of those customizer that depends on Hibernate ORM. Note that the core of Hibernate ORM does not depend on Hibernate Envers. The service locator file is contained in Hibernate Envers JAR.
> Hibernate OGM likewise, heavily customizes ORM and depends on Hibernate ORM classes - the reverse is not true. The service locator file is contained in Hibernate OGM JAR.
> Hibernate Search optionally depends on Hibernate ORM and JPA. The core of Hibernate Search is independent but a Hibernate Search ORM module has an integrator implementation. On top of that, Hibernate Search optionally depends on some JPA classes and behaves differently if they are there - we look them up in the classpath by reflection.
>
> On top of that, these projects do load resources (config files, classes):
>
> - from what Jason calls a Deployment classloader (the user application classes and resources really) - entities, custom analyzer implementations, resources files etc. A user could even write a custom Integrator and use the service locator pattern from his application.
> - from direct dependencies (Lucene is a declared dependency of Hibernate Search)
> - from dependencies of the deployment: for example an app developer adds the phonetic analyzer as a dependency of his application and ask Hibernate Search to use it
> - from modules that use these projects. Modeshape and Capedwarf are being modularized and are making use of Hibernate Search as a module. Properly loading the necessary classes located in Modeshape or Capedwarf's module but from Hibernate Search's engine proves to be very hard in our current approach.
>
> All of these projects should be able to run outside JBoss AS 7, so a modular friendly solution should somehow be abstracted and generic enough.
>
> ## What solution?
>
> More and more projects are being modularized including ones with complex resource loading dependencies like the ones I have described. AFAIK Infinispan is even in a worse situation as clustering and late class binding is at stake but let's put this one aside.
> I'd love to get a reference design outcome from this thread that could be copied across all these projects and future ones like Bean Validation.
>
> Today, we mostly use the good old and simple TCCL model which works fine if the jars are directly embedded in the app but fails the minute we start to move these dependencies into modules. Sanne, Strong, Scott Marlow and I are using a dangerous amount of Advil to try and make everything work as expected. Some help would be awesome.
>
> To sum up:
>
> - can the Hibernate portfolio be supported within JBoss Module and how?
> - what kind of ClassloaderService contract should we use within these projects to be modular friendly (JBoss Modules and others)?
> - would such contract be generic enough to be extrapolated to JSRs in need of modular friendliness?
> - how to solve the chicken and egg issue of the bootstrapping: if we need to pass a ClassloaderService impl? How do we do that best in a modular environment without forcing the application developer to implement such godforsaken ClassloaderService contract or even worse pass directly to us the right classloader for each call.

I'll just start at the beginning and you can skip over the background if 
you like.

The key starting concept is that a class' (or package's) identity is not 
just its name but also its class loader.  This is the underlying 
(existing) truth that modularity brings to the fore.  Corollaries to
this are the facts that a single VM may have more than one class or
package with the same name, and that not all classes/packages are always
directly visible from a given class loader.
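
To make the identity point concrete, here is a minimal sketch (not
anything from JBoss Modules; IsolatingLoader and Payload are names I made
up for illustration).  It assumes the compiled class file for Payload is
reachable as a resource through the parent loader, which is the usual
case for classes on the class path.  Two loaders each define the same
class, and the VM treats the results as unrelated types:

```java
import java.io.IOException;
import java.io.InputStream;

final class ClassIdentityDemo {
    /** Trivial class that gets defined more than once, by different loaders. */
    public static class Payload {}

    /** Hypothetical loader: defines Payload itself, delegates everything else. */
    static final class IsolatingLoader extends ClassLoader {
        private static final String TARGET = Payload.class.getName();

        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!TARGET.equals(name)) {
                return super.loadClass(name, resolve); // normal parent delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Assumption: the class file is visible as a resource of the parent loader. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a fresh copy of Payload through a brand new isolating loader. */
    static Class<?> loadIsolatedCopy() {
        try {
            ClassLoader parent = ClassIdentityDemo.class.getClassLoader();
            return new IsolatingLoader(parent).loadClass(Payload.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = loadIsolatedCopy();
        Class<?> b = loadIsolatedCopy();
        System.out.println("same name:  " + a.getName().equals(b.getName()));
        System.out.println("same class: " + (a == b));
    }
}
```

The names compare equal while the Class objects do not, and neither copy
is assignable to the Payload that the application's own loader defined.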

This problem (as you've seen) manifests itself primarily when you're 
locating a class or a resource by name.  You basically have two options.
You can search *relative* to a class loader (most commonly, TCCL,
though using a single explicit class loader or the caller's class loader
also falls into this category).  Or, you can use the *absolute* identity
of a class.

Using relative resolution is often a perfectly adequate solution when 
you're loading a single class or resource; in fact for some cases (like 
ServiceLoader for example) it's a perfect fit in conjunction with TCCL 
(in its capacity as an identifier for the "current" application).  You 
want the user to be able to specify their implementation of something, 
and you want it to be reasonably transparent; ServiceLoader+TCCL does 
this fairly well.

ServiceLoader also does well from the perspective of APIs with a static, 
fixed number of implementations.  In this case, it is appropriate for a 
framework to have ServiceLoader use the class loader of the framework 
itself.  The framework would then be sure to import the implementations
in question (including their service descriptors); in our modular
environment, we call this a "service import".  Note that this often
means there is a circular dependency between API and implementation: 
that's OK!

A third ServiceLoader option is of course to simply accept a class 
loader argument when looking up an implementation.  This grants the most 
flexibility and tends to work well in just about any environment, though 
it may be somewhat lacking aesthetically, if you care about such things.
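
The three ServiceLoader options above can be sketched roughly as follows.
Extension and ServiceLookup are hypothetical names standing in for
something like an integrator contract; only java.util.ServiceLoader
itself is real API here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class ServiceLookup {
    /** Hypothetical extension point; not an actual Hibernate or JBoss type. */
    public interface Extension {
        String name();
    }

    /** Option 1: user-supplied implementations, resolved relative to the TCCL. */
    static List<Extension> fromCurrentApplication() {
        // The one-argument load() uses the thread context class loader.
        return collect(ServiceLoader.load(Extension.class));
    }

    /** Option 2: a fixed set of implementations, resolved via the framework's own loader. */
    static List<Extension> fromFramework() {
        return collect(ServiceLoader.load(Extension.class, ServiceLookup.class.getClassLoader()));
    }

    /** Option 3: the caller passes the class loader explicitly. */
    static List<Extension> from(ClassLoader loader) {
        return collect(ServiceLoader.load(Extension.class, loader));
    }

    private static List<Extension> collect(ServiceLoader<Extension> services) {
        List<Extension> found = new ArrayList<>();
        for (Extension extension : services) {
            found.add(extension);
        }
        return found;
    }
}
```

Which one is right depends on who is expected to supply the
implementation: the deployment (1), the framework's own imports (2), or
whoever is calling you (3).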

In any case, ServiceLoader already fits in very well to modularity; it's 
just a question of understanding your use case to know the appropriate 
way to apply it.  The key characteristic however of such a fit is that 
it is trying to load a single resource of some kind.  Once you move into 
object graph territory, things become a hell of a lot more complex when 
it comes to the relative resolution game.

A good example is serialization.  Having a single class loader for all 
resolution needs often simply doesn't cut it.  It works to an extent, 
iff the object graph in question never "escapes" what the application 
(or current class loader) is cognizant of.  However it may well be the 
case that an implementation class isn't "visible" to the single class 
loader.  In this case, especially if more than one class with a given
name exists in the system, there's no unambiguous way to load a class
relative to an application, unless you explicitly add the desired class
loaders to the resolution path of the application's class loader.

The two solutions to this problem are to either enforce (at serialize 
time) a policy which prevents serializing objects of non-visible 
classes, or to give up relative resolution and go to "absolute" identities.
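
A rough sketch of the first option, the serialize-time policy, might look
like this.  VisibilityPolicy is a made-up name, and the check only covers
the class of the object you hand it, not the whole object graph; a real
marshaller would have to apply it to every class it writes:

```java
final class VisibilityPolicy {
    /**
     * True if resolving the type's name relative to the given loader yields
     * this exact class (same name AND same defining loader).
     * A null loader means the bootstrap loader, as with Class.forName.
     */
    static boolean isVisibleFrom(Class<?> type, ClassLoader loader) {
        try {
            return Class.forName(type.getName(), false, loader) == type;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Rejects objects whose class the reading side could not resolve relatively. */
    static void checkSerializable(Object value, ClassLoader readerLoader) {
        Class<?> type = value.getClass();
        if (!isVisibleFrom(type, readerLoader)) {
            throw new IllegalArgumentException(
                    "Class " + type.getName() + " is not visible from " + readerLoader);
        }
    }
}
```

Comparing the resolved Class against the original (rather than just
checking that the name resolves) is what catches the case where the
reader would silently pick up a different class of the same name.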

When you use an absolute identity, you're persisting not only the class' 
name but also the identity of its initiating class loader.  Back in the 
RMI and applet days, this would have been done (rather clumsily) by 
extension name or perhaps code source URL.  In our container environment 
we tend to use module identifier.  But either way, this identity is 
useless unless you have a mechanism to resolve it back to a class 
loader.  This mechanism (at least as of today) is going to vary 
substantially from one runtime environment to another though.  Thus 
being able to plug in to the process is critical [1].
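
As a sketch of what such a plug point could look like (this is NOT the
ClassResolver API referenced in [1]; AbsoluteClassRef, LoaderResolver and
MapResolver are hypothetical names), the persisted form carries a loader
identifier next to the class name, and the runtime environment supplies
the mapping from identifier back to class loader:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

final class AbsoluteClassRef {
    final String loaderId;   // e.g. a module identifier; the format is environment-specific
    final String className;

    AbsoluteClassRef(String loaderId, String className) {
        this.loaderId = Objects.requireNonNull(loaderId, "loaderId");
        this.className = Objects.requireNonNull(className, "className");
    }

    static AbsoluteClassRef of(Class<?> type, String loaderId) {
        return new AbsoluteClassRef(loaderId, type.getName());
    }

    /** Environment-specific hook: maps a persisted identifier back to a loader, or null. */
    interface LoaderResolver {
        ClassLoader resolve(String loaderId);
    }

    /** Simplest possible resolver: an explicit registry (does not model the bootstrap loader). */
    static final class MapResolver implements LoaderResolver {
        private final Map<String, ClassLoader> loaders = new ConcurrentHashMap<>();

        void register(String loaderId, ClassLoader loader) {
            loaders.put(loaderId, Objects.requireNonNull(loader, "loader"));
        }

        @Override
        public ClassLoader resolve(String loaderId) {
            return loaders.get(loaderId);
        }
    }

    /** Resolves this reference through the plugged-in resolver. */
    Class<?> load(LoaderResolver resolver) {
        ClassLoader loader = resolver.resolve(loaderId);
        if (loader == null) {
            throw new IllegalStateException("Unknown loader id: " + loaderId);
        }
        try {
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(className + " not found via " + loaderId, e);
        }
    }
}
```

In a container the resolver would presumably be backed by the module
system rather than a map, but the shape of the contract is the same.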

There's another tricky dimension to this problem though.  Say you want 
everything - the ability to use the "current" application to resolve a 
class (i.e. to disambiguate a class of a given name from a neighbor 
application which might want to execute the exact same code but get its 
own relatively visible class), but also the ability to absolutely 
resolve "invisible" classes.  This would be common in the case where you 
have two EAR deployments in an app server, each bundling their own copy 
of an EJB JAR (for whatever reason) but which both link against common 
modules.

There's no silver bullet here, but this problem can be solved to a
significant extent if you know which class loaders are candidates for
relative resolution and mark them as such in the target externalized
class data.  In AS for example, we could use information about the
deployment class-path graph to distinguish what was bundled with the app 
and what is external to it, because we know that deployment classes are 
very likely to be visible from the TCCL (and if not, we would not have a 
very large class loader landscape over which to search).
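
Here is one way the marking idea could be sketched, under the assumption
that the writer knows the set of deployment class loaders and has some
function that names every other loader.  MixedResolution, the RELATIVE
marker and both function parameters are hypothetical; nothing here is an
existing AS or JBoss Marshalling API:

```java
import java.util.Set;
import java.util.function.Function;

final class MixedResolution {
    /** Marker written instead of a loader id when the reader should resolve relatively. */
    static final String RELATIVE = "<relative>";

    /**
     * Writer side: classes bundled with the deployment get the relative marker,
     * everything else gets an absolute loader identifier from idOf.
     */
    static String markerFor(Class<?> type,
                            Set<ClassLoader> deploymentLoaders,
                            Function<ClassLoader, String> idOf) {
        ClassLoader owner = type.getClassLoader(); // null means the bootstrap loader
        if (owner != null && deploymentLoaders.contains(owner)) {
            return RELATIVE;
        }
        return idOf.apply(owner);
    }

    /**
     * Reader side: relative entries resolve against the current application's
     * loader (typically the TCCL); absolute entries go through loaderById.
     */
    static Class<?> resolve(String marker,
                            String className,
                            ClassLoader currentApplication,
                            Function<String, ClassLoader> loaderById) {
        ClassLoader loader = RELATIVE.equals(marker)
                ? currentApplication
                : loaderById.apply(marker);
        try {
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(className + " not resolvable via " + marker, e);
        }
    }
}
```

With this split, two deployments that each bundle their own copy of a JAR
both read the relative marker and each get their own class, while classes
from shared modules still resolve to one well-defined loader.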

Of course as an end user you don't really want to think about any of 
this crap, you just want it to work.  The user can't avoid having some 
knowledge of this though - they have to know what policy they apply to 
the data they're accessing.  Sometimes it's simple - if they have a flat 
class loader, and they're reading modularized serial data, they can just 
discard the class loader identification because if a class isn't found 
in their class loader, it's not going to be found anywhere else either.

Sometimes it's more complex though.  Writing from one environment and 
reading from a completely differently structured environment can get 
extremely hairy when using absolute resolution, requiring various 
degrees of translation.  The best advice would be to have users strive 
to use the same kind of environment when dealing with serialized data. 
For example, modular writers should be consumed by modular readers.

Note that though I use serialization as my main example, these concepts 
should apply as well (at a certain level) to Hibernate and friends, 
Infinispan, etc.

[1] As an example, see 
https://github.com/dmlloyd/jboss-marshalling/blob/master/api/src/main/java/org/jboss/marshalling/ClassResolver.java#L34
-- 
- DML

