TLDR
- Remove all "optional" Maven dependencies from the project
- Things like the TikaBridge need to live in their own build unit
  (their own jar)
- Components whose dependencies are not all available as modules shall
  not be included in the WildFly modules
These are my notes after debugging HSEARCH-1885.
A service can be optionally loaded by the Service Loader pattern, but
all dependencies of each module must be available to the static module
definition.
Our current WildFly modules include the hibernate-search-engine jar,
which has an optional dependency on Apache Tika.
We don't provide a module for Apache Tika as it has many dependencies,
so the assumption was that extensions could be loaded from the
user classpath (as it normally works). This one specifically can't
currently be loaded from the user EAR/WAR, as that causes a
java.lang.NoClassDefFoundError: org/apache/tika/parser/Parser
The problem is that, while we initialize the
org.hibernate.search.bridge.builtin.TikaBridge using the correct
classloader (an aggregate from Hibernate ORM which includes the user
deployment), this only covers loading the definition of the TikaBridge
itself.
When the class is first used after loading, the JVM resolves the
classes it references; it imports org.apache.tika.parser.Parser (among
others), but at this point we're out of the scope of the custom
classloader usage, so these references are resolved through the
classloader which *defined* the TikaBridge, which is the one for the
hibernate-search-engine module. The point is that the TikaBridge,
while it was requested from the aggregated classloader, was ultimately
found in the hibernate-search-engine module and at that point was
associated with it.
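To illustrate the distinction between the classloader you ask and the
classloader a class ends up associated with, here is a minimal
standalone sketch. It does not use WildFly or Hibernate Search: the
class name DefiningLoaderDemo and the empty URLClassLoader standing in
for the aggregate classloader are assumptions made up for this example.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DefiningLoaderDemo {

    // Asks a child classloader for a class that only its parent can provide,
    // then reports whether the class is associated with the parent.
    public static boolean definedByParent() throws Exception {
        ClassLoader parent = DefiningLoaderDemo.class.getClassLoader();
        // Hypothetical stand-in for the aggregate classloader: it has no
        // classes of its own and delegates every lookup to its parent.
        try (URLClassLoader aggregate = new URLClassLoader(new URL[0], parent)) {
            Class<?> loaded =
                    Class.forName(DefiningLoaderDemo.class.getName(), false, aggregate);
            // getClassLoader() returns the loader which defined the class,
            // not the loader the lookup was started from.
            return loaded.getClassLoader() == parent;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("defined by parent loader: " + definedByParent());
    }
}
```

The class is requested from the child but remains associated with the
parent, and any class it references later is resolved from there. This
is the same situation as the TikaBridge being found in the
hibernate-search-engine module.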
A possible workaround is to set the TCCL (thread context classloader)
to the aggregate classloader during initialization of the TikaBridge
and its dependencies, but this is problematic as we can't predict
which other dependencies will be needed at runtime, when Tika parses
arbitrary data: one would also need to store a reference to this
classloader within the FieldBridge, and then override the TCCL at
runtime each time the bridge is invoked... that's horrible.
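For reference, this is roughly what such a TCCL override would have to
look like around every bridge invocation. It is only a sketch: TcclSwap
and withContextClassLoader are hypothetical names, not existing
Hibernate Search code, and the anonymous ClassLoader stands in for the
aggregate classloader that would have to be stored in the FieldBridge.

```java
import java.util.function.Supplier;

public class TcclSwap {

    // Runs the action with the given classloader installed as TCCL and
    // restores the previous TCCL afterwards, even if the action throws.
    public static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // Hypothetical stand-in for the stored aggregate classloader.
        ClassLoader stored = new ClassLoader(before) {};
        ClassLoader seen = withContextClassLoader(
                stored, () -> Thread.currentThread().getContextClassLoader());
        System.out.println("TCCL swapped during call: " + (seen == stored));
        System.out.println("TCCL restored after call: "
                + (Thread.currentThread().getContextClassLoader() == before));
    }
}
```

Note that this only helps code which actually consults the TCCL, and
every call path into the bridge would need the same wrapping.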
The much simpler solution is to make sure the TikaBridge class is
loaded by *and associated with* a classloader which is actually able to
load its extensions! In other words, if the user deployment includes
the Tika extensions, it should also include the TikaBridge.
So the correct solution is to break this out into its own Tika module,
and not include it within the WildFly module, but have the users
include it in their deployment as an extension point, as they would
with other custom FieldBridges.
This problem would apply to any other dependency using the "optional"
qualifier of Maven; currently only our Tika integration relies on it,
so let's remove it but please let's also avoid "optional" in the
future.
Thanks,
Sanne