[hibernate-dev] Scanner contract

Mon Mar 18 11:40:35 EDT 2013

On Mon 18 Mar 2013 09:10:40 AM CDT, Emmanuel Bernard wrote:
>
> That was the point of the Scanning API in the first place: not to depend
> on specific APIs like the VirtualFile. That and the fact that JBoss AS
> had Jandex and we did not were the reason for Ales and me to have this
> interface.
Ok, but the decision to ask the Scanner these questions for each jar is 
inadvertently requiring Hibernate to be sensitive to these protocols.  
In fact the current standard Scanner impl has explicit handling for VFS 
protocol schemes.  And for OSGi, it would require that the single root 
URL (because again thats all we get) actually encompasses the root jar 
*and* any other specified jars.  Aka, `if ( "bundle".equals( urlProtocol 
) ) { doSomethingDrasticallyDifferent(); }...`.

So take the EE case, which I know is true of JBoss.  JBoss is the one 
handing us the PersistenceUnitInfo (which is the thing that backs the  
PersistenceUnitDescriptor we pass back to it in this Scanner proposal).  
I don't think it is unreasonable that JBoss understands the difference 
between PersistenceUnitInfo#getPersistenceUnitRootUrl and 
PersistenceUnitInfo#getJarFileUrls intrinsically (after all, it is the 
one that gave us the PUI).  By intrinsically I mean that it already 
understands that there is a difference/distinction between the root and 
any non-roots.  So from there the question really is whether expecting 
the Scanner to apply different scanning rules to root versus not-root 
URLs constitutes "deep JPA knowledge".  And maybe that is different for 
each type of thing for which we scan.

For classes e.g. nothing stops us from saying the the Scanner just 
returns us everything it finds and that we then filter or weed through 
it.  For example, if it returns us descriptors for all the classes it 
finds, we can apply the "only use that class if it (a) came from a 
specified jar or (b) came from the root and #excludeUnlistedClasses 
allows us to use them" logic ourselves. Thats a trivial matter of us 
defining the ClassDescriptor contract to tell us whether the class was 
found in the root or not:

interface ClassDescriptor {
     public String getClassName();
     public NamedInputStream getNamedInputStream();
     public boolean foundInRoot();
}

But again, I dont think that the Scanner applying that simple logic is 
really "deep JPA knowledge".  But if everyone else agrees it is, we can 
easily alleviate that via the above pattern.

As for resources (orm.xml and hbm.xml files), I'd have to look a little 
closer maybe.  To be honest that is some of the code that uses the "Map 
as an API" approach that makes me want to rip out my hair, gouge out my 
eyes and leap off the nearest bridge :)  It is code like this code that 
has made me want to re-write what was EJB3Configuration from scratch 
ever since I first started looking at it.  That and the fact that any 
time I ask anyone that had a hand in writing that code anything about 
that code they simply throw up their hands and say they don't know ;)

Maybe it is more complicated than I am making it out to be.  But I don't 
think the way this is then presented to the rest of the code needs to be 
complex.  At the end of the day, there should be a scan and the results 
of that scan should be a nice easy-to-understand contract.

> The previous implementation of Scanner was returning the resource
> streams so you had to take PUI.getMappingFileNames() into account. I got
> confused.

Still not following.  The proposed Scanner contract is returning 
NamedInputStreams, which is exactly what the current Scanner returns for 
these.

> No that's what the JPA spec says (or intended), additional jars are to 
> be scanned to
> find the entity classes. Otherwise the additional jar feature becomes
> very limited or even useless. You would only need it otherwise to
> retrieve the XML DD which was considered a second class citizen on JPA 
> 1.0.

Ok, but is that "deep JPA knowledge" beyond what a typical JPA container 
developer would generally know?

>>> - look for META-INF/orm.xml in the root JAr (only) and exclude it if it
>>> is already listed explicitly in the getMappingFileNames to not process
>>> it twice.
>>
>> Not sure how this is classified as "deep JPA knowledge".
>
>
> Understanding and implementing all these rules took me a while :)
Ok, don't include duplicates.. check.  ;)  Not saying the rules are 
simplistic.  Just questioning whether they are only known to those with 
"deep JPA knowledge".  Not so sure that is true.

> I gave you my opinion, if you think that implementing Scanner will
> remain simple enough then most of my remarks are moot.

Considering I will implement the 3 main Scanner impls... ;)  But 
seriously, I don't think delegating to Scanner just once per persistence 
unit puts undue difficulty on the Scanner impl.