[jboss-dev] Further profiling: Where should I focus?

Bill Burke bburke at redhat.com
Tue Jan 5 17:39:50 EST 2010



David M. Lloyd wrote:
> That's fine BUT a general-purpose caching system will just fuck things up 
> again.  

With 3.5 years of development you're telling me that our testsuite isn't 
good enough to pick up errors introduced by a "general-purpose caching 
system" or other optimizations?

> If you know the info won't change, then don't keep asking for it. 

Easier said than done.

Take for instance VirtualFile.visit().  Any visitor that is run calls 
isDirectory().  For each isDirectory() there is at least one 
ConcurrentHashMap lookup to find the mount, followed by a delegation to 
the FileSystem interface, which itself does hash lookups.

Then, visit() continues by calling getChildren() if the file is a 
directory.  getChildren() recreates a totally brand new VirtualFile 
list by, yes, doing more hash lookups for each of the N child entries.
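
To make the cost concrete, here is a rough sketch of that access 
pattern.  The classes are hypothetical stand-ins, not the actual VFS 
code: LookupCountingFs plays the role of the mount table plus the 
FileSystem delegation, and each call is counted as a single lookup even 
though the real code does several per call.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the mount table plus FileSystem delegation.
class LookupCountingFs {
    // Counts every call that would trigger hash lookups in the real thing.
    final AtomicInteger lookups = new AtomicInteger();
    private final Map<String, List<String>> dirs = new ConcurrentHashMap<>();

    void addDir(String path, String... childPaths) {
        dirs.put(path, Arrays.asList(childPaths));
    }

    boolean isDirectory(String path) {
        lookups.incrementAndGet();
        return dirs.containsKey(path);
    }

    List<String> getChildren(String path) {
        lookups.incrementAndGet();
        List<String> kids = dirs.get(path);
        return kids == null ? Collections.<String>emptyList() : kids;
    }
}

// Hypothetical visitor: asks isDirectory() for every node and
// getChildren() for every directory, caching nothing between calls.
class NaiveVisitor {
    static int visit(LookupCountingFs fs, String path) {
        int visited = 1;
        if (fs.isDirectory(path)) {
            for (String child : fs.getChildren(path)) {
                visited += visit(fs, child);
            }
        }
        return visited;
    }
}
```

Visiting the same unchanged tree a second time doubles the lookup 
count, because nothing learned on the first pass is kept.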

> There are many different ways in which VFS is used - mounting a library, 
> for example, can be done with a plain zip mount (in other words, everything 
> is in RAM), whereas mounting a hot deployment must first copy the JAR 
> before mounting to avoid locking issues on windows and the "disappearing 
> classes" problem which occurs when you delete the JAR out from under the 
> classloader before the classloader is done with it.
> 
> Mounting an exploded hot deployment must be done with a more complex copy 
> operation so that changes to dynamic content can be propagated using 
> different rules based on the content (e.g. HTML files should be copied 
> immediately, but the semantics of changing class files is less clear). 
> What can we cache here?
> 

You understand that just to register a jar library with a deployer 
archive, sar, etc, you're running VFS and creating these structures?

**************
Jar contents don't need these special hot deployment or mounting cases 
as they are static and always will be.
***************

Caching bugs can be mitigated simply by asking the file system whether 
it is OK to cache indefinitely.  Or even better, by having the 
FileSystem serve up/create/instantiate VirtualFile instances.  Then you 
can have special cases per FileSystem type that are fully optimized.
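
Something along these lines is what I mean.  This is only a sketch: 
SketchFileSystem, isCacheableIndefinitely(), SketchVirtualFile and 
CountingFs are names made up for illustration and are not part of the 
existing VFS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: the file system itself says whether its
// answers may be cached forever (true for static jar contents).
interface SketchFileSystem {
    boolean isCacheableIndefinitely();
    boolean isDirectory(String path);
    List<String> childPaths(String path);
}

// Hypothetical file handle that caches only when the file system allows it.
final class SketchVirtualFile {
    private final SketchFileSystem fs;
    private final String path;
    private Boolean cachedIsDirectory;
    private List<SketchVirtualFile> cachedChildren;

    SketchVirtualFile(SketchFileSystem fs, String path) {
        this.fs = fs;
        this.path = path;
    }

    synchronized boolean isDirectory() {
        if (!fs.isCacheableIndefinitely()) {
            return fs.isDirectory(path); // hot deployment: always ask
        }
        if (cachedIsDirectory == null) {
            cachedIsDirectory = fs.isDirectory(path);
        }
        return cachedIsDirectory;
    }

    synchronized List<SketchVirtualFile> getChildren() {
        if (cachedChildren != null) {
            return cachedChildren;
        }
        List<SketchVirtualFile> result = new ArrayList<>();
        for (String child : fs.childPaths(path)) {
            result.add(new SketchVirtualFile(fs, child));
        }
        result = Collections.unmodifiableList(result);
        if (fs.isCacheableIndefinitely()) {
            cachedChildren = result; // static content: build the list once
        }
        return result;
    }
}

// Toy in-memory file system that counts how often it is queried.
class CountingFs implements SketchFileSystem {
    final boolean cacheable;
    int calls;
    private final Map<String, List<String>> dirs = new HashMap<>();

    CountingFs(boolean cacheable) { this.cacheable = cacheable; }

    void addDir(String path, String... kids) {
        dirs.put(path, Arrays.asList(kids));
    }

    public boolean isCacheableIndefinitely() { return cacheable; }

    public boolean isDirectory(String path) {
        calls++;
        return dirs.containsKey(path);
    }

    public List<String> childPaths(String path) {
        calls++;
        List<String> kids = dirs.get(path);
        return kids == null ? Collections.<String>emptyList() : kids;
    }
}
```

With a CountingFs created as cacheable, repeated isDirectory() and 
getChildren() calls on the same SketchVirtualFile reach the file system 
once each; with a non-cacheable one, every call goes through, which is 
the behavior hot deployment needs.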

> The "ownership" semantics of static files are not clearly defined either, 
> and can depend on what is being mounted and why.  I suspect you'll find 
> that there are very few (if any) cases where the filesystem is being hit 
> for information such as "isDirectory" or "lastModified" information outside 
> of hot deployment, which is the case where it can't be cached anyway.
> 

"isDirectory" is being hit *constantly* by anything that uses the VFS. 
In my little test with ~5000 jar entries, it is being called 15,000 times.

Sure, the "real" file system is not necessarily being hit, but there are 
a GAZILLION unnecessary hash lookups, object creations, etc. each and 
every time a jar or directory is browsed (annotation scanning, package 
names, subdeployment searches).

>> As I stated earlier, define a parallel non-cache api that invalidates the
>> cache on diffs for sensitive operations (like hot-deployment).
> 
> Sure, if you like bugs and broken behavior.  Correctness first, speed 
> second.

VFS has had 3-4 years to get correctness right.  If correctness isn't 
covered by the testsuite, then any significant code change will cause 
"bugs and broken behavior", especially a complete rewrite of VFS via 
VFS 3.

>  This is a complex problem that you cannot paint with a broad 
> brush.  You must look at each use case separately before you can say stuff 
> like "we need caching".
> 

Just to let you know, I did work on VFS and optimizations of VFS at the 
end of 2006 for a month or two.  It used to do stupid shit like 
calculating path names each and every time you asked for them.

-- 
Bill Burke
JBoss, a division of Red Hat
http://bill.burkecentral.com


