Re: [jboss-dev] Further profiling: Where should I focus?

Tuesday, 5 January 2010

David M. Lloyd wrote:
...
 That's fine BUT a general-purpose caching system will just fuck things up again.
With 3.5 years of development you're telling me that our testsuite isn't 
good enough to pick up errors introduced by a "general-purpose caching 
system" or other optimizations?

...
 If you know the info won't change, then don't keep asking for it.
Easier said than done.

Take for instance VirtualFile.visit().  Any visitor that is run calls 
isDirectory().  For each isDirectory() there is at least one 
ConcurrentHashMap lookup to find the mount, then to delegate to the 
FileSystem interface which itself does hash lookups.

Then, visit() continues with calling getChildren() if the file is a 
directory.  getChildren() recreates a totally brand new VirtualFile 
list, by, yes, doing N more hash lookups for each child entry.
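A toy model makes the cost of that pattern easier to see. Everything below is hypothetical (the class and field names are illustrative, not the real VFS API) and assumes a single mount plus an in-memory entry table: each isDirectory() and each getChildren() call pays one lookup for the mount and one inside the "file system", and getChildren() builds a brand new child list every time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the lookup pattern described above, not the real JBoss VFS code.
class UncachedTree {
    int lookups = 0;
    // Stand-in for the mount table (a single root mount for simplicity).
    final Map<String, String> mounts = new ConcurrentHashMap<>();
    // Stand-in for the FileSystem: directory path -> child names (plain files are absent).
    final Map<String, List<String>> entries = new ConcurrentHashMap<>();

    UncachedTree() { mounts.put("/", "root"); }

    boolean isDirectory(String path) {
        lookups++; mounts.get("/");                   // find the mount
        lookups++; return entries.containsKey(path);  // delegate to the file system
    }

    List<String> getChildren(String path) {
        lookups++; mounts.get("/");                   // find the mount again
        lookups++;                                    // file system lookup for the listing
        List<String> result = new ArrayList<>();      // fresh list on every call
        for (String name : entries.getOrDefault(path, List.of())) {
            result.add(path.equals("/") ? "/" + name : path + "/" + name);
        }
        return result;
    }

    // Depth-first visit: isDirectory() on every node, getChildren() on every directory.
    void visit(String path, List<String> seen) {
        seen.add(path);
        if (isDirectory(path)) {
            for (String child : getChildren(path)) visit(child, seen);
        }
    }
}
```

For a tree of 4 entries, 2 of them directories, one visit() costs 12 lookups, and a second visit costs 12 more because nothing is remembered between walks.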

...
 There are many different ways in which VFS is used: mounting a library,
 for example, can be done with a plain zip mount (in other words, everything 
 is in RAM), whereas mounting a hot deployment must first copy the JAR 
 before mounting to avoid locking issues on windows and the "disappearing 
 classes" problem which occurs when you delete the JAR out from under the 
 classloader before the classloader is done with it.

 Mounting an exploded hot deployment must be done with a more complex copy 
 operation so that changes to dynamic content can be propagated using 
 different rules based on the content (e.g. HTML files should be copied 
 immediately, but the semantics of changing class files is less clear). 
 What can we cache here?

You understand that just to register a jar library with a deployer 
archive, sar, etc, you're running VFS and creating these structures?

**************
Jar contents don't need these special hot deployment or mounting cases 
as they are static and always will be.
***************

Caching bugs can be mitigated and supported simply by asking the file 
system if it is ok to cache indefinitely.  Or even better, by having the 
FileSystem serve up/create/instantiate VirtualFile instances.  Then you 
can have special cases per FileSystem type that are fully optimized.
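A rough sketch of the "let the FileSystem decide" idea, assuming a hypothetical isStatic() flag on the file system SPI (nothing like it is claimed to exist in VFS today). A static file system such as a plain jar hands out one canonical, memoizing handle per path; anything else is asked every time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical SPI: the file system itself reports whether its contents can change.
interface StaticAwareFileSystem {
    boolean isStatic();                    // e.g. true for a plain jar, false for a hot deployment
    boolean isDirectory(String path);
    List<String> childNames(String path);
}

// Hypothetical handle type: answers are memoized only for static file systems.
final class Handle {
    private final HandleFactory factory;
    private final StaticAwareFileSystem fs;
    final String path;
    private volatile Boolean directory;    // null until first asked
    private volatile List<Handle> children; // only ever set for static file systems

    Handle(HandleFactory factory, StaticAwareFileSystem fs, String path) {
        this.factory = factory;
        this.fs = fs;
        this.path = path;
    }

    boolean isDirectory() {
        if (!fs.isStatic()) return fs.isDirectory(path);  // may change: always ask
        Boolean d = directory;
        if (d == null) {
            d = fs.isDirectory(path);
            directory = d;                 // benign race: every thread computes the same value
        }
        return d;
    }

    List<Handle> getChildren() {
        List<Handle> cached = children;
        if (cached != null) return cached;
        List<Handle> result = new ArrayList<>();
        for (String name : fs.childNames(path)) {
            result.add(factory.get(path.equals("/") ? "/" + name : path + "/" + name));
        }
        List<Handle> frozen = List.copyOf(result);
        if (fs.isStatic()) children = frozen;  // only remember children that cannot change
        return frozen;
    }
}

// Hypothetical factory owned by one file system: it serves up the handles and
// returns the same instance per path when the contents are static.
final class HandleFactory {
    private final StaticAwareFileSystem fs;
    private final Map<String, Handle> handles = new ConcurrentHashMap<>();

    HandleFactory(StaticAwareFileSystem fs) { this.fs = fs; }

    Handle get(String path) {
        if (!fs.isStatic()) return new Handle(this, fs, path);
        return handles.computeIfAbsent(path, p -> new Handle(this, fs, p));
    }
}
```

A hot-deployment file system would return false from isStatic() and keep the always-ask behavior, so the special cases stay inside each FileSystem type.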

...
 The "ownership" semantics of static files are not clearly defined either,
 and can depend on what is being mounted and why.  I suspect you'll find 
 that there are very few (if any) cases where the filesystem is being hit 
 for information such as "isDirectory" or "lastModified" outside 
 of hot deployment, which is the case where it can't be cached anyway.

"isDirectory" is being hit *constantly* by anything that uses the VFS. 
In my little test with ~5000 jar entries, it is being called 15,000 times.

Sure, the "real" file system is not necessarily being hit, but there are 
a GAZILLION unnecessary hash lookups, object creations, etc. each and 
every time a jar or directory is browsed (annotation scanning, package 
names, subdeployment searches).
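Those numbers work out to about three isDirectory() calls per jar entry (15,000 / ~5,000), which fits several separate passes over the same static jar. One way to pay the lookup cost once is to index the jar up front and let every pass query that snapshot. The sketch assumes that approach; JarIndex is a hypothetical name, and only java.util.zip.ZipFile and ZipEntry are real APIs. It infers parent directories because zip files are not required to contain explicit directory entries.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical one-pass index of a static jar: directory path -> child paths.
// Later passes (annotation scanning, package names, subdeployment searches)
// would query this snapshot instead of going back through the VFS.
final class JarIndex {
    private final Map<String, Set<String>> children = new HashMap<>();

    // Reads the entry names once from a real jar/zip file.
    static JarIndex of(ZipFile zip) {
        return fromNames(zip.stream().map(ZipEntry::getName).collect(Collectors.toList()));
    }

    static JarIndex fromNames(Collection<String> entryNames) {
        JarIndex index = new JarIndex();
        index.children.put("", new LinkedHashSet<>());  // the jar root
        for (String raw : entryNames) {
            boolean isDir = raw.endsWith("/");
            String name = isDir ? raw.substring(0, raw.length() - 1) : raw;
            if (name.isEmpty()) continue;
            if (isDir) index.children.putIfAbsent(name, new LinkedHashSet<>());
            // Register the entry under each ancestor; directories the zip omits are inferred.
            String child = name;
            while (!child.isEmpty()) {
                int slash = child.lastIndexOf('/');
                String parent = slash < 0 ? "" : child.substring(0, slash);
                Set<String> siblings = index.children.computeIfAbsent(parent, k -> new LinkedHashSet<>());
                if (!siblings.add(child)) break;  // already registered, so its ancestors are too
                child = parent;
            }
        }
        return index;
    }

    // Both queries are a single HashMap lookup with no object creation.
    boolean isDirectory(String path) { return children.containsKey(path); }

    Set<String> getChildren(String path) {
        return Collections.unmodifiableSet(children.getOrDefault(path, Set.of()));
    }
}
```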

...
> As I stated earlier, define a parallel non-cache API that invalidates the
> cache on diffs for sensitive operations (like hot deployment).

 Sure, if you like bugs and broken behavior.  Correctness first, speed 
 second. 
VFS has had 3-4 years to get correctness right.  If correctness isn't 
covered by the testsuite, then any significant code change will cause 
"bugs and broken behavior", especially a complete rewrite of VFS via VFS 3.
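For reference, the quoted proposal could look roughly like this: a cached API for static content, plus a separate, always-uncached API for sensitive callers that replaces the cached value when it sees a difference. TimestampCache and its methods are hypothetical, and the probe function is an assumption standing in for whatever actually reads the file system (java.io.File::lastModified would be one option).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Hypothetical two-tier API. The probe stands in for the real file system.
final class TimestampCache {
    private final ToLongFunction<String> probe;
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    TimestampCache(ToLongFunction<String> probe) { this.probe = probe; }

    // Cached API: asks the underlying file system at most once per path.
    long lastModified(String path) {
        return cache.computeIfAbsent(path, p -> probe.applyAsLong(p));
    }

    // Non-cached API for sensitive callers such as a hot-deployment scanner:
    // always hits the file system and replaces the cached value.
    // Returns true if the value differs from what was cached before.
    boolean refresh(String path) {
        long current = probe.applyAsLong(path);
        Long previous = cache.put(path, current);
        return previous != null && previous != current;  // unboxed numeric comparison
    }

    // Explicit invalidation, e.g. when a deployment is removed.
    void invalidate(String path) { cache.remove(path); }
}
```

A hot-deployment scanner would call refresh() on each scan, while callers working on static library jars stick to the cached lastModified().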

...
  This is a complex problem that you cannot paint with a broad 
 brush.  You must look at each use case separately before you can say stuff 
 like "we need caching".

Just to let you know, I did work on VFS and optimizations of VFS at the end of 
2006 for a month or two.  It used to do stupid shit like calculating 
path names each and every time you asked for it.
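The path-name fix is the same idea in miniature: derive the value once, when an immutable node is created, instead of walking the parent chain on every call. PathNode is a hypothetical stand-in, not the actual VirtualFile class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical immutable node: the path name is computed once, in the constructor.
final class PathNode {
    private final PathNode parent;
    private final String name;
    private final String pathName;

    PathNode(PathNode parent, String name) {
        this.parent = parent;
        this.name = name;
        // The root ("") and its direct children need no separator.
        this.pathName = (parent == null || parent.pathName.isEmpty()) ? name : parent.pathName + "/" + name;
    }

    // O(1): returns the precomputed string, no allocation.
    String getPathName() { return pathName; }

    // What recomputing on every call costs: a walk up the parent chain
    // plus fresh allocations each time.
    String computePathNameEveryTime() {
        Deque<String> parts = new ArrayDeque<>();
        for (PathNode n = this; n != null && !n.name.isEmpty(); n = n.parent) parts.addFirst(n.name);
        return String.join("/", parts);
    }
}
```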

-- 
Bill Burke
JBoss, a division of Red Hat
http://bill.burkecentral.com
