David M. Lloyd wrote:
That's fine BUT a general-purpose caching system will just fuck things up
again.
With 3.5 years of development you're telling me that our testsuite isn't
good enough to pick up errors introduced by a "general-purpose caching
system" or other optimizations?
If you know the info won't change, then don't keep asking for it.
Easier said than done.
Take for instance VirtualFile.visit(). Any visitor that is run calls
isDirectory(). For each isDirectory() there is at least one
ConcurrentHashMap lookup to find the mount, followed by a delegation to the
FileSystem interface, which itself does hash lookups.
Then, visit() continues with calling getChildren() if the file is a
directory. getChildren() recreates a totally brand new VirtualFile
list, by, yes, doing N more hash lookups for each child entry.
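
To make that cost concrete, here is a toy model of the traversal. It is not
the actual VFS code: ToyVfs, its fields and the lookup counter are made-up
names, and the only assumptions are the ones described above (one map lookup
per isDirectory() call, a fresh list per getChildren() call, and one more
lookup per child entry).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: NOT the real VFS classes. It counts map lookups per visit.
class ToyVfs {
    // directory path -> names of its children (files have no entry here)
    private final Map<String, List<String>> dirs = new ConcurrentHashMap<>();
    long lookups = 0;

    void addDir(String path, List<String> children) {
        dirs.put(path, children);
    }

    boolean isDirectory(String path) {
        lookups++;                       // stands in for the mount lookup
        return dirs.containsKey(path);
    }

    List<String> getChildren(String path) {
        lookups++;                       // find the directory itself
        List<String> fresh = new ArrayList<>();   // brand new list every call
        for (String name : dirs.get(path)) {
            lookups++;                   // one more lookup per child entry
            fresh.add(path + "/" + name);
        }
        return fresh;
    }

    // Only calls getChildren() on paths that isDirectory() confirmed.
    void visit(String path) {
        if (isDirectory(path)) {
            for (String child : getChildren(path)) {
                visit(child);
            }
        }
    }
}
```

Even in this stripped-down model, visiting one directory with two entries
costs six lookups, and nothing is remembered for the next visitor.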
There are many different ways in which VFS is used - mounting a library,
for example, can be done with a plain zip mount (in other words, everything
is in RAM), whereas mounting a hot deployment must first copy the JAR
before mounting to avoid locking issues on Windows and the "disappearing
classes" problem which occurs when you delete the JAR out from under the
classloader before the classloader is done with it.
Mounting an exploded hot deployment must be done with a more complex copy
operation so that changes to dynamic content can be propagated using
different rules based on the content (e.g. HTML files should be copied
immediately, but the semantics of changing class files is less clear).
What can we cache here?
You understand that just to register a jar library with a deployer
archive, sar, etc, you're running VFS and creating these structures?
**************
Jar contents don't need these special hot deployment or mounting cases
as they are static and always will be.
***************
Caching bugs can be mitigated and supported simply by asking the file
system if it is ok to cache indefinitely. Or even better, by having the
FileSystem serve up/create/instantiate VirtualFile instances. Then you
can have special cases per FileSystem type that are fully optimized.
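
A minimal sketch of the "ask the file system if it is ok to cache" idea.
SketchFileSystem, isImmutable(), CachingView and CountingJarFs are
hypothetical names for illustration, not the existing VFS FileSystem API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface, not the actual VFS FileSystem interface.
interface SketchFileSystem {
    boolean isDirectory(String path);
    boolean isImmutable();   // true if answers may be cached indefinitely
}

// Wraps any file system and caches only when the file system says it is safe.
class CachingView {
    private final SketchFileSystem fs;
    private final Map<String, Boolean> dirCache = new ConcurrentHashMap<>();

    CachingView(SketchFileSystem fs) {
        this.fs = fs;
    }

    boolean isDirectory(String path) {
        if (!fs.isImmutable()) {
            return fs.isDirectory(path);   // e.g. hot deployment: always ask
        }
        // Static content: ask once, then answer from the cache.
        return dirCache.computeIfAbsent(path, fs::isDirectory);
    }
}

// Stand-in file system that counts how often it is really asked.
class CountingJarFs implements SketchFileSystem {
    final AtomicInteger calls = new AtomicInteger();
    private final boolean immutable;

    CountingJarFs(boolean immutable) {
        this.immutable = immutable;
    }

    public boolean isDirectory(String path) {
        calls.incrementAndGet();
        return path.endsWith("/");         // jar-style directory entries
    }

    public boolean isImmutable() {
        return immutable;
    }
}
```

With an immutable CountingJarFs, repeated isDirectory() calls for the same
path reach the file system once; with a mutable one, every call goes through.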
The "ownership" semantics of static files are not clearly defined either,
and can depend on what is being mounted and why. I suspect you'll find
that there are very few (if any) cases where the filesystem is being hit
for information such as "isDirectory" or "lastModified" information outside
of hot deployment, which is the case where it can't be cached anyway.
"isDirectory" is being hit *constantly* by anything that uses the VFS.
In my little test with ~5000 jar entries, it is being called 15,000 times.
Sure, the "real" file system is not necessarily being hit, but there are
a GAZILLION unnecessary hash lookups, object creations, etc. each and
every time a jar or directory is browsed (annotation scanning, package
names, subdeployment searches).
> As I stated earlier, define a parallel non-cache API that invalidates the
> cache on diffs for sensitive operations (like hot-deployment).
Sure, if you like bugs and broken behavior. Correctness first, speed
second.
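
For reference, the "parallel non-cache API" idea quoted above could be
sketched roughly like this. ModifiedTimes and both method names are
hypothetical, and the sketch takes "invalidates the cache on diffs" to mean:
the uncached call always asks the real source and overwrites the cached value
when it differs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Hypothetical sketch: a cached read path plus a parallel uncached path.
class ModifiedTimes {
    private final ToLongFunction<String> source;   // the real, uncached lookup
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    ModifiedTimes(ToLongFunction<String> source) {
        this.source = source;
    }

    // Fast path for scanners and class loading: may return a stale value.
    long lastModified(String path) {
        return cache.computeIfAbsent(path, source::applyAsLong);
    }

    // Parallel path for sensitive callers such as a hot-deployment scanner:
    // always asks the source and refreshes the cache when the value differs.
    long lastModifiedUncached(String path) {
        long fresh = source.applyAsLong(path);
        Long cached = cache.get(path);
        if (cached == null || cached != fresh) {
            cache.put(path, fresh);
        }
        return fresh;
    }
}
```

The trade-off being argued over is visible here: callers of lastModified()
see a stale value until some caller of lastModifiedUncached() notices the diff.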
VFS has had 3-4 years to get correctness right. If correctness isn't
covered by the testsuite, then any significant code change will cause
"bugs and broken behavior", especially a complete rewrite of VFS via VFS 3.
This is a complex problem that you cannot paint with a broad
brush. You must look at each use case separately before you can say stuff
like "we need caching".
Just to let you know, I did work on VFS and optimizations of VFS at the end
of 2006 for a month or two. It used to do stupid shit like calculating
path names each and every time you asked for them.
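
The fix for that kind of thing is plain memoization. A sketch, with PathNode
as a made-up stand-in for a file node (not the real VirtualFile) and assuming
a node's name and parent never change after construction:

```java
// Hypothetical node type, not the real VirtualFile.
class PathNode {
    private final PathNode parent;
    private final String name;
    private String pathName;     // computed on first request, then reused
    int computations = 0;        // demo counter only, not thread-safe

    PathNode(PathNode parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    String getPathName() {
        if (pathName == null) {
            computations++;
            // Safe to keep forever because name and parent are final.
            pathName = (parent == null)
                    ? name
                    : parent.getPathName() + "/" + name;
        }
        return pathName;
    }
}
```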
--
Bill Burke
JBoss, a division of Red Hat
http://bill.burkecentral.com