David M. Lloyd wrote:
On 01/05/2010 04:39 PM, Bill Burke wrote:
> David M. Lloyd wrote:
>> That's fine BUT a general-purpose caching system will just fuck things up
>> again.
> With 3.5 years of development you're telling me that our testsuite isn't
> good enough to pick up errors introduced by a "general-purpose caching
> system" or other optimizations?
Yup. The test suite has historically had a very difficult time locating
problems which have cropped up in the VFS2 implementation. I guarantee
that applying a general solution without understanding each use case will
result in crappy code and time-bomb style errors. You can't produce
bug-free code just by writing regression tests.
>> If you know the info won't change, then don't keep asking for it.
> Easier said than done.
>
> Take for instance VirtulFile.visit(). Any visitor that is run calls
> isDirectory(). For each isDirectory() there is at least one
> ConcurrentHashMap lookup to find the mount, then to delegate to the
> FileSystem interface which itself does hash lookups.
Can't be avoided. The topology of mounts may change at any time, which
means that one call to isDirectory() might give a different answer than the
next for the same file.
Come on, when would any VirtualFile that is a directory *EVER* become a
VirtualFile that is not a directory?
How can this possibly be cached when things can be
mounted and unmounted at will, by threads other than your own? A
VirtualFile is just a symbolic name after all - it does not and can not
represent any kind of link into the physical filesystem itself since such a
link cannot exist with anything resembling consistent semantics.
This is an app server not an operating system. Its pretty simple. You
invalidate the cache for the remounted/unmounted item.
It might be worthwhile to explore using some type of COW HashMap
setup
instead of CHM, since there are relatively few mounts compared to lookups.
In any case, O(1) + O(1) = O(1) so algorithmically it's sound. It's just a
question of optimization at this stage, if there are any which can be done
without breaking the API contract.
> Then, visit() continues with calling getChildren() if the file is a
> directory. getChildren() recreates a totally brand new VirtualFile
> list, by, yes, doing N more hash lookups for each child entry.
You're talking about the creation of a list of objects which have three
reference fields and one integer field. It's cheap. I already benchmarked
the piss out of it, and I'm sure it's not the source of any significant
performance issue unless you're doing something crazy.
So what was acceptable to you? My bet is your bar is much higher than
mine. I know we can get the default profile booting in 10 seconds or
less. ~200ms/5meg-jar for initializing a VirtualFile is just
unacceptable for VFS.
For example I just optimized VirtualFile.isDirectory(), isFile(), and
getMount() to cache (and invalidate the cache on a mount). In addition,
I modified VirtualFile.getChildren() to have the FileSystem create the
list of VirtualFiles. Doing so, allowed me to pre-initialize the
isDirectory cache of the VirtualFile. I just got a 12% improvement
(30ms). This improvement was entirely within determinePackageNames, but
still. 30ms * 3 (15 meg of jars) * 2 (Packages + Subdeploy scans) ==
200ms less of App Server boot time. Not much, but its a start.
That's rather disingenuous. VFS failed to get correctness right
after 3-4
years, hence it has been rewritten with correctness as the primary goal.
A complete rewrite will have the same consequences as any optimization.
If you want the feature, you pay the price. The requirements are always
open for debate, I suppose.
My goal is pretty simple. For AS not to suck. Paying drastically for
these features that VFS is supposed to enable doesn't get us anywhere,
especially when JBoss AS 4.x and earlier got the job done with FAR LESS
abstraction (and CPU).
--
Bill Burke
JBoss, a division of Red Hat
http://bill.burkecentral.com