James Perkins updated WFCORE-3185:
----------------------------------
Fix Version/s: (was: 9.0.0.Beta3)
Run parallel boot tasks in coarser grained chunks
-------------------------------------------------
Key: WFCORE-3185
URL:
https://issues.jboss.org/browse/WFCORE-3185
Project: WildFly Core
Issue Type: Enhancement
Components: Management
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Priority: Major
Currently, parallel boot works by executing one task per Extension.initializeParsers call,
one per Extension.initialize call, and two per subsystem: one for execution of the Stage.MODEL
ops and one for the Stage.RUNTIME ops. The Extension.initializeParsers tasks complete before
boot proceeds to the point where any Extension.initialize tasks run, and the
Extension.initialize tasks complete before the Stage.MODEL tasks run. The Stage.MODEL
tasks do the large bulk of their work before the Stage.RUNTIME tasks run, but they do
block waiting for the Stage.RUNTIME tasks and the rest of the boot to complete.
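The staging described above can be sketched as follows. This is a simplified, hypothetical model (class and method names are invented, and the detail of Stage.MODEL tasks blocking on Stage.RUNTIME completion is omitted): each stage submits one task per subsystem, and the stage must fully complete before the next stage begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: four boot stages, one task per subsystem per stage.
// invokeAll acts as a barrier, so a stage finishes before the next starts.
public class StagedBoot {
    static int runBoot(int subsystems) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            for (String stage : List.of("initializeParsers", "initialize",
                                        "Stage.MODEL", "Stage.RUNTIME")) {
                List<Callable<Void>> tasks = new ArrayList<>();
                for (int i = 0; i < subsystems; i++) {
                    tasks.add(() -> {
                        completed.incrementAndGet(); // stand-in for real per-subsystem work
                        return null;
                    });
                }
                executor.invokeAll(tasks); // blocks until the whole stage completes
            }
        } finally {
            executor.shutdown();
        }
        return completed.get();
    }
}
```

With an unbounded cached thread pool, as sketched here, peak thread count scales with the number of subsystems, which is the cost this issue is about.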
The rough effect of all this is that we are allocating two threads per subsystem to do
parallel boot, and at various points we have one thread per subsystem concurrently working.
For a brief period (doing Stage.DONE of the post-extension boot op) we have two threads per
subsystem concurrently working.
My measurements show that all of this concurrent work reduces boot time about 400ms on my
machine, using the full WildFly standalone-full.xml config. However, this approach uses a
lot of threads. So the task here is to look into how to get the same or better boot speed
while using fewer threads. (Note the threads will expire and be gc'd after boot.)
The obvious way to do this is to look at each of the 4 task types discussed in the first
paragraph and group things into larger units of work than a single extension/subsystem.
Initial work on doing this shows that using coarser-grained chunks does not reduce
boot time, but also seems not to increase it. Further measurement is needed to confirm
this, though, and small tweaks may show different results.
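One possible shape for the coarser-grained grouping is a simple round-robin partition of the per-subsystem tasks into a fixed number of chunks, so the thread count is bounded by the chunk count rather than the subsystem count. This is an illustrative sketch only (the class and method are invented, not WildFly Core code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: split N per-subsystem tasks into K chunks,
// so K threads can run them instead of N.
public class TaskChunker {
    static <T> List<List<T>> chunk(List<T> tasks, int chunks) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            result.get(i % chunks).add(tasks.get(i)); // round-robin assignment
        }
        return result;
    }
}
```

Round-robin is the simplest policy; if per-subsystem task costs are very uneven, a work-stealing or size-aware assignment might balance the chunks better.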
-Another thing to consider is allowing the Stage.MODEL tasks to complete without waiting
for the overall boot op to complete. This might reduce the max number of threads involved
and perhaps will allow a tiny bit more parallelization of work. The key here is ensuring
the Stage.MODEL tasks are not able to affect the state of the final system in an invalid
way. That could be problematic or fragile, so it's just something to consider, and if
done must be done with great care.-
Even if this work produces no reduction in boot time, incorporating it has some value as
long as it produces no increase, since avoiding unnecessary thread creation reflects well
on the efficiency and design of the software.
_In particular, with a default thread stack size of 1024K, allocating an extra 50+
threads at boot means the process will consume an extra 50MB of RSS beyond what it would
otherwise need. That memory should eventually be returned to the OS, and it's possible
that later use of the server will push peak memory use after boot above what's needed
at boot. Still, in a memory-constrained environment (think cloud, with applications
trying to live within a smaller memory budget), requesting an extra 50MB beyond what
provides benefit is not immaterial._
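The figure quoted above follows directly from the stack-size arithmetic; a trivial check (helper name invented for illustration):

```java
// Verify the RSS estimate above: extra threads times default stack size.
public class StackEstimate {
    // e.g. 50 threads with a 1024 KiB stack each reserve ~50 MiB
    static long extraStackMiB(int threads, int stackKiB) {
        return (long) threads * stackKiB / 1024;
    }
}
```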