[ https://issues.jboss.org/browse/WFCORE-3185?page=com.atlassian.jira.plugi... ]
Brian Stansberry commented on WFCORE-3185:
------------------------------------------
It looks like the problem we were seeing with this in full WildFly was that server
(as opposed to HC) boot does not work properly with a deployment when parallel boot is
disabled. This change just exposes the problem by making it easier to disable parallel
boot, i.e. by running with only a single CPU. It has always been possible to turn it off,
though, by setting the system property org.jboss.server.bootstrap.maxThreads. (The
ancient history behind that: the property controls the size of the MSC pool, and setting
it was seen as an indication that careful control of threads was wanted, e.g. in a
constrained environment.)
The problem is this:
https://github.com/wildfly/wildfly/blob/master/iiop-openjdk/src/main/java...
and then this:
https://github.com/wildfly/wildfly/blob/master/iiop-openjdk/src/main/java...
Those lines are *adding steps*. With parallel boot, those steps get added to a special
ParallelBootOperationContext, which runs them as part of the set of ops for a single
subsystem (before this fix) or a set of subsystems (with this PR). Either way, they run
before boot proceeds past the subsystem ops and on to the deployment ops.
But without parallel boot, those steps get added to the overall list of ops, which means
they happen *after* ops like /deployment=foo:add run. So the deployment can be getting
installed before all the subsystem work is done. In this case the subsystem work is adding
a DUP (DeploymentUnitProcessor) that ensures the deployment has access to certain modules.
There's nothing wrong with what the subsystem is doing there. It's a very common
pattern to add steps to execute later.
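The ordering difference can be sketched with a toy model (plain Python, illustrative op
names only, not WildFly code): a sequential executor appends added steps to the end of
the global op list, so they land after the deployment ops, while a parallel-style
executor runs them with their subsystem before boot moves on.

```python
# Toy model of boot op ordering. Each op is (name, added_steps).

def run_sequential(boot_ops):
    """One flat list of ops; steps added by an op go to the end of the list."""
    log, queue = [], list(boot_ops)
    while queue:
        name, added_steps = queue.pop(0)
        log.append(name)
        queue.extend(added_steps)  # extra steps land *after* everything queued
    return log

def run_parallel_style(boot_ops):
    """ParallelBootOperationContext-like: added steps run with their
    subsystem, before boot proceeds to the deployment ops. (Nested step
    additions are ignored here for brevity.)"""
    log = []
    for name, added_steps in boot_ops:
        log.append(name)
        log.extend(step_name for step_name, _ in added_steps)
    return log

boot_ops = [
    ("subsystem=iiop-openjdk:add", [("add-dependency-DUP-step", [])]),
    ("deployment=foo:add", []),
]

print(run_sequential(boot_ops))      # DUP step runs after the deployment op
print(run_parallel_style(boot_ops))  # DUP step runs before the deployment op
```

In the sequential case the deployment op runs before the added step, which is exactly the
mis-ordering described above.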
There are two solutions I see here:
1) Execute the added ops after the subsystem ones separately; i.e. break boot up even
further, from the current two operations to three.
2) Tweak parallel boot so it basically always runs in a server, which is pretty simple.
It's pretty clear boot hasn't worked properly without parallel boot turned on for
a long time.
Run parallel boot tasks in coarser grained chunks
-------------------------------------------------
Key: WFCORE-3185
URL: https://issues.jboss.org/browse/WFCORE-3185
Project: WildFly Core
Issue Type: Enhancement
Components: Management
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Priority: Major
Currently parallel boot works by executing one task per Extension.initializeParsers call,
one per Extension.initialize call, and two per subsystem: one for execution of the
Stage.MODEL ops and one for the Stage.RUNTIME ops. The Extension.initializeParsers tasks
complete before boot proceeds to the point where any Extension.initialize tasks run, and
the Extension.initialize tasks complete before the Stage.MODEL tasks run. The Stage.MODEL
tasks do the bulk of their work before the Stage.RUNTIME tasks run, but they do block
waiting for the Stage.RUNTIME tasks and the rest of the boot to complete.
The rough effect of all this is we are allocating 2 threads per subsystem to do parallel
boot, and at various points we have 1 thread per subsystem concurrently working. For a
brief period (doing Stage.DONE of the post-extension boot op) we have 2 threads per
subsystem concurrently working.
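The phase ordering described above can be sketched as a toy simulation (Python, with
hypothetical subsystem names; the real scheduling, including the Stage.MODEL tasks
blocking on the rest of boot, is more involved):

```python
# Toy model: one task per extension/subsystem per phase, with a barrier
# between phases, so thread usage scales with the number of subsystems.
from concurrent.futures import ThreadPoolExecutor, wait

subsystems = ["logging", "ee", "naming", "undertow"]  # hypothetical subset
phases = ["initializeParsers", "initialize", "Stage.MODEL", "Stage.RUNTIME"]
timeline = []  # (phase, subsystem) in completion order

with ThreadPoolExecutor(max_workers=len(subsystems)) as pool:
    for phase in phases:
        futures = [pool.submit(timeline.append, (phase, s)) for s in subsystems]
        wait(futures)  # barrier: each phase finishes before the next starts
```

Even this simplified model makes the cost visible: the pool must be sized to the
subsystem count to get the per-phase concurrency described above.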
My measurements show that all of this concurrent work reduces boot time by about 400ms on
my machine, using the full WildFly standalone-full.xml config. However, this approach uses
a lot of threads. So the task here is to look into how to get the same or better boot
speed while using fewer threads. (Note the threads will expire and be gc'd after boot.)
The obvious way to do this is to look at each of the 4 task types discussed in the first
paragraph and group things into larger units of work than a single extension/subsystem.
Initial work on doing this shows that using coarser-grained chunks does not result in
reduced boot time, but also seems not to increase boot time. Further measurement is needed
to confirm this, though, and small tweaks may show different results.
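Grouping into larger units could look like the following minimal sketch (the chunk size
and subsystem names are made up for illustration):

```python
def chunk(items, size):
    """Group per-subsystem work into coarser units of `size` subsystems each."""
    return [items[i:i + size] for i in range(0, len(items), size)]

subsystems = ["logging", "ee", "naming", "undertow", "iiop-openjdk"]
# One boot task per chunk instead of one per subsystem:
tasks = chunk(subsystems, 2)
print(tasks)  # [['logging', 'ee'], ['naming', 'undertow'], ['iiop-openjdk']]
```

With a chunk size of 2, five subsystems need only three tasks (and threads) per phase
instead of five, at the cost of serializing the work within each chunk.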
-Another thing to consider is allowing the Stage.MODEL tasks to complete without waiting
for the overall boot op to complete. This might reduce the max number of threads involved
and perhaps will allow a tiny bit more parallelization of work. The key here is ensuring
the Stage.MODEL tasks are not able to affect the state of the final system in an invalid
way. That could be problematic or fragile, so it's just something to consider, and if
done must be done with great care.-
Even if this work produces no reduction in boot time, so long as it produces no increase
there is some value in incorporating it, as avoiding unnecessary thread creation improves
the perceived efficiency and design quality of the software.
_In particular, with a default thread stack size of 1024K, allocating an extra 50+
threads at boot means the process will consume roughly an extra 50MB of RSS beyond what
it would otherwise need. That memory should eventually be returned to the OS, and it's
possible that later use of the server will result in a peak memory use after boot that's
higher than what's needed at boot. Still, in a memory-constrained environment (think
cloud, with applications trying to live in a smaller memory budget), requesting an extra
50MB beyond what provides benefit is not immaterial._
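The 50MB figure is simple arithmetic (a sketch; the 1024K default stack size and the
50-thread count are the assumptions stated above, not measured values):

```python
stack_kib = 1024      # assumed default per-thread stack size (1024K)
extra_threads = 50    # extra boot threads, per the estimate above
extra_mib = extra_threads * stack_kib / 1024
print(extra_mib)  # 50.0 MiB of additional stack reservation
```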
--
This message was sent by Atlassian Jira
(v7.12.1#712002)