Ever since I did the parallel boot work I’ve wondered whether the level of concurrency was
right, but I never got around to doing any experimentation. It’s quite naive: a task per
extension module load and then one per subsystem. I’ve wanted to look into instead dividing
the work into X larger tasks, with X derived from the number of cores.

But for your fix to be helping things so much, it must be loading a lot of these classes
during the single-threaded parts of the boot, so I don’t see how changing my code to use
fewer tasks would compete with that. It may be beneficial regardless, though, e.g. by not
spinning up more threads than can be used efficiently.
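For illustration, a rough sketch of the core-derived sizing described above, assuming a plain fixed-size executor; the divisor, the chunking helper, and the class name are placeholders rather than anything in the current boot code:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: derive the number of boot worker tasks from the core count
// instead of creating one task per extension load and one per subsystem.
public final class BootTaskSizing {

    public static ExecutorService newBootExecutor() {
        // Assumption: leave headroom for the main boot thread and GC.
        int workers = Math.max(2, Runtime.getRuntime().availableProcessors() / 2);
        return Executors.newFixedThreadPool(workers);
    }

    // Hypothetical helper: split the boot work into 'workers' larger chunks.
    public static void submitChunked(ExecutorService executor, List<Runnable> units, int workers) {
        int chunkSize = (units.size() + workers - 1) / workers;
        for (int start = 0; start < units.size(); start += chunkSize) {
            List<Runnable> chunk = units.subList(start, Math.min(start + chunkSize, units.size()));
            executor.submit(() -> chunk.forEach(Runnable::run));
        }
    }
}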
On May 15, 2017, at 4:52 PM, Stuart Douglas <stuart.w.douglas(a)gmail.com> wrote:

On Tue, May 16, 2017 at 12:13 AM, Brian Stansberry <brian.stansberry(a)redhat.com> wrote:
Definitely worth investigating. I’d like to have a really good understanding of why it has
the benefits it has, so we can see whether this is the best way to get them or if something
else is better.
I am pretty sure it is contention related. I modified my hack to load all classes from
the same module at once (so once the first class from a module in that properties file is
reached, it loads all the others from the same module), and this gave another small but
significant speedup, so in total boot is now down to ~2.0-2.1s from ~2.9s.

Looking at the results of monitor profiling in YourKit, it looks like the reason is
reduced contention. There is 50% less thread wait time on ModuleLoader$FutureModule,
and the contention on JarFileResourceLoader is gone entirely. I think the reason is that
we have a lot of threads active at boot, and this results in a lot of contention in
module/class loading.
Stuart
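For reference, the per-module batching Stuart describes might look roughly like the following; the class names and the loader-lookup interface are hypothetical stand-ins for the actual hack, not the real jboss-modules API:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

// Sketch: group the recorded class names by module, then load each module's
// classes together so each module is resolved once and contention on the
// module loader is reduced. All names here are hypothetical.
final class PerModuleClassPreloader {

    // Assumed file format: class name -> module name.
    static void preload(Properties recorded, ModuleClassLoaderLookup lookup) {
        Map<String, List<String>> classesByModule = recorded.stringPropertyNames().stream()
                .collect(Collectors.groupingBy(recorded::getProperty));

        classesByModule.forEach((module, classNames) -> {
            ClassLoader loader = lookup.classLoaderFor(module); // hypothetical lookup
            for (String className : classNames) {
                try {
                    Class.forName(className, true, loader);
                } catch (ClassNotFoundException | LinkageError ignored) {
                    // A stale list entry is harmless; just skip it.
                }
            }
        });
    }

    // Hypothetical abstraction over the module loader, to keep the sketch self-contained.
    interface ModuleClassLoaderLookup {
        ClassLoader classLoaderFor(String moduleName);
    }
}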
This kicks in just before the ModelController starts and begins parsing the config. The
config parsing quickly gets into parallel work; as soon as the extension elements are
reached, the extension modules are loaded concurrently. Then once the parsing is done each
subsystem is installed concurrently, so there are lots of threads doing concurrent
classloading.

So why does adding two more make such a big difference?

Is it that they get lots of work done during the time when the regular boot thread is not
doing concurrent work, i.e. the parsing and the non-parallel bits of operation execution?

Or is it that these threads are just chugging along doing classloading efficiently while the
parallel threads are running along inefficiently, getting scheduled and unscheduled?
The latter doesn’t make sense to me, as there’s no reason why these threads would be any
more efficient than the others.
- Brian
> On May 14, 2017, at 6:36 PM, Stuart Douglas <stuart.w.douglas(a)gmail.com> wrote:
>
> When JIRA was being screwy on Friday I used the time to investigate an idea I have
> had for a while about improving our boot time performance. According to YourKit, the
> majority of our time is spent in class loading. It seems very unlikely that we will be
> able to reduce the number of classes we load on boot (or at the very least it would be a
> massive amount of work), so I investigated a different approach.
>
> I modified ModuleClassLoader to spit out the name and module of every class that is
> loaded at boot time, and stored this in a properties file. I then created a simple Service
> that starts immediately and uses two threads to eagerly load every class on this list (I
> used two threads because that seemed to work well on my laptop; I think
> Runtime.availableProcessors()/4 is probably the best amount, but that assumption would
> need to be tested on different hardware).
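To make the shape of that concrete, here is a minimal sketch of such an eager preloader, assuming a flat properties-style list of class names; apart from Runtime.availableProcessors(), the names are made up and this is not the code on the branch linked below:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of an eager class-preloading step: read the recorded class list and
// warm it up on a small pool while the rest of boot proceeds.
final class EagerClassPreloader {

    static void start(Path classListFile, ClassLoader loader) throws Exception {
        Properties classList = new Properties();
        try (InputStream in = Files.newInputStream(classListFile)) {
            classList.load(in);
        }

        // Two threads worked well on one laptop; availableProcessors()/4 is the
        // suggested heuristic, untested on other hardware.
        int threads = Math.max(2, Runtime.getRuntime().availableProcessors() / 4);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (String className : classList.stringPropertyNames()) {
            pool.submit(() -> {
                try {
                    Class.forName(className, true, loader);
                } catch (ClassNotFoundException | LinkageError ignored) {
                    // Missing classes are expected when the list goes stale; ignore them.
                }
            });
        }
        pool.shutdown(); // let preloading drain in the background
    }
}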
>
> The idea behind this is that we know the classes will be used at some point, and we
> generally do not fully utilise all CPUs during boot, so we can use the unused CPU to
> preload these classes so they are ready when they are actually required.
>
> Using this approach I saw the boot time for standalone.xml drop from ~2.9s to ~2.3s
> on my laptop. The (super hacky) code I used to perform this test is at
> https://github.com/wildfly/wildfly-core/compare/master...stuartwdouglas:b...
>
> I think these initial results are encouraging, and it is a big enough gain that I
> think it is worth investigating further.
>
> Firstly, it would be great if I could get others to try it out and see if they see
> similar gains in boot time; it may be that the gain is very system-dependent.
>
> Secondly, if we do decide to do this, there are two approaches I can see that we
> could use:
>
> 1) A hard-coded list of class names that we generate before a release (basically
> what the hack already does). This is simplest, but does add a little bit of additional
> work to the release process (although if it is missed it would be no big deal, as
> ClassNotFoundExceptions would be suppressed, and if a few classes are missing the
> performance impact is negligible as long as the majority of the list is correct).
>
> 2) Generate the list dynamically on first boot, and store it in the temp directory.
> This would require the addition of a hook into JBoss Modules to generate the list, but it
> is the approach I would prefer (as first boot is always a bit slower anyway).
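A sketch of what the recording side of option 2 could look like; the onClassLoaded() callback stands in for the hook that would need to be added to JBoss Modules (it does not exist today), and the class and file names are hypothetical:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical recorder for option 2: on first boot, append every class/module
// pair that gets loaded to a file under the server temp directory, for the
// preloader to consume on subsequent boots.
final class FirstBootClassListRecorder {

    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Path listFile;

    FirstBootClassListRecorder(Path tempDir) {
        this.listFile = tempDir.resolve("boot-class-list.properties");
    }

    // Intended to be invoked from the (hypothetical) module class-loading hook.
    void onClassLoaded(String moduleName, String className) {
        if (!seen.add(className)) {
            return; // record each class only once
        }
        String line = className + "=" + moduleName + System.lineSeparator();
        try {
            Files.write(listFile, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException ignored) {
            // Recording is best-effort; a failed write only costs a preload opportunity.
        }
    }
}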
>
> Thoughts?
>
> Stuart
> _______________________________________________
> wildfly-dev mailing list
> wildfly-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/wildfly-dev
--
Brian Stansberry
Manager, Senior Principal Software Engineer
JBoss by Red Hat