[wildfly-dev] Speeding up WildFly boot time

Brian Stansberry brian.stansberry at redhat.com
Mon May 15 18:16:09 EDT 2017


From the time I did parallel boot I’ve always wondered if the level of concurrency was valid, but I never got around to doing any experimentation.

It’s quite naive; a task per extension module load and then one per subsystem. I’ve wanted to look into instead dividing the work into X larger tasks with X derived from the number of cores.

But for your fix to be helping things so much it must be loading a lot of these classes during the single-threaded parts of the boot, so I don’t see how my changing it to have fewer tasks would compete with that. It may be beneficial regardless though, e.g. by not spinning up more threads than can be efficiently used.
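
A rough sketch of the "X larger tasks" idea, purely for illustration: the class and method names below are hypothetical and are not WildFly APIs. It assumes the per-extension and per-subsystem work can be represented as plain Runnables, splits them into a fixed number of contiguous batches, and runs each batch sequentially on its own thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names exist in WildFly.
class BatchedBootTasks {

    /** Splits tasks into at most batchCount contiguous batches of near-equal size. */
    static <T> List<List<T>> partition(List<T> tasks, int batchCount) {
        if (batchCount < 1) {
            throw new IllegalArgumentException("batchCount must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        int n = tasks.size();
        int count = Math.min(batchCount, n);
        int start = 0;
        for (int i = 0; i < count; i++) {
            // Spread the remainder over the first (n % count) batches.
            int size = n / count + (i < n % count ? 1 : 0);
            batches.add(new ArrayList<>(tasks.subList(start, start + size)));
            start += size;
        }
        return batches;
    }

    /** Runs each batch sequentially on its own pool thread and waits for all batches. */
    static void runBatched(List<Runnable> tasks, int batchCount) {
        List<List<Runnable>> batches = partition(tasks, batchCount);
        if (batches.isEmpty()) {
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(batches.size());
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<Runnable> batch : batches) {
                futures.add(pool.submit(() -> batch.forEach(Runnable::run)));
            }
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // X derived from the number of cores, as suggested above.
        int x = Runtime.getRuntime().availableProcessors();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            int id = i;
            tasks.add(() -> System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }
        runBatched(tasks, x);
    }
}
```

Whether contiguous batches are the right split is an open question; the real tasks vary a lot in cost, so some form of work stealing might balance better.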

> On May 15, 2017, at 4:52 PM, Stuart Douglas <stuart.w.douglas at gmail.com> wrote:
> 
> 
> 
> On Tue, May 16, 2017 at 12:13 AM, Brian Stansberry <brian.stansberry at redhat.com> wrote:
> Definitely worth investigating. I’d like to have a real good understanding of why it has the benefits it has, so we can see if this is the best way to get them or if something else is better.
> 
> I am pretty sure it is contention related. I modified my hack to load all classes from the same module at once (so once the first class from a module in that properties file is reached, it loads all others from the same module), and this gave another small but significant speedup (so boot time is now ~2.0-2.1s, down from ~2.9s).
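
The reordering described above can be sketched roughly as follows. This is not the code from the hack; the class name and the {className, moduleName} pair representation are assumptions made for illustration. Modules keep the order in which they are first seen in the recorded list, and all classes of a module are emitted together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the entry format is an assumption, not the hack's actual file format.
class ModuleGroupedOrder {

    /**
     * entries: each element is {className, moduleName}, in recorded load order.
     * Returns class names regrouped so that every module's classes are contiguous,
     * with modules ordered by their first appearance.
     */
    static List<String> groupByModule(List<String[]> entries) {
        // LinkedHashMap preserves the order in which modules are first seen.
        Map<String, List<String>> byModule = new LinkedHashMap<>();
        for (String[] entry : entries) {
            byModule.computeIfAbsent(entry[1], k -> new ArrayList<>()).add(entry[0]);
        }
        List<String> ordered = new ArrayList<>();
        for (List<String> classes : byModule.values()) {
            ordered.addAll(classes);
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String[]> recorded = Arrays.asList(
                new String[] {"a.First", "module.a"},
                new String[] {"b.Second", "module.b"},
                new String[] {"a.Third", "module.a"});
        System.out.println(groupByModule(recorded));
    }
}
```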
> 
> Looking at the results of monitor profiling in YourKit it looks like the reason is reduced contention. There is 50% less thread wait time on ModuleLoader$FutureModule, and the contention on JarFileResourceLoader is gone. I think the reason is that we have a lot of threads active at boot and this results in a lot of contention in module/class loading.
> 
> Stuart
> 
> 
>  
> 
> This kicks in just before the ModelController starts and begins parsing the config. The config parsing quickly gets into parallel work; as soon as the extension elements are reached the extension modules are loaded concurrently. Then once the parsing is done each subsystem is installed concurrently, so lots of threads doing concurrent classloading.
> 
> So why does adding two more make such a big difference?
> 
> Is it that they get lots of work done in that time when the regular boot thread is not doing concurrent work, i.e. the parsing and the non-parallel bits of operation execution?
> 
> Is it that these threads are just chugging along doing classloading efficiently while the parallel threads are running along inefficiently getting scheduled and unscheduled?
> 
> The latter doesn’t make sense to me as there’s no reason why these threads would be any more efficient than the others.
> 
> - Brian
> 
> > On May 14, 2017, at 6:36 PM, Stuart Douglas <stuart.w.douglas at gmail.com> wrote:
> >
> > When JIRA was being screwy on Friday I used the time to investigate an idea I have had for a while about improving our boot time performance. According to YourKit the majority of our time is spent in class loading. It seems very unlikely that we will be able to reduce the number of classes we load on boot (or at the very least it would be a massive amount of work) so I investigated a different approach.
> >
> > I modified ModuleClassLoader to spit out the name and module of every class that is loaded at boot time, and stored this in a properties file. I then created a simple Service that starts immediately and uses two threads to eagerly load every class on this list (I used two threads because that seemed to work well on my laptop; I think Runtime.getRuntime().availableProcessors()/4 is probably the best number, but that assumption would need to be tested on different hardware).
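
A minimal sketch of the eager preloading idea, under several assumptions: it is not the code from the linked branch, the class name is hypothetical, it uses a plain ClassLoader rather than JBoss Modules, and it loads classes without initializing them (the hack may do either). It also waits for the worker threads so the result can be inspected, whereas a boot-time service would presumably let them run in the background.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the implementation used in the boot-performance-hack branch.
class EagerClassPreloader {

    /** Thread count floated in the message: cores / 4, but never below 1. */
    static int defaultThreadCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() / 4);
    }

    /** Loads every listed class on threadCount threads and returns how many were loaded. */
    static int preload(List<String> classNames, ClassLoader loader, int threadCount) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(classNames);
        AtomicInteger loaded = new AtomicInteger();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                String name;
                while ((name = queue.poll()) != null) {
                    try {
                        // initialize=false: load the class without running static initializers (an assumption).
                        Class.forName(name, false, loader);
                        loaded.incrementAndGet();
                    } catch (ClassNotFoundException | LinkageError e) {
                        // A stale or unloadable entry is harmless here; skip it.
                    }
                }
            }, "class-preloader-" + i);
            workers[i].setDaemon(true);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return loaded.get();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("java.util.ArrayList", "no.such.Clazz", "java.lang.String");
        int count = preload(names, ClassLoader.getSystemClassLoader(), defaultThreadCount());
        System.out.println("preloaded " + count + " of " + names.size());
    }
}
```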
> >
> > The idea behind this is that we know the classes will be used at some point, and we generally do not fully utilise all CPUs during boot, so we can use the unused CPU to preload these classes so they are ready when they are actually required.
> >
> > Using this approach I saw the boot time for standalone.xml drop from ~2.9s to ~2.3s on my laptop. The (super hacky) code I used to perform this test is at https://github.com/wildfly/wildfly-core/compare/master...stuartwdouglas:boot-performance-hack
> >
> > I think these initial results are encouraging, and it is a big enough gain that I think it is worth investigating further.
> >
> > Firstly it would be great if I could get others to try it out and see if they see similar gains to boot time, it may be that the gain is very system dependent.
> >
> > Secondly, if we do decide to do this, there are two approaches that I can see:
> >
> > 1) A hard coded list of class names that we generate before a release (basically what the hack already does). This is the simplest, but does add a little bit of additional work to the release process (although if it is missed it would be no big deal, as ClassNotFoundExceptions would be suppressed, and if a few classes are missing the performance impact is negligible as long as the majority of the list is correct).
> >
> > 2) Generate the list dynamically on first boot, and store it in the temp directory. This would require the addition of a hook into JBoss Modules to generate the list, but is the approach I would prefer (as first boot is always a bit slower anyway).
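
For option 2, the storage side might look roughly like the sketch below. Everything here is an assumption for illustration: the class name, the one-class-name-per-line format, and the file location are hypothetical, and the JBoss Modules hook that would actually collect the class names is not shown because it does not exist yet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; file name and format are assumptions.
class BootClassListStore {
    private final Path file;

    BootClassListStore(Path file) {
        this.file = file.toAbsolutePath();
    }

    /** Returns the recorded class names, or an empty list if nothing usable was recorded (e.g. first boot). */
    List<String> read() {
        if (!Files.isRegularFile(file)) {
            return Collections.emptyList();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // An unreadable list only costs the optimization, so fall back to "no list".
            return Collections.emptyList();
        }
    }

    /** Writes the list to a sibling temp file first, then moves it into place. */
    void write(List<String> classNames) {
        try {
            Path tmp = Files.createTempFile(file.getParent(), "boot-classes", ".tmp");
            Files.write(tmp, classNames, StandardCharsets.UTF_8);
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "boot-classes.list");
        BootClassListStore store = new BootClassListStore(file);
        List<String> known = store.read();
        if (known.isEmpty()) {
            // First boot: record whatever was observed (hard-coded here for the example).
            store.write(Arrays.asList("java.util.ArrayList", "java.lang.String"));
        }
        System.out.println("entries available for preloading: " + known.size());
    }
}
```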
> >
> > Thoughts?
> >
> > Stuart
> > _______________________________________________
> > wildfly-dev mailing list
> > wildfly-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/wildfly-dev
> 
> --
> Brian Stansberry
> Manager, Senior Principal Software Engineer
> JBoss by Red Hat
> 
> 
> 
> 

-- 
Brian Stansberry
Manager, Senior Principal Software Engineer
JBoss by Red Hat





