Speeding up WildFly boot time


Stuart Douglas

Sunday, 14 May 2017

6:36 p.m.

When JIRA was being screwy on Friday I used the time to investigate an idea I have had for a while about improving our boot time performance. According to YourKit the majority of our boot time is spent in class loading. It seems very unlikely that we will be able to reduce the number of classes we load on boot (or at the very least it would be a massive amount of work), so I investigated a different approach.

I modified ModuleClassLoader to spit out the name and module of every class that is loaded at boot time, and stored this in a properties file. I then created a simple Service that starts immediately and uses two threads to eagerly load every class on this list. (I used two threads because that seemed to work well on my laptop. I think Runtime.getRuntime().availableProcessors()/4 is probably the best number, but that assumption would need to be tested on different hardware.)

The idea behind this is that we know the classes will be used at some point, and we generally do not fully utilise all CPUs during boot, so we can use the idle CPU to preload these classes so they are ready when they are actually required.

Using this approach I saw the boot time for standalone.xml drop from ~2.9s to ~2.3s on my laptop. The (super hacky) code I used to perform this test is at https://github.com/wildfly/wildfly-core/compare/master...stuartwdouglas:boot-performance-hack

I think these initial results are encouraging, and the gain is big enough that I think it is worth investigating further.

Firstly, it would be great if I could get others to try it out and see if they see similar gains in boot time; it may be that the gain is very system dependent.

Secondly, if we do decide to do this, there are two approaches that I can see:

1) A hard-coded list of class names that we generate before a release (basically what the hack already does). This is the simplest, but it does add a little bit of additional work to the release process (although if it is missed it would be no big deal, as ClassNotFoundExceptions would be suppressed, and if a few classes are missing the performance impact is negligible as long as the majority of the list is correct).

2) Generate the list dynamically on first boot, and store it in the temp directory. This would require the addition of a hook into JBoss Modules to generate the list, but it is the approach I would prefer (as first boot is always a bit slower anyway).

Thoughts?

Stuart
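The preloading idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked branch: the class name ClassPreloader, the use of a single ClassLoader (the real hack records a module per class), the choice to load without running static initializers, and the blocking wait at the end are all assumptions made to keep the example self-contained.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: eagerly loads a known list of classes on background threads. */
final class ClassPreloader {

    /** Thread count following the post's guess of availableProcessors()/4, with a floor of one. */
    static int defaultThreads() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() / 4);
    }

    /** Loads every named class through the given loader and returns how many were loaded. */
    static int preload(List<String> classNames, ClassLoader loader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "class-preloader");
            t.setDaemon(true); // never keep the JVM alive just for preloading
            return t;
        });
        AtomicInteger loaded = new AtomicInteger();
        for (String name : classNames) {
            pool.execute(() -> {
                try {
                    // initialize=false: load the class without running static initializers
                    // (a choice made for this sketch)
                    Class.forName(name, false, loader);
                    loaded.incrementAndGet();
                } catch (ClassNotFoundException | LinkageError ignored) {
                    // A stale entry in the list is harmless, so it is skipped
                }
            });
        }
        pool.shutdown();
        try {
            // The sketch waits so it can report a count; a real boot would not block here
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loaded.get();
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "java.util.ArrayList",
                "java.util.concurrent.ConcurrentHashMap",
                "com.example.DoesNotExist"); // deliberately missing, to show suppression
        int loaded = preload(names, ClassPreloader.class.getClassLoader(), defaultThreads());
        System.out.println("Preloaded " + loaded + " of " + names.size() + " classes");
    }
}
```

Suppressing ClassNotFoundException is what makes an out-of-date list safe: a missing entry costs one failed lookup and nothing more.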


Tomaž Cerar

Monday, 15 May 2017

7:09 a.m.

Hey Stuart,

we discussed this exact problem with David some time ago, but didn't get as far as implementing a prototype. At the time, one of the bigger contention bottlenecks was in JDK classes (java.lang.*, java.util.*, etc.), and I think we should also do something similar to what you did in jboss-modules to speed that up. It would probably yield even better results in the long run.

--
tomaz

Rostislav Svoboda

7:27 a.m.

Hi.

I can confirm I see improvements in boot time with your changes. My HW is a Lenovo T440s with Fedora 25, Intel(R) Core(TM) i7-4600U CPU (Base Frequency 2.10 GHz, Max Turbo 3.30 GHz).

I executed 50 iterations of a start - stop sequence [1], preceded by 5x start - stop for "warmup". Times are in milliseconds.

With your changes:
Min: 3116 Max: 3761 Average: 3247.640000

Without:
Min: 3442 Max: 4081 Average: 3580.840000

> 1) A hard coded list of class names that we generate before a release

This will improve the first-boot impression, but it makes maintaining the list for the final build a little harder.

Property files could be located inside the properties directory of a dedicated module (<resource-root path="properties"/>). The properties directory could contain property files for the delivered profiles.

Layered products or customer modifications could deliver their own property file, e.g. a predefined property file for standalone-openshift.xml in the EAP image in an OpenShift environment. I think they boot the server just once and throw away the whole docker image when something changes.

> 2) Generate the list dynamically on first boot, and store it in the temp

This looks like the most elegant thing to do. The question is how much it will slow down the initial boot. People care about the first-boot impression, and some blog writers make that mistake too.
This would also rule out boot time improvements for use cases where you start the server just once, e.g. Docker, OpenShift.

Also, the logic should take into account which profile is loaded, e.g. standalone.xml vs. standalone-full-ha.xml.

Rostislav

[1]
rm wildfly-11.0.0.Beta1-SNAPSHOT-preload/standalone/log/server.log
rm wildfly-11.0.0.Beta1-SNAPSHOT/standalone/log/server.log

for i in {1..50}; do
  echo $i
  wildfly-11.0.0.Beta1-SNAPSHOT-preload/bin/standalone.sh 1>/dev/null 2>&1 &
  sleep 8
  wildfly-11.0.0.Beta1-SNAPSHOT-preload/bin/jboss-cli.sh -c :shutdown 1>/dev/null 2>&1
done
grep WFLYSRV0025 wildfly-11.0.0.Beta1-SNAPSHOT-preload/standalone/log/server.log | sed "s/.*\(....\)ms.*/\1/g" | awk 'NR == 1 { max=$1; min=$1; sum=0 }
  { if ($1>max) max=$1; if ($1<min) min=$1; sum+=$1;}
  END {printf "Min: %d\tMax: %d\tAverage: %f\n", min, max, sum/NR}'

for i in {1..50}; do
  echo $i
  wildfly-11.0.0.Beta1-SNAPSHOT/bin/standalone.sh 1>/dev/null 2>&1 &
  sleep 8
  wildfly-11.0.0.Beta1-SNAPSHOT/bin/jboss-cli.sh -c :shutdown 1>/dev/null 2>&1
done
grep WFLYSRV0025 wildfly-11.0.0.Beta1-SNAPSHOT/standalone/log/server.log | sed "s/.*\(....\)ms.*/\1/g" | awk 'NR == 1 { max=$1; min=$1; sum=0 }
  { if ($1>max) max=$1; if ($1<min) min=$1; sum+=$1;}
  END {printf "Min: %d\tMax: %d\tAverage: %f\n", min, max, sum/NR}'

Brian Stansberry

9:21 a.m.

A disadvantage of a static list is that we have other concerns besides boot speed, i.e. memory footprint. We do not want to be loading a bunch of classes that are not relevant to the configuration.

You touch on this issue, Rostislav, in your point about property files for the delivered profiles. If the lists are per feature-pack (or better yet per package, once our feature packs have the package notion) and the build then integrates them, that somewhat mitigates the concern, at least for people who are careful about how they provision. But it doesn’t help people who are not so careful and just rely on trimming standalone.xml or selecting a reasonable standard config to tailor their server footprint.


-- Brian Stansberry Manager, Senior Principal Software Engineer JBoss by Red Hat

Sanne Grinovero

9:23 a.m.

Very interesting.

From a different perspective, but closely related: I was recently trying to profile the test suites of Hibernate projects, and in our case too, ClassLoader time is a significant portion of the bootstrap time.

In the Hibernate case I noticed it spends quite some time locating Service implementations via the ServiceLoader pattern. The problem seems to be that a lot of internal code has been refactored in recent versions to be "replaceable", so Hibernate ORM includes a default implementation for each internal Service, but it will first check whether it can find an alternative somewhere else on the classpath, looking both among its own dependencies and among the classes provided by the deployment.

I'll see if we can do better in Hibernate ORM (not sure yet!), but I'm raising it here as I suspect several other libraries could be guilty of the same approach.

I also hope we'll be able to curate (trim) the dependencies more; the current JPA subsystem includes many dependencies of questionable usefulness. That should help?

Thanks,
Sanne

Brian Stansberry

9:53 a.m.

+1 re: being careful about ServiceLoader. We were doing some perf testing work last week and that's one thing that showed up. Unfortunately it was in the app we were testing rather than in the server code, but I could easily imagine similar things happening in the server.

In case people are curious, the issue is the javax.json.Json class, which provides a bunch of static utility methods to create JSON-related objects. The problem is that they are implemented via "JsonProvider.provider().doXXX". And that JsonProvider.provider() call uses a ServiceLoader to try to find any custom JsonProvider impls before falling back to the default. WildFly doesn’t ship any such impls, so those ServiceLoader calls all result in a FileNotFoundException as the classloader checks for META-INF/services/…JsonProvider. That means IO access plus the cost of creating the exception.

The solution is not to use javax.json.Json.xxx all the time in the app, but to call JsonProvider.provider() once and cache it.


-- Brian Stansberry Manager, Senior Principal Software Engineer JBoss by Red Hat
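The "look it up once and cache it" fix can be illustrated with a JDK-only sketch. The Greeter interface below is a hypothetical stand-in for an SPI such as javax.json.spi.JsonProvider, and the lookup counter exists only so the effect is observable; in an application using javax.json, the equivalent would be to keep the result of JsonProvider.provider() in a static final field and call its create methods directly.

```java
import java.util.Iterator;
import java.util.ServiceLoader;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of caching a ServiceLoader lookup; all names are illustrative. */
final class ProviderLookup {

    /** Stand-in for an SPI type such as javax.json.spi.JsonProvider. */
    public interface Greeter {
        String greet(String name);
    }

    /** Counts how many times the service lookup actually runs. */
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /** Uncached: searches META-INF/services on every call, then falls back to a default. */
    static Greeter lookup() {
        LOOKUPS.incrementAndGet();
        Iterator<Greeter> found = ServiceLoader.load(Greeter.class).iterator();
        return found.hasNext() ? found.next() : name -> "Hello, " + name;
    }

    /** Initialization-on-demand holder: the lookup runs once, on first use. */
    private static final class Holder {
        static final Greeter INSTANCE = lookup();
    }

    /** Cached: every call after the first returns the same instance with no lookup. */
    static Greeter cached() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            lookup().greet("uncached"); // one service lookup per call
        }
        for (int i = 0; i < 3; i++) {
            cached().greet("cached");   // one service lookup in total
        }
        System.out.println("ServiceLoader lookups: " + LOOKUPS.get());
    }
}
```

The holder class keeps the cached instance lazy and thread-safe without explicit locking, since the JVM initializes Holder at most once.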

Stuart Douglas

5:15 p.m.

On Mon, May 15, 2017 at 10:27 PM, Rostislav Svoboda <rsvoboda(a)redhat.com> wrote:

...

It will not actually slow down the initial boot (at least not in a measurable way), but the first boot would not get the benefit of this optimisation. Stuart

...

This would also block boot time improvements for use cases where you start the server just once - e.g. Docker, OpenShift. Also the logic should take into account which profile is loaded - e.g. standalone.xml vs. standalone-full-ha.xml

Rostislav

[1]
rm wildfly-11.0.0.Beta1-SNAPSHOT-preload/standalone/log/server.log
rm wildfly-11.0.0.Beta1-SNAPSHOT/standalone/log/server.log

for i in {1..50}; do
  echo $i
  wildfly-11.0.0.Beta1-SNAPSHOT-preload/bin/standalone.sh 1>/dev/null 2>&1 &
  sleep 8
  wildfly-11.0.0.Beta1-SNAPSHOT-preload/bin/jboss-cli.sh -c :shutdown 1>/dev/null 2>&1
done

grep WFLYSRV0025 wildfly-11.0.0.Beta1-SNAPSHOT-preload/standalone/log/server.log \
  | sed "s/.*\(....\)ms.*/\1/g" \
  | awk 'NR == 1 { max=$1; min=$1; sum=0 } { if ($1>max) max=$1; if ($1<min) min=$1; sum+=$1;} END {printf "Min: %d\tMax: %d\tAverage: %f\n", min, max, sum/NR}'

for i in {1..50}; do
  echo $i
  wildfly-11.0.0.Beta1-SNAPSHOT/bin/standalone.sh 1>/dev/null 2>&1 &
  sleep 8
  wildfly-11.0.0.Beta1-SNAPSHOT/bin/jboss-cli.sh -c :shutdown 1>/dev/null 2>&1
done

grep WFLYSRV0025 wildfly-11.0.0.Beta1-SNAPSHOT/standalone/log/server.log \
  | sed "s/.*\(....\)ms.*/\1/g" \
  | awk 'NR == 1 { max=$1; min=$1; sum=0 } { if ($1>max) max=$1; if ($1<min) min=$1; sum+=$1;} END {printf "Min: %d\tMax: %d\tAverage: %f\n", min, max, sum/NR}'

_______________________________________________
wildfly-dev mailing list
wildfly-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/wildfly-dev

Heiko W.Rupp

10:57 p.m.

On 16 May 2017, at 0:15, Stuart Douglas wrote:

...

>> 2) Generate the list dynamically on first boot, and store it in the temp
>
> This looks like the most elegant thing to do. Question is how it will slow
> down the initial boot. People care about first boot impression, some blog
> writers do the mistake too.
>
It will not actually slow down the initial boot (at least not in a measurable way), but the first boot would not get the benefit of this optimisation.

A mixed mode could be interesting, where the list is created by tooling (or a pseudo-boot) and then written down. As someone said, on Docker/OpenShift every boot is a first boot, so doing the pre-population at that point would not help. But generating the list at image creation time would make the speedup available to every container started from that image.

Bob McWhirter

Tuesday, 16 May Tue, 16 May

6:54 p.m.

...

From a Swarm perspective I'd like something that benefits first boot because we have no place to store stuff for second boot. It's all first boots!

Bob

On Mon, May 15, 2017 at 8:29 AM Rostislav Svoboda <rsvoboda(a)redhat.com> wrote:

...

Brian Stansberry

Monday, 15 May Mon, 15 May

9:13 a.m.

Definitely worth investigating. I’d like to have a real good understanding of why it has the benefits it has, so we can see if this is the best way to get them or if something else is better.

This kicks in just before the ModelController starts and begins parsing the config. The config parsing quickly gets into parallel work; as soon as the extension elements are reached the extension modules are loaded concurrently. Then once the parsing is done each subsystem is installed concurrently, so lots of threads doing concurrent classloading.

So why does adding two more make such a big difference?

Is it that they get lots of work done in that time when the regular boot thread is not doing concurrent work, i.e. the parsing and the non-parallel bits of operation execution?

Is it that these threads are just chugging along doing classloading efficiently while the parallel threads are running along inefficiently getting scheduled and unscheduled? The latter doesn’t make sense to me as there’s no reason why these threads would be any more efficient than the others.

- Brian

...


-- Brian Stansberry Manager, Senior Principal Software Engineer JBoss by Red Hat

Tomaž Cerar

10:04 a.m.

On Mon, May 15, 2017 at 4:13 PM, Brian Stansberry < brian.stansberry(a)redhat.com> wrote:

...

So why does adding two more make such a big difference?

Main reason is that these two threads load most of the classes required later, which can then be loaded quickly from multiple parallel threads.

Currently concurrency causes 8-16 threads (on systems with 4-8 logical cores) to try to load the same classes at the same time. This leads to a lot of contention. "Preloading" some of these classes reduces contention.

Looking at the list in the current "hack impl", there are lots of classes that don't need to be there, such as subsystem parsers, which are only loaded once in any case.

Main pressure is on classes from the jboss-modules, controller, server and XML parser modules; all others are not as problematic. This is also the reason why a lot of contention happens on JDK classes as well, as those are shared between all parts of the server code.
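For readers who want something concrete, here is a minimal sketch of the preloading idea discussed above. It is not the code from the boot-performance-hack branch: the class name ClassPreloader and its method signature are hypothetical, and it uses a plain ClassLoader rather than JBoss Modules. It only shows the shape of the technique: a small pool of background threads walks a list of class names and ignores entries that cannot be found.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of WildFly or JBoss Modules. */
public final class ClassPreloader {

    private ClassPreloader() {
    }

    /**
     * Loads every named class on a small pool of background threads and
     * returns how many were found. Stale entries are skipped silently.
     */
    public static int preload(List<String> classNames, ClassLoader loader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger loaded = new AtomicInteger();
        for (String name : classNames) {
            pool.execute(() -> {
                try {
                    // initialize=false: load the class without running its static initialisers
                    Class.forName(name, false, loader);
                    loaded.incrementAndGet();
                } catch (ClassNotFoundException | LinkageError ignored) {
                    // a missing or unloadable entry only costs a little wasted work
                }
            });
        }
        pool.shutdown();
        try {
            // The sketch waits so it can report a count; a boot-time service
            // would let the pool run in the background instead.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loaded.get();
    }
}
```

Whether two threads is the right pool size is exactly the open question in this thread; the parameter is left to the caller for that reason.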

Brian Stansberry

11:20 a.m.

Thanks. That’s interesting.

...


-- Brian Stansberry Manager, Senior Principal Software Engineer JBoss by Red Hat

Brian Stansberry

Wednesday, 17 May Wed, 17 May

1:42 p.m.

...

Stuart/Tomaz,

Please ignore this for now if your thinking has moved on to other approaches, e.g. better concurrency in classloading. :)

Otherwise, are there any numbers on this last point Tomaz made?

I ask because people are asking for a static list since a dynamic list is of no benefit to cloud use cases.

A static list is painful to administer though, and if not administered well can result in loading unneeded classes and wasting memory.

But, a static list limited to modules that are part of the WildFly Core kernel is not particularly hard to administer. So if we can get the bulk of the gains with the minimum of the pain, we might consider that.

-- Brian Stansberry Manager, Senior Principal Software Engineer JBoss by Red Hat

Jason Greene

3:29 p.m.

...


We can also just have dynamic offline list generation, which is run as a build task.

-- Jason T. Greene WildFly Lead / JBoss EAP Platform Architect JBoss, a division of Red Hat

Brian Stansberry

3:45 p.m.

...


Yes, that’s my assumption. When I say “static” I mean static on a given installation. If it is limited to the kernel (including relevant JDK bits), then there are no issues with ensuring different feature pack maintainers are doing this, no need to combine lists from different parts of the build, no worries about ensuring only those bits relevant to what the user is actually running are loaded, etc. Those things are the “painful to administer part”. They might very well be worth it but data should demonstrate that. -- Brian Stansberry Manager, Senior Principal Software Engineer JBoss by Red Hat
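A kernel-only list could be derived from a full recorded list by filtering on module name. The sketch below assumes the recorded data is available as a class-name to module-name map; that representation, the class name KernelListFilter, and whatever module names a caller passes in are all assumptions for illustration, not something the hack or the build currently does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; the map layout is an assumption. */
public final class KernelListFilter {

    private KernelListFilter() {
    }

    /**
     * Keeps only the entries whose module is in the allowed set,
     * preserving the recorded load order.
     */
    public static Map<String, String> restrictTo(Map<String, String> classToModule,
                                                 Set<String> kernelModules) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : classToModule.entrySet()) {
            if (kernelModules.contains(entry.getValue())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

Run as a build task, a filter like this would keep the list independent of which feature packs or subsystems a given installation uses.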

Kabir Khan

Thursday, 18 May Thu, 18 May

3:28 a.m.

...


If it turns out to be worth it, the feature pack generation stuff could be enhanced to generate the list for each config. Perhaps only if doing -Prelease so we don't slow down everybody's builds while developing

...


Tomaž Cerar

4:33 a.m.

On Thu, May 18, 2017 at 10:28 AM, Kabir Khan <kabir.khan(a)jboss.com> wrote:

...

> If it is limited to the kernel (including relevant JDK bits), then there are no issues with ensuring different feature pack maintainers are doing this, no need to combine lists from different parts of the build, no worries about ensuring only those bits relevant to what the user is actually running are loaded, etc. Those things are the “painful to administer part”. They might very well be worth it but data should demonstrate that. If it turns out to be worth it, the feature pack generation stuff could be enhanced to generate the list for each config. Perhaps only if doing -Prelease so we don't slow down everybody's builds while developing

I think that only core stuff should need this, as that is where most of the contention for class loading is. As all extensions need classes from core, it is the most contested. So I think if we go with this, jdk + core would yield the best work / benefit ratio.

-- tomaz

Sanne Grinovero

5:19 a.m.

On 18 May 2017 at 10:33, Tomaž Cerar <tomaz.cerar(a)gmail.com> wrote:

...


Be it first boot or deployment, I agree that what matters most is the time to have the deployed application running & responding. So it would be nice if such a technique could be automated and made generic, to see if other components can benefit from such an approach at minimal maintenance overhead.

Incidentally I'm mostly bothered by Hibernate being slow to boot, but I don't think this particular optimisation would help. Our bootstrap isn't concurrent; there's just a lot to load - sequentially - and possibly scanning a combination of classpaths for discovery of entities & services; hopefully we can improve this by narrowing down the scope to be scanned but that's clearly an orthogonal issue.

Thanks,
Sanne

Stuart Douglas

Monday, 15 May Mon, 15 May

4:52 p.m.

On Tue, May 16, 2017 at 12:13 AM, Brian Stansberry < brian.stansberry(a)redhat.com> wrote:

...

Definitely worth investigating. I’d like to have a real good understanding of why it has the benefits it has, so we can see if this is the best way to get them or if something else is better.

I am pretty sure it is contention related. I modified my hack to load all classes from the same module at once (so once the first class from a module in that properties file is reached, it loads all others from the same module), and this gave another small but significant speedup (so boot time is now ~2.0-2.1s, down from ~2.9s).

Looking at the results of monitor profiling in Yourkit it looks like the reason is reduced contention. There is 50% less thread wait time on ModuleLoader$FutureModule, and the contention on JarFileResourceLoader is gone.

I think the reason is that we have a lot of threads active at boot and this results in a lot of contention in module/class loading.

Stuart
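The per-module ordering described above can be sketched as a regrouping step over the recorded list. This is an illustration only, not the modified hack: the class name ModuleBatcher and the {className, moduleName} pair layout are assumptions. Modules keep the order in which their first class was recorded, and each module's classes are then loaded together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; the entry layout is an assumption. */
public final class ModuleBatcher {

    private ModuleBatcher() {
    }

    /**
     * Groups recorded entries by module. Each entry is {className, moduleName}
     * in recorded load order; the result keeps modules in first-seen order.
     */
    public static Map<String, List<String>> groupByModule(List<String[]> entries) {
        Map<String, List<String>> byModule = new LinkedHashMap<>();
        for (String[] entry : entries) {
            byModule.computeIfAbsent(entry[1], module -> new ArrayList<>()).add(entry[0]);
        }
        return byModule;
    }
}
```

A preloader would then take one map entry at a time, so a module's resource loader is used by a single thread for a burst rather than being shared between many threads.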

...


Brian Stansberry

5:16 p.m.

...

From the time I did parallel boot I’ve always wondered if the level of concurrency was valid, but I never got around to doing any experimentation.

It’s quite naive; a task per extension module load and then one per subsystem. I’ve wanted to look into instead dividing the work into X larger tasks with X derived from the number of cores. But for your fix to be helping things so much it must be loading a lot of these classes during the single-threaded parts of the boot, so I don’t see how my changing it to have fewer tasks would compete with that. It may be beneficial regardless though, e.g. by not spinning up more threads than can be efficiently used.
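A rough sketch of the "X larger tasks" idea, for illustration only. The class name BootTaskPartitioner is hypothetical, the round-robin split is just one possible policy, and using the raw core count for X is an assumption that would need measuring; nothing here reflects how the ModelController actually schedules boot work.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not WildFly code. */
public final class BootTaskPartitioner {

    private BootTaskPartitioner() {
    }

    /** Splits the tasks round-robin into the given number of batches. */
    public static <T> List<List<T>> partition(List<T> tasks, int batches) {
        if (batches < 1) {
            throw new IllegalArgumentException("batches must be >= 1");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < batches; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            result.get(i % batches).add(tasks.get(i));
        }
        return result;
    }

    /** One batch per available core; an assumed starting point, not a tuned value. */
    public static int defaultBatchCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }
}
```

Each batch would then run on one thread, so the number of threads competing for class loading is bounded by the core count rather than by the number of extensions and subsystems.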

...


-- Brian Stansberry Manager, Senior Principal Software Engineer JBoss by Red Hat

David M. Lloyd

10:34 a.m.

...

-- - DML

Stuart Douglas

5:21 p.m.

On Tue, May 16, 2017 at 1:34 AM, David M. Lloyd <david.lloyd(a)redhat.com> wrote:

...

I set a breakpoint in loadClassLocal to print off the information.

...

Secondly, while debugging a resource iteration performance problem a user was having with a large number of deployments, I discovered that contention for the lock on JarFile and ZipFile was a primary cause. The workaround I employed was to keep a RAM-based List of the files in the JAR, which can be iterated over without touching the lock. When we're preloading classes, we're definitely going to see this same kind of contention come up, because there's only one lock per JarFile instance so you can only ever read one entry at a time, thus preventing any kind of useful concurrency on a per-module basis.

I think this is why I see an even bigger gain when pre-loading classes one module at a time.
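The RAM-based list workaround mentioned in the quoted text could look roughly like the sketch below. This is not the actual JBoss Modules change: the class name JarEntryIndex is hypothetical, and only the standard java.util.jar API is used. The entry names are read from the JarFile once, and later iteration works on the in-memory copy without going back to the JarFile.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper for illustration; not the JBoss Modules implementation. */
public final class JarEntryIndex {

    private final List<String> names;

    private JarEntryIndex(List<String> names) {
        this.names = names;
    }

    /** Reads all non-directory entry names from the jar once. */
    public static JarEntryIndex of(JarFile jar) {
        List<String> names = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            if (!entry.isDirectory()) {
                names.add(entry.getName());
            }
        }
        return new JarEntryIndex(Collections.unmodifiableList(names));
    }

    /** Immutable snapshot; iterating it does not touch the JarFile. */
    public List<String> names() {
        return names;
    }
}
```

This only helps resource iteration. Reading the bytes of an entry still goes through the JarFile, which is the case the per-module preloading order is trying to ease.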

...

Exploding the files out of the JarFile could expose this contention and therefore might be useful as a test - but it would also skew the results a little because you have no decompression overhead, and creating the separate file streams hypothetically might be somewhat more (or less) expensive. I joked about resurrecting jzipfile (which I killed off because it was something like 20% slower at decompressing entries than Jar/ZipFile) but it might be worth considering having our own JAR extractor at some point with a view towards concurrency gains. If we go this route, we could go even further and create an optimized module format, which is an idea I think we've looked at a little bit in the past; there are a few avenues of exploration here which could be interesting.

This could be worth investigating. Stuart

...

At some point we also need to see how jaotc might improve things. It probably won't improve class loading time directly, but it might improve the processes by which class loading is done because all the one-off bits would be precompiled. Also it's worth exploring whether the jimage format has contention issues like this.

David M. Lloyd

Thursday, 18 May

8:50 a.m.

On 05/15/2017 05:21 PM, Stuart Douglas wrote:

...


Tomaž did a prototype of using the JDK JAR filesystem to back the resource loader if it is available; contention did go down but memory footprint went up, and overall the additional indexing and allocation ended up slowing down boot a little, unfortunately (though large numbers of deployments seemed to be faster). Tomaž can elaborate on his findings if he wishes.

I had a look in the JAR FS implementation (and its parent class, the ZIP FS implementation, which does most of the hard work), and there are a few things which add overhead and contention that we don't need, like using read/write locks to manage access and modifications (which we don't need) and (synch-based) indexing structures that might be somewhat larger than necessary. They use NIO channels to access the zip data, which is probably OK, but maybe mapped buffers could be better... or worse? They use a synchronized list per JAR file to pool Inflaters; pooling is a hard thing to do right so maybe there isn't really any better option in this case.

But in any event, I think a custom extractor still might be a reasonable thing to experiment with. We could resurrect jzipfile or try a different approach (maybe see how well mapped buffers work?). Since we're read-only, any indexes we use can be immutable and thus unsynchronized, and maybe more compact as a result. We can use an unordered hash table because we generally don't care about file order the way that JarFile historically needs to, thus making indexing faster. We could save object allocation overhead by using a specialized object->int hash table that just records offsets into the index for each entry.

If we try mapped buffers, we could share one buffer concurrently by using only methods that accept an offset, and track offsets independently. This would let the OS page cache work for us, especially for heavily used JARs. We would be limited to 2GB JAR files, but I don't think that's likely to be a practical problem for us; if it ever is, we can create a specialized alternative implementation for huge JARs.

In Java 9, jimages become an option by way of jlink, which will also be worth experimenting with (as soon as we're booting on Java 9).

Brainstorm other ideas here!

-- - DML
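The shared mapped buffer idea in this message can be sketched roughly as follows. This is an illustration under assumptions and not code from JBoss Modules or jzipfile: `SharedMappedFile` and its methods are hypothetical names. It also assumes that absolute reads on a buffer whose position and limit are never changed are safe to issue from several threads, even though `ByteBuffer` is not documented as thread-safe in general.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical read-only view of a file, shared by many readers through one mapping. */
final class SharedMappedFile {
    private final ByteBuffer buffer; // position and limit are never changed after construction

    private SharedMappedFile(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    static SharedMappedFile open(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping cannot exceed Integer.MAX_VALUE bytes: the 2GB limit mentioned above.
            if (channel.size() > Integer.MAX_VALUE) {
                throw new IOException("File too large for a single mapping: " + file);
            }
            ByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            mapped.order(ByteOrder.LITTLE_ENDIAN); // ZIP structures are little-endian
            // The mapping stays valid after the channel is closed.
            return new SharedMappedFile(mapped);
        }
    }

    /** Absolute read: takes an explicit offset and leaves the buffer's position untouched. */
    int readInt(int offset) {
        return buffer.getInt(offset);
    }

    /** Copies a byte range using absolute reads only; each caller tracks its own offsets. */
    byte[] readBytes(int offset, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = buffer.get(offset + i);
        }
        return out;
    }
}
```

The byte-by-byte copy keeps the sketch portable to older JDKs. Newer JDKs offer absolute bulk reads, and a real extractor would also need inflation and an entry index, both omitted here.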

Andrig Miller

9:04 a.m.

On Thu, May 18, 2017 at 7:50 AM, David M. Lloyd <david.lloyd(a)redhat.com> wrote:

...


I'm not so sure that the OS page cache will do anything here. I actually think it would be better if we could open the JAR files using direct I/O, but of course Java doesn't support that, and that would require native code, so not the greatest option. Andy

...


-- Andrig (Andy) T. Miller Global Platform Director, Middleware Red Hat, Inc.

David M. Lloyd

2:18 p.m.

On 05/18/2017 09:04 AM, Andrig Miller wrote:

...


What the page cache would theoretically do for us is keep "hot" areas (i.e. the index) of commonly-used JAR files in RAM, while letting "cold" JARs be paged out, without consuming Java heap or committed memory (thus avoiding GC), while allowing total random access, without any special buffer management. Because we are only reading and not writing, direct I/O won't likely help: either way you block to read from disk, but with memory mapping, you can reread an area many times and the OS will keep it handy for you.

On Linux, the page cache works very similarly whether you're mapping in a file or allocating memory from the OS: recently-used pages stay in physical RAM, and old pages get flushed to disk (BUT only if they're dirty) and dropped from physical RAM. So it's effectively similar to allocating several hundred MB, copying all the JAR contents into that memory, and then referencing that, except that in this case you'd have to ensure that there is enough RAM+swap to accommodate it; behaviorally the primary difference is that the mmaped file is "paged out" by default and loaded on demand, whereas the eager allocated memory is "paged in" by default as you populate it and the pages have to age out. Since we are generally not reading entire JAR files though, the lazy behavior should theoretically be a bit better for us.

On the other hand, this is a far worse option for 32-bit platforms for the same reason that it's useful on 64-bit: address space. If we map in all the JARs that *we* ship, that could be as much as 25% or more of the available address space gone instantly. So if we did explore this route, we'd need it to be switchable, with sensible defaults based on the available logical address size (and, as I said before, size of the target object).

The primary resource cost here (other than address space) is page table entries. We'd be talking about probably hundreds of thousands, once every module has been referenced, on a CPU with 4k pages. In terms of RAM, that's not too much; each one is only a few bytes plus (I believe) a few more bytes for bookkeeping in the kernel, and the kernel is pretty damned good at managing them at this point. But it's not nothing.

Of course all this is just educated (?) speculation unless we test & measure it. I suspect that in the end, it'll be subtle tradeoffs, just like everything else ends up being.
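To make the page table arithmetic above concrete, here is a small sketch of the rounding involved. The class `PageEstimate` is hypothetical, and the thread gives no actual total size for the shipped JARs, so any byte count fed into it is an assumption. The result is an upper bound on mapped pages, since entries are only needed for pages that are actually touched.

```java
/** Hypothetical helper for estimating how many 4k pages a set of mapped files spans. */
final class PageEstimate {
    static final long PAGE_SIZE = 4096; // 4k pages, as in the message above

    /** Pages spanned by one mapped file: its size rounded up to a whole page. */
    static long pagesFor(long fileSizeBytes) {
        return (fileSizeBytes + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    /** Upper bound on pages if every byte of every mapped file were touched. */
    static long totalPages(long[] fileSizes) {
        long total = 0;
        for (long size : fileSizes) {
            total += pagesFor(size);
        }
        return total;
    }
}
```

For example, an assumed 500 MiB of mapped JAR data spans 500 * 1024 * 1024 / 4096 = 128,000 pages, and each small JAR rounds up to at least one page of its own.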

...


-- - DML

David M. Lloyd

Tuesday, 16 May

9:52 a.m.

Off list discussion: the NIO.2 JAR file provider appears to have substantially better performance (and no central locking). This JIRA [1] covers using this by default within JBoss Modules.

[1] https://issues.jboss.org/browse/MODULES-285

On 05/15/2017 10:34 AM, David M. Lloyd wrote:

...

I have a few thoughts that might be of interest.

Firstly, I'd be interested to see when you are logging the class name being loaded. If you are logging it in loadClass, you might not be seeing the actual correct load order because that method is ultimately recursive. To get an accurate picture of the order in which classes are actually defined - and thus the order in which you can load them to prevent contention on per-class locks within the CL - you should log immediately _after_ defineClass completes for each class.

Secondly, while debugging a resource iteration performance problem a user was having with a large number of deployments, I discovered that contention for the lock on JarFile and ZipFile was a primary cause. The workaround I employed was to keep a RAM-based List of the files in the JAR, which can be iterated over without touching the lock. When we're preloading classes, we're definitely going to see this same kind of contention come up, because there's only one lock per JarFile instance so you can only ever read one entry at a time, thus preventing any kind of useful concurrency on a per-module basis.

Exploding the files out of the JarFile could expose this contention and therefore might be useful as a test - but it would also skew the results a little because you have no decompression overhead, and creating the separate file streams hypothetically might be somewhat more (or less) expensive. I joked about resurrecting jzipfile (which I killed off because it was something like 20% slower at decompressing entries than Jar/ZipFile) but it might be worth considering having our own JAR extractor at some point with a view towards concurrency gains. If we go this route, we could go even further and create an optimized module format, which is an idea I think we've looked at a little bit in the past; there are a few avenues of exploration here which could be interesting.

At some point we also need to see how jaotc might improve things. It probably won't improve class loading time directly, but it might improve the processes by which class loading is done because all the one-off bits would be precompiled. Also it's worth exploring whether the jimage format has contention issues like this.
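The point about logging after defineClass can be illustrated with a small class loader. This is a sketch only: `RecordingClassLoader` and its in-memory map of class bytes are hypothetical stand-ins, not the ModuleClassLoader code. It relies on the JVM resolving a class's superclass and interfaces while defineClass runs, so recording after it returns puts those types ahead of the class that needs them.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical loader that records class names in the order their definitions complete. */
final class RecordingClassLoader extends ClassLoader {
    private final List<String> definitionOrder = new CopyOnWriteArrayList<>();
    private final Map<String, byte[]> classBytes; // assumed source of class file bytes, keyed by class name

    RecordingClassLoader(ClassLoader parent, Map<String, byte[]> classBytes) {
        super(parent);
        this.classBytes = classBytes;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = classBytes.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        Class<?> defined = defineClass(name, bytes, 0, bytes.length);
        // Record only once defineClass has returned. Recording on entry to loadClass
        // would list a subclass before the superclass it recursively triggers.
        definitionOrder.add(name);
        return defined;
    }

    /** Live view of the names recorded so far, oldest first. */
    List<String> definitionOrder() {
        return definitionOrder;
    }
}
```

Replaying the recorded list from first to last should then request classes in an order where their supertypes are already defined.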

-- - DML
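For readers unfamiliar with the NIO.2 JAR file provider mentioned at the top of this message, the sketch below shows how a JAR can be opened as a file system and an entry read through `java.nio.file.Files`. `JarFsReader` is a hypothetical name, and this is not how JBoss Modules wires it in; see the linked JIRA for that work.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads one entry from a JAR through the NIO.2 zip file system provider. */
final class JarFsReader {
    static byte[] readEntry(Path jar, String entryName) throws IOException {
        // The (ClassLoader) cast selects the newFileSystem(Path, ClassLoader) overload.
        try (FileSystem fs = FileSystems.newFileSystem(jar, (ClassLoader) null)) {
            return Files.readAllBytes(fs.getPath(entryName));
        }
    }
}
```

Opening and closing the file system for each read is only for brevity. A resource loader would presumably keep one file system open per JAR and reuse it.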

J Pai

6:32 a.m.

Not to undermine these efforts (in fact, this thread has actually brought up a couple of really interesting details), but one of the things I have always seen when we spent time trying to add relatively complex ways to squeeze some milliseconds out of the boot time, is that for the end users, most of the time it really didn’t matter in a noticeable way. I am not talking about the major improvements we have made from AS5/AS6 to the WildFly boot times today.

What I have experienced is that for end users, they are mostly interested in seeing their (usually large) deployments show noticeable improvements in deployment time, not necessarily from a cold boot of the server, but when the server is already up and they either want to deploy something new or re-deploy their application.

All in all, as a developer, I will be curiously following how these experiments go, but as an end user, I am not sure this will show up as something noticeable. Of course, the place where this would probably make a difference (even from an end user perspective) is something like maybe WildFly Swarm, but then again I haven’t been following that project to understand if these efforts will directly end up somehow in WildFly Swarm.

-Jaikiran

Heiko Braun

7:41 a.m.

...

On 16. May 2017, at 13:32, J Pai <jai.forums2013(a)gmail.com> wrote: What I have experienced is that for end users, they are mostly interested in seeing their (usually large) deployments show noticeable improvements in deployment time, not necessarily from a cold boot of the server, but when the server is already up and they either want to deploy something new or re-deploy their application.

+1 the deployments increase the time until “ready to perform work”. This is the point we should use as a reference. Anything before (i.e. blank WF without deployments) is just marketing IMO. Heiko

Jason Greene

8:47 a.m.

...


I agree that deployment time is important, but I just want to point out that not all usages of WildFly involve deployments. Examples include proxy servers, static content servers, message brokers, javascript code, transaction managers, and service based applications. -- Jason T. Greene WildFly Lead / JBoss EAP Platform Architect JBoss, a division of Red Hat

Andrig Miller

10:01 a.m.

One thing I would like to mention is that with our OpenShift first strategy, anything we do should also take into account memory footprint changes. We are still doing analysis on the memory footprint of EAP, but will have something to publish fairly soon. One thing we should avoid here is approaches that allocate memory that won't go away when the boot process is done. Andy On Tue, May 16, 2017 at 7:47 AM, Jason Greene <jason.greene(a)redhat.com> wrote:

...


-- Andrig (Andy) T. Miller Global Platform Director, Middleware Red Hat, Inc.

David M. Lloyd

9 a.m.

On 05/16/2017 07:41 AM, Heiko Braun wrote:

...


Startup time to "ready" does include the server init; so such an effort isn't a total waste of time in this case. But I agree with your main point. But I think if we can squeeze a bit more speed out of initialization, there's no harm in trying for it. The performance data that comes from this analysis has already been used to target areas that will improve performance for every part of startup, including deployment (maybe substantially). -- - DML

Heiko Braun

10:02 a.m.

...


I was exaggerating when I used the term “marketing”. You are right and Jason has some valid points too. Heiko

Tomaž Cerar

8:14 a.m.

Hey Jaikiran! On Tue, May 16, 2017 at 1:32 PM, J Pai <jai.forums2013(a)gmail.com> wrote:

...

What I have experienced is that for end users, they are mostly interested in seeing their (usually large) deployments show noticeable improvements in deployment time, not necessarily from a cold boot of the server, but when the server is already up and they either want to deploy something new or re-deploy their application.

We all agree on this and we are looking into speeding up user deployments as well. One of the bottlenecks with deployments is how we read deployment contents, as it has been shown that java.util.jar.JarFile, which we use to load resources, doesn't scale well in concurrent environments and causes a lot of slowdown. We are now looking at what we could do to mitigate this with different approaches, but as none of them is in a fully workable state I won't go into details yet. In short, speeding up user deployments is on our radar. - tomaz

Scott Marlow

8:59 a.m.

Excellent idea!

...

1) A hard coded list of class names that we generate before a release (basically what the hack already does), this is simplest, but does add a little bit of additional work to the release process (although if it is missed it would be no big deal, as ClassNotFoundException's would be suppressed, and if a few classes are missing the performance impact is negligible as long as the majority of the list is correct).

Could the list of class names be read from the server configuration? I assume that would likely defeat the purpose of pre-loading these classes, as we would then have to wait until the server configuration is read. Perhaps an alternative could be allowing a system property setting to override the class list. I am thinking that users might want some influence over which classes are pre-loaded so they can prune the list and also add to it.
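For readers following along, the pre-loading idea under discussion can be sketched roughly as below. This is only an illustration, not Stuart's actual patch: `ClassPreloader` and `preload` are hypothetical names, a single `ClassLoader` stands in for the per-module lookup that JBoss Modules would do, and loading without initialization is an assumption. The method waits for the pool to finish only so that it can return a count; a real boot service would let the threads run in the background.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of WildFly or JBoss Modules. */
public final class ClassPreloader {

    private ClassPreloader() {}

    /**
     * Eagerly loads each named class on a small thread pool and returns
     * how many classes were loaded. Names that cannot be resolved are
     * skipped silently, so a stale or partly wrong list is harmless.
     */
    public static int preload(List<String> classNames, ClassLoader loader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger loaded = new AtomicInteger();
        for (String name : classNames) {
            pool.execute(() -> {
                try {
                    // initialize=false: load the class without running its
                    // static initializers ahead of time (an assumption here)
                    Class.forName(name, false, loader);
                    loaded.incrementAndGet();
                } catch (ClassNotFoundException | LinkageError ignored) {
                    // Suppressed on purpose: a missing entry just is not pre-loaded
                }
            });
        }
        pool.shutdown();
        try {
            // Wait only so the sketch can report a count
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loaded.get();
    }
}
```

Because unknown names are swallowed, a user who prunes or extends the list by hand cannot break boot; the worst case is that some classes are loaded lazily as before.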

...

2) Generate the list dynamically on first boot, and store it in the temp directory. This would require the addition of a hook into JBoss Modules to generate the list, but is the approach I would prefer (as first boot is always a bit slower anyway).

I like this best, but I also wonder how users would update the list if they know it should contain a different set of class names. Perhaps they could delete the list from the temp directory at the right time (e.g. after stopping the app server but before starting it again). If we do add a system property that lets the user specify the list of classes (or perhaps the name of a file that contains the list), I think that system property should override the list that we generate in the temp directory.
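The precedence suggested here (a user-supplied list wins over the generated one) could look something like the following sketch. Everything named in it is made up for illustration: the property name `example.preload.class.list`, the file name `preload-classes.list`, and the `PreloadListLocator` class are assumptions, not existing WildFly settings.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical lookup of the class list; names are illustrative only. */
public final class PreloadListLocator {

    /** Hypothetical system property naming a user-supplied list file. */
    public static final String OVERRIDE_PROPERTY = "example.preload.class.list";

    /** Hypothetical name of the list written to the temp directory on first boot. */
    public static final String GENERATED_FILE = "preload-classes.list";

    private PreloadListLocator() {}

    /**
     * Returns the user override if one is given, otherwise the generated
     * file if it exists, otherwise empty (e.g. on first boot).
     */
    public static Optional<Path> locate(String overrideValue, Path tempDir) {
        if (overrideValue != null && !overrideValue.isEmpty()) {
            // The override always wins; the caller decides what to do if
            // the file it names turns out to be missing or unreadable
            return Optional.of(Paths.get(overrideValue));
        }
        Path generated = tempDir.resolve(GENERATED_FILE);
        return Files.isRegularFile(generated) ? Optional.of(generated) : Optional.empty();
    }

    /** Convenience overload that reads the override from the system property. */
    public static Optional<Path> locate(Path tempDir) {
        return locate(System.getProperty(OVERRIDE_PROPERTY), tempDir);
    }
}
```

With this ordering, deleting the generated file forces regeneration on the next boot, while setting the property lets a user pin a hand-edited list without touching the temp directory.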

...

Thoughts? Stuart


participants (14)

Andrig Miller
Bob McWhirter
Brian Stansberry
David M. Lloyd
Heiko Braun
Heiko W.Rupp
J Pai
Jason Greene
Kabir Khan
Rostislav Svoboda
Sanne Grinovero
Scott Marlow
Stuart Douglas
Tomaž Cerar
