Thanks Dan,
that solved the main issue: I no longer have OOMs on the core module.
I'll merge your PR as soon as I've completed the full build.
Disabling TieredCompilation is an interesting idea; I'll try that on other
projects too.
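
For other projects, the relevant bit of Dan's change boils down to the
surefire fork arguments; roughly the following (the exact wiring differs
per build, and the ~1300M heap is the bump suggested below rather than
what the PR sets):

    -Xmx1300m -XX:+UseG1GC -XX:-TieredCompilation -XX:+HeapDumpOnOutOfMemoryError

Disabling tiered compilation leaves only the C2 compiler, so fewer methods
get JIT-compiled in a short-lived forked JVM; that can cut the large
"Compile Time" figures reported further down.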
If someone is up for giving this some additional love, a few follow-ups:
- raising the heap from 1G to ~1300M gives it quite a bit more breathing
room; I believe it should still fit on a 2GB testing machine.
- I still see quite a few MBeans in JConsole at the end of the build;
something is leaking these, and they keep references to CacheManagers.
- I'm still seeing an unreasonable number of threads as well, varying
from ~200 to ~2000. Possibly related to the previous point? A sketch
for measuring both follows below.
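
A quick way to put numbers on both leaks is to count registered MBeans and
live threads via standard JMX; a minimal sketch, meant to run inside the
forked test JVM (the org.infinispan domain pattern is an assumption;
adjust it to whatever JConsole shows):

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class LeakProbe {
        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Total registrations; compare at the start and end of the suite.
            System.out.println("Total MBeans: " + server.getMBeanCount());
            // Narrow to the suspected domain.
            ObjectName pattern = new ObjectName("org.infinispan:*");
            System.out.println("Infinispan MBeans: " + server.queryNames(pattern, null).size());
            // Live threads, to correlate with the ~200-2000 range above.
            System.out.println("Live threads: " + ManagementFactory.getThreadMXBean().getThreadCount());
        }
    }
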
Cheers,
Sanne
On 19 February 2018 at 11:57, Dan Berindei <dan.berindei(a)gmail.com> wrote:
Ok, so the biggest problem is that TestNG keeps test instances around until
the end of the test suite, and many of our tests are quite heavyweight
because they keep references to caches/managers even after they finish. I've
opened a PR to set those fields to null, fix some smaller leaks, and use
-XX:+UseG1GC -XX:-TieredCompilation, and I'm getting ~11 min on my laptop.
https://github.com/infinispan/infinispan/pull/5768

It's still a lot, especially given that not long ago it took half that, but
making it shorter would probably involve looking deeper into the (many)
tests we've added in the last year or so.
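
The fix follows the standard TestNG pattern; a simplified sketch rather
than the literal diff, with illustrative class and field names:

    import org.infinispan.Cache;
    import org.infinispan.manager.EmbeddedCacheManager;
    import org.testng.annotations.AfterClass;

    public class SomeClusteredTest {
        // TestNG retains this instance until the whole suite finishes,
        // so any field left non-null stays reachable for the entire run.
        private EmbeddedCacheManager cacheManager;
        private Cache<String, String> cache;

        @AfterClass(alwaysRun = true)
        protected void cleanup() {
            if (cacheManager != null) {
                cacheManager.stop();
            }
            // Null the fields so the retained test instance no longer pins
            // the manager, its caches, and their thread pools.
            cacheManager = null;
            cache = null;
        }
    }
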
Cheers
Dan
On Fri, Feb 16, 2018 at 8:05 AM, Dan Berindei <dan.berindei(a)gmail.com>
wrote:
>
> Yeah, I got a much slower run with the default collector (parallel):
>
> [INFO] Total time: 17:45 min
> GC Time: 2m 43s
> Compile Time: 18m 20s
>
> I'm not sure whether it's really the GC affecting the compile time or
> whether there's another factor hiding in there. But I did get a heap dump,
> and I'm analyzing it now.
>
> Cheers
> Dan
>
>
> On Thu, Feb 15, 2018 at 1:59 PM, Dan Berindei <dan.berindei(a)gmail.com>
> wrote:
>>
>> Hmmm, I hadn't noticed that I was running with -XX:+UseG1GC, so perhaps
>> our test suite is a pathological case for the default collector?
>>
>> [INFO] Total time: 12:45 min
>> GC Time: 52.593s
>> Class Loader Time: 1m 26.007s
>> Compile Time: 10m 10.216s
>>
>> I'll try without -XX:+UseG1GC later.
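>>
>> For a one-off comparison, the collector can be switched without editing
>> the pom, assuming forkJvmArgs is an ordinary overridable Maven property
>> (module path assumed):
>>
>>     mvn verify -pl core -DforkJvmArgs="-Xmx1G"                # default collector
>>     mvn verify -pl core -DforkJvmArgs="-Xmx1G -XX:+UseG1GC"   # G1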
>>
>> Cheers
>> Dan
>>
>>
>> On Thu, Feb 15, 2018 at 1:39 PM, Dan Berindei <dan.berindei(a)gmail.com>
>> wrote:
>>>
>>> And here I was thinking that by adding -XX:+HeapDumpOnOutOfMemoryError
>>> anyone would be able to look into OOMEs and I wouldn't have to reproduce
>>> the failures myself :)
>>>
>>> Dan
>>>
>>>
>>> On Thu, Feb 15, 2018 at 1:32 PM, William Burns <mudokonman(a)gmail.com>
>>> wrote:
>>>>
>>>> So I must admit I had noticed a while back that I was having some
>>>> issues with running the core test suite. Unfortunately, at the time CI
>>>> and everyone else seemed to have no issues, and I just ignored it
>>>> because I didn't need to run the core tests then. But now that Sanne has
>>>> pointed this out: by increasing the heap variable in the pom.xml, I was
>>>> for the first time able to run the test suite to completion. It would
>>>> normally hang for an extremely long time near the 9K-10K tests completed
>>>> point and never finish for me (at least I didn't wait long enough).
>>>>
>>>> So it definitely seems something is leaking in the test suite, causing
>>>> the GC to use a ton of CPU time.
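>>>>
>>>> To put numbers on the GC overhead rather than eyeballing JConsole, GC
>>>> logging can be added to the fork args (JDK 8 flags assumed; on JDK 9+
>>>> the equivalent is -Xlog:gc*):
>>>>
>>>>     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log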
>>>>
>>>> - Will
>>>>
>>>> On Thu, Feb 15, 2018 at 5:40 AM Sanne Grinovero <sanne(a)infinispan.org>
>>>> wrote:
>>>>>
>>>>> Thanks Dan.
>>>>>
>>>>> Do you happen to have observed the memory trend during a build?
>>>>>
>>>>> After a couple more attempts it passed the build once, so that shows
>>>>> it's possible to pass... but it's a small sample so far: that's 1 pass
>>>>> vs 3 OOMs on my machine.
>>>>>
>>>>> Even the one time it successfully completed the tests, I saw it waste
>>>>> ~80% of the total build time on GC runs... it was likely very close to
>>>>> falling over, and definitely not an efficient setting for regular
>>>>> builds. Observing trends on my machine, I'd guess a reasonable value
>>>>> would be around 5GB to keep builds fast, or a minimum of 1.3GB to be
>>>>> able to complete successfully without failing often.
>>>>>
>>>>> The memory issues are worse towards the end of the test suite, and
>>>>> steadily growing.
>>>>>
>>>>> I won't be able to investigate further as I urgently need to work on
>>>>> the modules, but I noticed there are quite a few MBeans according to
>>>>> JConsole. I guess it would be good to check whether we're leaking the
>>>>> MBean registrations, and therefore leaking (stopped?) CacheManagers
>>>>> from there?
>>>>>
>>>>> Even near the beginning of the tests, when forcing a full GC I see
>>>>> about 400MB being "not free". That's quite a lot for some simple
>>>>> tests, no?
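>>>>>
>>>>> The same check works headless against the surefire fork (both tools
>>>>> ship with the JDK; <pid> is the forked JVM's process id):
>>>>>
>>>>>     jcmd <pid> GC.run                  # force a full GC, like the JConsole button
>>>>>     jmap -histo:live <pid> | head -20  # biggest classes among live objects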
>>>>>
>>>>> Thanks,
>>>>> Sanne
>>>>>
>>>>>
>>>>> On 15 February 2018 at 06:51, Dan Berindei <dan.berindei(a)gmail.com>
>>>>> wrote:
>>>>> > forkJvmArgs used to be "-Xmx2G" before ISPN-8478. I reduced the heap
>>>>> > to 1G because we were trying to run the build on agent VMs with only
>>>>> > 4GB of RAM, and the 2GB heap was making the build run out of native
>>>>> > memory.
>>>>> >
>>>>> > I've yet to see an OOME in the core tests, locally or in CI. But I
>>>>> > also included -XX:+HeapDumpOnOutOfMemoryError in forkJvmArgs, so
>>>>> > assuming there's a new leak it should be easy to track down in the
>>>>> > heap dump.
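>>>>> >
>>>>> > The dump lands as java_pid<pid>.hprof in the fork's working directory
>>>>> > unless -XX:HeapDumpPath=... is also set. In Eclipse MAT, an OQL query
>>>>> > along these lines should surface any retained managers (class name
>>>>> > assumed):
>>>>> >
>>>>> >     SELECT * FROM org.infinispan.manager.DefaultCacheManager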
>>>>> >
>>>>> > Cheers
>>>>> > Dan
>>>>> >
>>>>> >
>>>>> > On Wed, Feb 14, 2018 at 11:46 PM, Sanne Grinovero
>>>>> > <sanne(a)infinispan.org>
>>>>> > wrote:
>>>>> >>
>>>>> >> Hey all,
>>>>> >>
>>>>> >> I'm having OOMs running the tests of infinispan-core.
>>>>> >>
>>>>> >> Initially I thought it was related to limits and security, as
>>>>> >> that's the usual suspect, but no, it's really just not enough
>>>>> >> memory :)
>>>>> >>
>>>>> >> Found that the root pom.xml sets a <forkJvmArgs> property to Xmx1G
>>>>> >> for surefire; I've been observing the growth of heap usage in
>>>>> >> JConsole, and it's clearly not enough.
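>>>>> >>
>>>>> >> For reference, the property in question looks roughly like this in
>>>>> >> the root pom.xml (surrounding surefire configuration elided):
>>>>> >>
>>>>> >>     <forkJvmArgs>-Xmx1G -XX:+HeapDumpOnOutOfMemoryError</forkJvmArgs>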
>>>>> >>
>>>>> >> What surprises me is that - as an occasional tester - I shouldn't be
>>>>> >> the one to notice such a new requirement first. A leak which only
>>>>> >> manifests in certain conditions?
>>>>> >>
>>>>> >> What do others observe?
>>>>> >>
>>>>> >> FWIW, I'm running it with an 8G heap now and it's working much
>>>>> >> better; still a couple of failures, but at least they're not OOM
>>>>> >> related.
>>>>> >>
>>>>> >> Thanks,
>>>>> >> Sanne
_______________________________________________
infinispan-dev mailing list
infinispan-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev