OK, so the biggest problem is that TestNG keeps test instances around until the end of the test suite, and many of our tests are quite heavyweight because they keep references to caches/managers even after they finish. I've opened a PR to set those fields to null, fix some smaller leaks, and use -XX:+UseG1GC -XX:-TieredCompilation, and I'm now getting ~11 minutes on my laptop.

https://github.com/infinispan/infinispan/pull/5768
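
The gist of the field cleanup is something like this (a simplified sketch, not the actual PR code; the class and field names are invented):

    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;
    import org.infinispan.manager.EmbeddedCacheManager;
    import org.testng.annotations.AfterClass;
    import org.testng.annotations.BeforeClass;

    public class SomeHeavyweightTest {

       private EmbeddedCacheManager cacheManager;
       private Cache<String, String> cache;

       @BeforeClass
       public void setUp() {
          cacheManager = new DefaultCacheManager(new ConfigurationBuilder().build());
          cache = cacheManager.getCache();
       }

       @AfterClass(alwaysRun = true)
       public void tearDown() {
          if (cacheManager != null) {
             cacheManager.stop();
          }
          // TestNG keeps the test instance alive until the end of the suite,
          // so null out the fields to let the GC reclaim the cache/manager.
          cacheManager = null;
          cache = null;
       }
    }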

It's still a lot, especially since not long ago it took half that time, but making it shorter would probably mean looking deeper into the (many) tests we've added in the last year or so.

Cheers
Dan


On Fri, Feb 16, 2018 at 8:05 AM, Dan Berindei <dan.berindei@gmail.com> wrote:
Yeah, I got a much slower run with the default collector (parallel):

[INFO] Total time: 17:45 min
GC Time: 2m 43s
Compile Time: 18m 20s

I'm not sure whether it's really the GC affecting the compile time or there's another factor hiding there. But I did get a heap dump and I'm analyzing it now.

Cheers
Dan


On Thu, Feb 15, 2018 at 1:59 PM, Dan Berindei <dan.berindei@gmail.com> wrote:
Hmmm, I hadn't noticed that I was running with -XX:+UseG1GC, so perhaps our test suite is a pathological case for the default collector?

[INFO] Total time: 12:45 min
GC Time: 52.593s
Class Loader Time: 1m 26.007s
Compile Time: 10m 10.216s

I'll try without -XX:+UseG1GC later.

Cheers
Dan


On Thu, Feb 15, 2018 at 1:39 PM, Dan Berindei <dan.berindei@gmail.com> wrote:
And here I was thinking that by adding -XX:+HeapDumpOnOutOfMemoryError anyone would be able to look into OOMEs and I wouldn't have to reproduce the failures myself :)

Dan


On Thu, Feb 15, 2018 at 1:32 PM, William Burns <mudokonman@gmail.com> wrote:
So I must admit I noticed a while back that I was having some issues running the core test suite. Unfortunately, at the time CI and everyone else seemed to have no issues, so I just ignored it because I didn't need to run the core tests then. But now that Sanne has pointed this out, after increasing the heap setting in the pom.xml I was for the first time able to run the test suite to completion. It would normally hang for an extremely long time around the 9K-10K tests-completed mark and never finish for me (at least I didn't wait long enough).

So it definitely seems there is something leaking in the test suite, causing the GC to use a ton of CPU time.

 - Will

On Thu, Feb 15, 2018 at 5:40 AM Sanne Grinovero <sanne@infinispan.org> wrote:
Thanks Dan.

Do you happen to have observed the memory trend during a build?

After a couple more attempts it passed the build once, so it is
possible to pass... but it's a small sample so far, and that's still
1 pass vs 3 OOMs on my machine.

Even the one time it successfully completed the tests, I saw it waste
~80% of the total build time on GC runs... it was likely very close to
falling over, and definitely not an efficient setting for regular
builds. From the trends on my machine I'd guess a reasonable value is
around 5 GB to keep builds fast, with a minimum of about 1.3 GB to be
able to complete successfully without failing too often.

The memory issues are worse towards the end of the test suite, and
memory usage grows steadily.

I won't be able to investigate further as I urgently need to work on
modules, but I noticed there are quite a few MBeans according to
JConsole. I guess it would be good to check whether we're leaking
MBean registrations, and therefore leaking (stopped?) CacheManagers
through them?
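
Something like this, run inside the test JVM, would show whether the
count keeps growing (just a quick sketch, assuming the default
"org.infinispan" JMX domain; the class is made up, not something from
the test suite):

    import java.lang.management.ManagementFactory;
    import java.util.Set;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Rough sketch: count the Infinispan MBeans still registered in this JVM.
    // If the count keeps climbing as the suite progresses, we are probably
    // leaking registrations, and the (stopped?) CacheManagers behind them.
    public final class MBeanLeakCheck {
       public static void dumpRegisteredMBeans() throws Exception {
          MBeanServer server = ManagementFactory.getPlatformMBeanServer();
          Set<ObjectName> names =
             server.queryNames(new ObjectName("org.infinispan:*"), null);
          System.out.println("Infinispan MBeans still registered: " + names.size());
          names.forEach(System.out::println);
       }
    }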

Even near the beginning of the tests, forcing a full GC still leaves
about 400 MB "not free". That's quite a lot for some simple tests, no?

Thanks,
Sanne


On 15 February 2018 at 06:51, Dan Berindei <dan.berindei@gmail.com> wrote:
> forkJvmArgs used to be "-Xmx2G" before ISPN-8478. I reduced the heap to 1G
> because we were trying to run the build on agent VMs with only 4GB of RAM,
> and the 2GB heap was making the build run out of native memory.
>
> I've yet to see an OOME in the core tests, locally or in CI. But I also
> included -XX:+HeapDumpOnOutOfMemoryError in forkJvmArgs, so assuming there's
> a new leak it should be easy to track down in the heap dump.
>
> Cheers
> Dan
>
>
> On Wed, Feb 14, 2018 at 11:46 PM, Sanne Grinovero <sanne@infinispan.org>
> wrote:
>>
>> Hey all,
>>
>> I'm having OOMs running the tests of infinispan-core.
>>
>> Initially I thought it was related to limits and security, as those are
>> the usual suspects, but no, it's really just not enough memory :)
>>
>> I found that the root pom.xml sets a <forkJvmArgs> property to -Xmx1G for
>> surefire; I've been watching the heap usage grow in JConsole and it's
>> clearly not enough.
>>
>> What surprises me is that, as an occasional tester, I shouldn't be the
>> one to notice such a new requirement first. A leak that only manifests
>> under certain conditions?
>>
>> What do others observe?
>>
>> FWIW, I'm running it with an 8G heap now and it's working much better;
>> still a couple of failures, but at least they're not OOM related.
>>
>> Thanks,
>> Sanne

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev