<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Hi,</p>
<p>There are ongoing discussions and more data coming from Heiko,
but since not everyone might be online, I'll fill in here as
well.<br>
</p>
<p>Testing with Hawkular-Metrics alone (not Hawkular-Services), I
can't reproduce this. I started by testing just the compression
library, and after compressing some 9 billion datapoints it was
still running at a constant ~10MB of heap usage.</p>
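<p>For reference, the standalone test had roughly this shape. This is a sketch only: it uses <code>java.util.zip.Deflater</code> as a stand-in for the actual compression library, and the class and method names (<code>CompressHeapCheck</code>, <code>syntheticBatch</code>) are made up for illustration. The point is the method: compress batches in a loop and watch heap via <code>MemoryMXBean</code>; with a leak-free compressor the peak stays flat no matter how long it runs.</p>

```java
import java.lang.management.ManagementFactory;
import java.util.zip.Deflater;

public class CompressHeapCheck {

    // Compress `rounds` batches of synthetic datapoints and return the peak
    // heap usage (bytes) sampled after each batch. A leak would show up as a
    // peak that keeps growing with the number of rounds.
    static long peakHeapDuringCompression(int rounds) {
        Deflater deflater = new Deflater();
        byte[] out = new byte[1 << 16];
        long peak = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] batch = syntheticBatch(10_000, i);
            deflater.reset();
            deflater.setInput(batch);
            deflater.finish();
            while (!deflater.finished()) {
                deflater.deflate(out);
            }
            long used = ManagementFactory.getMemoryMXBean()
                    .getHeapMemoryUsage().getUsed();
            peak = Math.max(peak, used);
        }
        deflater.end();
        return peak;
    }

    // Fake (timestamp, value) pairs; 16 bytes per datapoint.
    static byte[] syntheticBatch(int points, int seed) {
        byte[] b = new byte[points * 16];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) ((i * 31 + seed) & 0xFF);
        }
        return b;
    }

    public static void main(String[] args) {
        long peakMb = peakHeapDuringCompression(200) / (1024 * 1024);
        System.out.println("peak heap during compression: " + peakMb + " MB");
    }
}
```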
<p>So I started HWKMETRICS in Wildfly 10.0.0.Final and ran our
Gatling load test with clients=20, metrics=1000, and enough loops
to keep it running indefinitely. The instance was modified to run
compression at a 20-minute interval.</p>
<p>Monitoring with jconsole, I can't see any difference in memory
usage between the first 5 minutes and the last 5 minutes (it has
been running for two hours now). Heap usage stays between 147MB
and 400MB.</p>
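<p>The same numbers jconsole shows in its Memory tab can also be polled programmatically, which makes before/after comparisons less eyeball-based. A minimal sketch (the class name <code>HeapSampler</code> is made up):</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

public class HeapSampler {

    // Sample heap usage `samples` times, `intervalMs` apart, like watching
    // jconsole's Memory tab, and return the readings in MB.
    static List<Long> sampleHeapUsedMb(int samples, long intervalMs) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        List<Long> used = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            used.add(mem.getHeapMemoryUsage().getUsed() / (1024 * 1024));
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        // e.g. one reading per second for a minute
        System.out.println("heap used (MB): " + sampleHeapUsedMb(60, 1000));
    }
}
```

<p>Comparing the first and last few readings of a long run gives the same "no growth between the first 5 minutes and the last 5 minutes" check, but scriptable.</p>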
<p>This requires more investigation, then.<br>
</p>
<p> - Micke<br>
</p>
<br>
<div class="moz-cite-prefix">On 10/11/2016 09:42 AM, Heiko W.Rupp
wrote:<br>
</div>
<blockquote
cite="mid:A76E22FB-A275-4AD9-A1EF-5C3900BE0DE1@redhat.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<div style="font-family:sans-serif">
<div style="white-space:normal">
<p dir="auto">Hey,</p>
<p dir="auto">tl;dr: we need to investigate heap usage -
especially in the case of compression<br>
kicking in - it looks like there could be a memory leak.
Compression timing seems<br>
mostly ok.</p>
<p dir="auto">I originally wanted to see how more feeds
influence the metrics compression timing.<br>
So I started the server with -Xmx512m as I did in all the
weeks before and pointed a<br>
few feeds at the server - to see it crash with OOME shortly
after the compression started.</p>
<p dir="auto">Now I restarted the server with -Xmx1024m and
also -XX:MaxMetaspaceSize=512m (up from 256m before) and am
running the server now with 1 feed for a day.</p>
<p dir="auto">To be continued below ...</p>
<p dir="auto">( Time is in GMT, which is 2h off of my local
time)<br>
hawkular_1 | 15:00:44,764 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 15:00:45,452 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 687 ms<br>
hawkular_1 | 17:00:44,757 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 17:00:46,796 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 2039 ms<br>
hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 19:00:47,293 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 2541 ms<br>
hawkular_1 | 21:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 21:00:46,267 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 1517 ms<br>
hawkular_1 | 23:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 23:00:45,472 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 722 ms<br>
hawkular_1 | 01:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 01:00:46,241 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 1492 ms<br>
hawkular_1 | 03:00:44,747 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 03:00:45,780 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 1033 ms<br>
hawkular_1 | 05:00:44,746 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 05:00:45,781 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 1034 ms<br>
hawkular_1 | 07:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 07:00:46,386 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 1636 ms<br>
hawkular_1 | 09:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 09:00:45,682 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 934 ms<br>
hawkular_1 | 11:00:44,750 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 11:00:46,339 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 1589 ms<br>
hawkular_1 | 13:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 13:00:45,880 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 1132 ms</p>
<p dir="auto">Looking at the memory usage I see the memory
usage there is often a peak in heap usage around and after
the compression - looking at the accumulated GC-time also
shows heavy GC activity. I guess the peak in heap usage (and
thus committed heap) comes from promoting objects from young
generation into old during the compression and then later
after compression is over, they are garbage collected, so
that heap usage goes down and the system is able to free
some memory.</p>
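<p>The accumulated GC time mentioned above is exposed through the platform's <code>GarbageCollectorMXBean</code>s, so the GC cost of a single compression run can be measured directly by sampling the counter before and after. A small sketch (the class name <code>GcTimeProbe</code> and the stand-in workload are illustrative, not the actual compression job):</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {

    // Total accumulated GC time (ms) across all collectors -- the same
    // counter jconsole's VM summary shows.
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcTimeMs();
        // Stand-in workload: churn short-lived garbage roughly the way a
        // compression pass would.
        for (int i = 0; i < 10_000; i++) {
            byte[] junk = new byte[64 * 1024];
            junk[0] = (byte) i;
        }
        long after = totalGcTimeMs();
        System.out.println("GC time during workload: " + (after - before) + " ms");
    }
}
```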
<p dir="auto">Starting at around 4:10pm (2pm in the above
output) I am running with 10 extra feeds (hawkfly on an
external box)</p>
<p dir="auto">While it looks like there is a slowly growing
non-heap, it was growing a lot when the 10 extra feeds
connected.<br>
Also it looks like it is growing a bit more each time
compression kicks in ("non-heap")</p>
<p dir="auto">The first compression run with 11 feeds did not
take too long, the next was</p>
<p dir="auto">hawkular_1 | 17:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 17:00:50,277 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 5528 ms</p>
<p dir="auto">Which used a lot more memory than before.
Non-heap was able to reclaim some memory though.</p>
<p dir="auto">hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 19:00:50,093 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 5341 ms<br>
hawkular_1 | 21:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 21:00:49,465 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 4714 ms<br>
hawkular_1 | 23:00:44,753 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 23:00:48,925 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 4171 ms<br>
hawkular_1 | 01:00:44,750 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 01:00:48,554 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 3803 ms<br>
hawkular_1 | 03:00:44,761 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 03:00:48,659 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 3898 ms<br>
hawkular_1 | 05:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 05:00:49,134 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 4385 ms<br>
hawkular_1 | 07:00:44,755 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 07:00:49,831 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 5076 ms<br>
hawkular_1 | 09:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 09:00:49,508 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 4757 ms</p>
<p dir="auto">Now at 11:20 (9:20 for above logs) I started 5
more feeds (with a 2min sleep between starts)</p>
<p dir="auto">hawkular_1 | 11:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 11:00:49,751 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 5002 ms<br>
hawkular_1 | 13:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 13:00:56,594 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 11845 ms<br>
hawkular_1 | 15:00:44,754 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 15:00:53,985 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 9231 ms</p>
<p dir="auto">And another 5 starting at 17:12 (15:00 in above
logs timezone)</p>
<p dir="auto">hawkular_1 | 17:00:44,768 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 17:00:57,824 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 13056 ms</p>
<p dir="auto">And another 5 starting at 19:57 (17:57 in above
log timezone )</p>
<p dir="auto">hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
hawkular_1 | 19:01:24,401 WARN
org.apache.activemq.artemis.ra AMQ152007: Thread
Thread[TearDown/ActiveMQActivation,5,] could not be finished<br>
hawkular_1 | 19:01:40,918 WARN
org.apache.activemq.artemis.ra AMQ152007: Thread
Thread[TearDown/ActiveMQActivation,5,default-threads] could
not be finished<br>
hawkular_1 | 19:02:04,619 WARN
org.apache.activemq.artemis.ra AMQ152007: Thread
Thread[TearDown/ActiveMQActivation,5,default-threads] could
not be finished<br>
hawkular_1 | 19:02:22,423 WARN
org.apache.activemq.artemis.ra AMQ152007: Thread
Thread[TearDown/ActiveMQActivation,5,default-threads] could
not be finished<br>
hawkular_1 | 19:02:30,247 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 105495 ms</p>
<p dir="auto">This took almost 2 minutes, where the server was
non-responsive</p>
<p dir="auto">21:00:44,753 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution<br>
21:01:06,520 INFO
org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 21767 ms<br>
23:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting
execution</p>
<p dir="auto">And here it went south and the server more or
less died with OOME exceptions (it is still responding to
queries, and potentially even ingesting new data, but the
scheduler seems not to run anymore.<br>
I can imagine that once the OOME happened, the scheduler
thread died, freeing up memory that the compression job held
and which then allowed the server itself to continue. But it
certainly is in an unusable state.</p>
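<p>A silently dying scheduler thread would at least leave a trace if the server installed a default uncaught-exception handler that flags OutOfMemoryError. This is a hypothetical safeguard sketch (names like <code>SchedulerWatch</code> are made up), not something the current scheduler does:</p>

```java
public class SchedulerWatch {

    static volatile String diedThread = null;

    // Record and log any thread that dies from an OutOfMemoryError, so a
    // scheduler that stops silently would at least be visible in the logs.
    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
            if (e instanceof OutOfMemoryError) {
                diedThread = t.getName();
                System.err.println("Thread " + t.getName()
                        + " died with OOME: " + e.getMessage());
            }
        });
    }

    // Demo: simulate the scheduler thread dying with an OOME.
    public static void main(String[] args) throws InterruptedException {
        installHandler();
        Thread scheduler = new Thread(
                () -> { throw new OutOfMemoryError("simulated"); },
                "compress-job-scheduler");
        scheduler.start();
        scheduler.join();
        System.out.println("recorded dead thread: " + diedThread);
    }
}
```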
<p dir="auto">This is the final "one day" with "max" values
plotted:<br>
<img src="cid:part1.F91CA1EE.E86D0719@redhat.com" alt=""
title="Bildschirmfoto 2016-10-11 um 08.21.53.png"><br>
The peaks at compression time for heap-max (green line) are
clearly visible.</p>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
hawkular-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:hawkular-dev@lists.jboss.org">hawkular-dev@lists.jboss.org</a>
<a class="moz-txt-link-freetext" href="https://lists.jboss.org/mailman/listinfo/hawkular-dev">https://lists.jboss.org/mailman/listinfo/hawkular-dev</a>
</pre>
</blockquote>
<br>
</body>
</html>