Hi,
There are ongoing discussions and more data coming from Heiko, but since
not everyone might be online, I'll fill in here as well.
Testing with Hawkular Metrics (not Hawkular Services), I can't reproduce this.
I started by testing just the compression library on its own, and after
compressing some 9 billion datapoints I was still seeing a constant ~10 MB of
heap usage (the setup amounts to something like the harness sketched below).
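A minimal sketch of such a harness - compressBlock() is a placeholder for the
real compression call, not the library's actual API:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Standalone harness: push billions of datapoints through the compressor
// and print heap usage as we go to spot any steady growth.
public class CompressionHeapCheck {

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long start = System.currentTimeMillis();

        for (long i = 0; i < 9_000_000_000L; i++) {
            compressBlock(start + i, Math.sin(i)); // placeholder call
            if (i % 100_000_000 == 0) {
                long usedMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
                System.out.printf("datapoints=%d heapUsed=%dMB%n", i, usedMb);
            }
        }
    }

    private static void compressBlock(long timestamp, double value) {
        // placeholder: in the real test this hands the datapoint to the
        // compression library (Gorilla-style delta/XOR encoding)
    }
}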
I then started HWKMETRICS in WildFly 10.0.0.Final and ran our Gatling load
test with clients=20, metrics=1000 and enough loops to keep it running
indefinitely. The instance was modified to run compression at a 20-minute
interval.
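For anyone without the Gatling setup, the load boils down to 20 concurrent
writers pushing gauge datapoints in a loop. A rough, hand-rolled sketch - the
endpoint path, tenant id and the 50-metrics-per-client split are assumptions
on my side, not the actual simulation:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// clients=20, metrics=1000: each client reports one value per second for
// its slice of the metrics against the gauge raw-ingestion endpoint.
public class IngestionLoad {

    private static final String URL =
            "http://localhost:8080/hawkular/metrics/gauges/raw"; // assumed endpoint
    private static final String TENANT = "load-test";           // assumed tenant id

    public static void main(String[] args) {
        HttpClient http = HttpClient.newHttpClient();
        ExecutorService clients = Executors.newFixedThreadPool(20);

        for (int c = 0; c < 20; c++) {
            final int client = c;
            clients.submit(() -> {
                try {
                    while (true) {
                        // assuming metrics=1000 is the total, i.e. 50 per client
                        StringBuilder body = new StringBuilder("[");
                        long now = System.currentTimeMillis();
                        for (int m = 0; m < 50; m++) {
                            if (m > 0) body.append(',');
                            body.append("{\"id\":\"gauge.").append(client).append('.').append(m)
                                .append("\",\"data\":[{\"timestamp\":").append(now)
                                .append(",\"value\":").append(Math.random()).append("}]}");
                        }
                        body.append(']');

                        HttpRequest request = HttpRequest.newBuilder(URI.create(URL))
                                .header("Content-Type", "application/json")
                                .header("Hawkular-Tenant", TENANT)
                                .POST(HttpRequest.BodyPublishers.ofString(body.toString()))
                                .build();
                        http.send(request, HttpResponse.BodyHandlers.discarding());
                        Thread.sleep(1000);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
    }
}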
Monitoring with JConsole, I can't see any difference in memory usage between
the first 5 minutes and the last 5 minutes (it has been running for two hours
now). Heap usage stays between 147 MB and 400 MB.
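The same numbers JConsole shows can also be polled over remote JMX and logged
for the whole run; a minimal sketch, assuming a plain RMI JMX endpoint on
localhost:9999 (WildFly normally exposes JMX through its management
interface, so the URL would need adjusting):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Polls heap usage of a remote JVM once a minute, the same data JConsole shows.
public class HeapPoller {

    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi"); // assumed endpoint
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);

            while (true) {
                long usedMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
                long committedMb = memory.getHeapMemoryUsage().getCommitted() / (1024 * 1024);
                System.out.printf("heap used=%dMB committed=%dMB%n", usedMb, committedMb);
                Thread.sleep(60_000);
            }
        }
    }
}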
This requires more investigation, then.
- Micke
On 10/11/2016 09:42 AM, Heiko W.Rupp wrote:
Hey,
tl;dr: we need to investigate heap usage - especially in the case of
compression
kicking in - it looks like there could be a memory leak. Compression
timing seems
mostly ok.
I originally wanted to see how more feeds influence the metrics
compression timing.
So I started the server with -Xmx512m as I did in all the weeks before and
pointed a few feeds at it, only to see it crash with an OOME shortly after
compression started.
I have now restarted the server with -Xmx1024m and
-XX:MaxMetaspaceSize=512m (up from 256m before) and am running it with
1 feed for a day.
To be continued below ...
(Times are in GMT, which is 2h off from my local time.)
hawkular_1 | 15:00:44,764 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 15:00:45,452 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 687 ms
hawkular_1 | 17:00:44,757 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:46,796 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 2039 ms
hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:00:47,293 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 2541 ms
hawkular_1 | 21:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 21:00:46,267 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 1517 ms
hawkular_1 | 23:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 23:00:45,472 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 722 ms
hawkular_1 | 01:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 01:00:46,241 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 1492 ms
hawkular_1 | 03:00:44,747 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 03:00:45,780 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 1033 ms
hawkular_1 | 05:00:44,746 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 05:00:45,781 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 1034 ms
hawkular_1 | 07:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 07:00:46,386 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 1636 ms
hawkular_1 | 09:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 09:00:45,682 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 934 ms
hawkular_1 | 11:00:44,750 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 11:00:46,339 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 1589 ms
hawkular_1 | 13:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 13:00:45,880 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 1132 ms
Looking at the memory usage, I see there is often a peak in heap usage
around and after compression, and the accumulated GC time also shows heavy
GC activity at that point. I guess the peak in heap usage (and thus
committed heap) comes from objects being promoted from the young generation
into the old generation during compression; once compression is over they
are garbage collected, so heap usage goes down and the system is able to
free some memory.
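One way to check that theory is to dump accumulated GC time and old-gen usage
right before and right after a compression run; a small sketch using the
standard java.lang.management beans (the "Old" pool-name match is an
assumption, it depends on the collector in use):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Prints total GC time and old-generation usage; call it before and after
// a compression run and compare the deltas.
public class GcSnapshot {

    public static void print(String label) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += gc.getCollectionTime();
        }
        System.out.printf("[%s] accumulated GC time: %d ms%n", label, gcMillis);

        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.getName().contains("Old")) {
                System.out.printf("[%s] %s used: %d MB%n", label, pool.getName(),
                        pool.getUsage().getUsed() / (1024 * 1024));
            }
        }
    }

    public static void main(String[] args) {
        print("now");
    }
}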
Starting at around 4:10pm (2pm in the above output) I am running with
10 extra feeds (hawkfly on an external box).
While it looks like non-heap is slowly growing overall, it grew a lot
when the 10 extra feeds connected. It also looks like non-heap grows a
bit more each time compression kicks in.
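To narrow down which non-heap pool is actually growing (Metaspace,
Compressed Class Space, Code Cache, ...), the pools can be logged
individually; a small sketch along the same lines:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Prints every non-heap pool separately; growth in a single pool after
// each compression run narrows down the suspect.
public class NonHeapSnapshot {

    public static void print() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                System.out.printf("%s: used=%d MB, committed=%d MB%n",
                        pool.getName(),
                        pool.getUsage().getUsed() / (1024 * 1024),
                        pool.getUsage().getCommitted() / (1024 * 1024));
            }
        }
    }

    public static void main(String[] args) {
        print();
    }
}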
The first compression run with 11 feeds did not take too long; the next
was:
hawkular_1 | 17:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:50,277 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 5528 ms
That run used a lot more memory than before. Non-heap was able to reclaim
some memory, though.
hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:00:50,093 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 5341 ms
hawkular_1 | 21:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 21:00:49,465 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 4714 ms
hawkular_1 | 23:00:44,753 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 23:00:48,925 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 4171 ms
hawkular_1 | 01:00:44,750 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 01:00:48,554 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 3803 ms
hawkular_1 | 03:00:44,761 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 03:00:48,659 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 3898 ms
hawkular_1 | 05:00:44,748 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 05:00:49,134 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 4385 ms
hawkular_1 | 07:00:44,755 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 07:00:49,831 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 5076 ms
hawkular_1 | 09:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 09:00:49,508 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 4757 ms
Now at 11:20 (9:20 in the above logs) I started 5 more feeds (with a
2-minute sleep between starts).
hawkular_1 | 11:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 11:00:49,751 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 5002 ms
hawkular_1 | 13:00:44,749 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 13:00:56,594 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 11845 ms
hawkular_1 | 15:00:44,754 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 15:00:53,985 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 9231 ms
And another 5 starting at 17:12 (15:00 in above logs timezone)
hawkular_1 | 17:00:44,768 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 17:00:57,824 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 13056 ms
And another 5 starting at 19:57 (17:57 in above log timezone)
hawkular_1 | 19:00:44,751 INFO
org.hawkular.metrics.core.jobs.CompressData Starting execution
hawkular_1 | 19:01:24,401 WARN org.apache.activemq.artemis.ra
AMQ152007: Thread Thread[TearDown/ActiveMQActivation,5,] could not be
finished
hawkular_1 | 19:01:40,918 WARN org.apache.activemq.artemis.ra
AMQ152007: Thread
Thread[TearDown/ActiveMQActivation,5,default-threads] could not be
finished
hawkular_1 | 19:02:04,619 WARN org.apache.activemq.artemis.ra
AMQ152007: Thread
Thread[TearDown/ActiveMQActivation,5,default-threads] could not be
finished
hawkular_1 | 19:02:22,423 WARN org.apache.activemq.artemis.ra
AMQ152007: Thread
Thread[TearDown/ActiveMQActivation,5,default-threads] could not be
finished
hawkular_1 | 19:02:30,247 INFO
org.hawkular.metrics.core.jobs.CompressData Finished compressing data
in 105495 ms
This took almost 2 minutes, during which the server was unresponsive.
21:00:44,753 INFO org.hawkular.metrics.core.jobs.CompressData Starting
execution
21:01:06,520 INFO org.hawkular.metrics.core.jobs.CompressData Finished
compressing data in 21767 ms
23:00:44,751 INFO org.hawkular.metrics.core.jobs.CompressData Starting
execution
And here it went south: the server more or less died with OOME
exceptions (it is still responding to queries, and possibly even
ingesting new data, but the scheduler seems not to run anymore).
I can imagine that once the OOME happened, the scheduler thread died,
freeing up the memory the compression job held, which then allowed the
server itself to continue. But it is certainly in an unusable state.
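For what it's worth, that failure mode is easy to reproduce in isolation with
a plain ScheduledExecutorService (just an illustration, not the actual
Hawkular Metrics job scheduler): a single error thrown from a periodic task
silently suppresses all further runs, while the rest of the JVM keeps going.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// After the simulated OOME in run 3 the periodic "compression job" never
// runs again, but the main thread keeps "serving requests".
public class SchedulerDiesOnOome {

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        scheduler.scheduleAtFixedRate(() -> {
            int n = runs.incrementAndGet();
            System.out.println("compression job run " + n);
            if (n == 3) {
                throw new OutOfMemoryError("simulated: compression job ran out of heap");
            }
        }, 0, 1, TimeUnit.SECONDS);

        for (int i = 0; i < 10; i++) {
            System.out.println("request handled at t=" + i + "s");
            Thread.sleep(1000);
        }
        scheduler.shutdown();
    }
}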
This is the final "one day" view with "max" values plotted:
The peaks at compression time for heap-max (green line) are clearly
visible.