[
https://issues.jboss.org/browse/WFLY-12167?page=com.atlassian.jira.plugin...
]
Jeff Mesnil commented on WFLY-12167:
------------------------------------
I investigated it a bit and the issue seems to be related to the clustering component.
When the /metrics HTTP endpoint is queried, the metrics subsystem invokes
:read-attribute on the clustering resources (e.g. thread-pool-min-threads on the
channel=ee/protocol=UDP resource).
Since the returned value is undefined, the metrics subsystem will ignore this attribute.
However, comparing heap dumps taken between /metrics queries reveals a significant
increase in ServiceName and ServiceRegistrationImpl instances (that are not GCed).
It seems the issue is related to
https://github.com/wildfly/wildfly/blob/master/clustering/service/src/mai...
(as the service names are identified with UUID).
It seems that the ServiceRegistrationImpl created by the one-off service is not being
properly removed from the Map in
org.jboss.msc.service.ServiceContainerImpl#getOrCreateRegistration.
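To illustrate the suspected pattern, here is a minimal sketch of how a registration map grows without bound when every read installs a service under a fresh UUID-based name and the entry is never removed. This is not the MSC or ServiceSupplier code: the class RegistryLeakSketch, its methods, and the "one-off." name prefix are all hypothetical and exist only for this illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a service container; NOT the MSC implementation.
final class RegistryLeakSketch {
    // name -> registration, loosely analogous to the registration Map discussed above
    private final Map<String, Object> registrations = new ConcurrentHashMap<>();

    // Returns the existing registration for the name, or creates one.
    Object getOrCreateRegistration(String name) {
        return registrations.computeIfAbsent(name, n -> new Object());
    }

    // Drops the registration once the one-off service is gone.
    void removeRegistration(String name) {
        registrations.remove(name);
    }

    int size() {
        return registrations.size();
    }

    // Simulates `reads` attribute reads, each installing a one-off service
    // under a unique UUID-based name (assumed naming scheme).
    static int simulate(int reads, boolean cleanUp) {
        RegistryLeakSketch registry = new RegistryLeakSketch();
        for (int i = 0; i < reads; i++) {
            String name = "one-off." + UUID.randomUUID();
            registry.getOrCreateRegistration(name);
            if (cleanUp) {
                registry.removeRegistration(name);
            }
        }
        return registry.size();
    }

    public static void main(String[] args) {
        // Without cleanup, one entry is retained per read.
        System.out.println("entries without cleanup: " + simulate(1000, false));
        // With cleanup, the map stays empty.
        System.out.println("entries with cleanup: " + simulate(1000, true));
    }
}
```

Because each name is unique, no entry is ever reused, so the map size tracks the number of reads unless every registration is explicitly removed.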
[~pferraro] I'm assigning this issue to you so that you can double-check whether the code
in ServiceSupplier is correct or whether the leak comes from further down, in MSC.
[~brian.stansberry] [~jamezp] I'm raising the priority to blocker as this issue will
eventually lead to memory exhaustion when the app server metrics are queried.
Depending on the time scale, we also have a workaround:
* explicitly list the subsystems that expose metrics (currently this is set to the * wildcard,
which exposes metrics from all subsystems) so that jgroups metrics are not queried at all
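The effect of the workaround can be sketched as a simple filter: with the * wildcard every subsystem is queried, while an explicit list skips anything not named in it. This is only an illustration of the idea, not WildFly's implementation; the class ExposedSubsystemsSketch, the method isExposed, and the subsystem names used here (including "jgroups" for the clustering subsystem) are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical filter; NOT the metrics subsystem's actual code.
final class ExposedSubsystemsSketch {
    // A subsystem is queried if the list holds the wildcard or names it explicitly.
    static boolean isExposed(String subsystem, List<String> exposedSubsystems) {
        return exposedSubsystems.contains("*") || exposedSubsystems.contains(subsystem);
    }

    public static void main(String[] args) {
        List<String> wildcard = Arrays.asList("*");
        // Illustrative explicit list that leaves out the clustering subsystem.
        List<String> explicit = Arrays.asList("undertow", "datasources");

        System.out.println("wildcard exposes jgroups: " + isExposed("jgroups", wildcard));
        System.out.println("explicit exposes jgroups: " + isExposed("jgroups", explicit));
        System.out.println("explicit exposes undertow: " + isExposed("undertow", explicit));
    }
}
```

With an explicit list that omits the clustering subsystem, its resources are never read on a /metrics query, so the code path suspected of leaking is not exercised.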
Memory leak in metrics in standalone-ha configuration
-----------------------------------------------------
Key: WFLY-12167
URL:
https://issues.jboss.org/browse/WFLY-12167
Project: WildFly
Issue Type: Bug
Components: MP Metrics
Affects Versions: 16.0.0.Final
Reporter: Bernd Stolle
Assignee: Jeff Mesnil
Priority: Major
Labels: memoryleak
Attachments: Screenshot 2019-06-06 at 11.07.00.png
When started in the standalone HA configuration, every request to the recently added metrics
endpoint ({{<management-if>:9990/metrics}}) leads to an increase in memory
consumption, until the JVM is slowed down by GC so much that even
requests to {{/health}} fail to complete within a reasonable timeout (2s). Ultimately this
leads to an OOM.
The same issue does not occur when WildFly is started in the default standalone
configuration (non HA).
I can provide a (compressed) heap dump if required.