[
https://issues.redhat.com/browse/WFLY-14372?page=com.atlassian.jira.plugi...
]
Jason Lee commented on WFLY-14372:
----------------------------------
Turns out, I was mistaken. :) With improved debug logging, I can see that we are,
indeed, scanning things at least twice, and sometimes four times. When an MP Metrics app is
installed and the server (re)started/reloaded, we can see the two DPUs (WF metrics and MP
Metrics) each scan the app. The two subsystem add processes also scan the system,
resulting in the duplicate scans. The "four" part comes in with the application:
there are two scans from the DPUs, and two more attempts from the subsystems, though these
additional scans are shallow and do not recurse into the application.
To fix the situation, we should make sure that the WF subsystem add and the WF metrics
DPUs do not scan when the MP Metrics module is added, perhaps using the Capabilities
system to identify the scenario. We may also need to make the subsystem add scans skip
deployments, leaving those to the DPUs. I'll need to verify that with Brian,
though.
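To illustrate the "skip deployments" idea, here is a minimal toy sketch in Java. It is not WildFly code: Node, scan, sampleTree and runBothScans are hypothetical names made up for this example, and the tree is a plain in-memory structure rather than the real management resource tree. It only shows why a walk from the root overlaps with a per-deployment walk, and how skipping deployment children in the root walk removes the overlap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: Node and scan() are hypothetical, not WildFly classes.
final class ScanSketch {

    /** Hypothetical stand-in for one resource in the management tree. */
    static final class Node {
        final String type;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String type, String name) {
            this.type = type;
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        String address() {
            return type + "=" + name;
        }
    }

    /** Walks the tree from node, counting one visit per resource reached. */
    static void scan(Node node, boolean skipDeployments, Map<String, Integer> visits) {
        visits.merge(node.address(), 1, Integer::sum);
        for (Node child : node.children) {
            if (skipDeployments && "deployment".equals(child.type)) {
                continue; // left to the per-deployment scan
            }
            scan(child, skipDeployments, visits);
        }
    }

    /** Root with one subsystem and one deployment that has its own subsystem child. */
    static Node sampleTree() {
        Node deployment = new Node("deployment", "app.war")
                .add(new Node("subsystem", "undertow"));
        return new Node("server", "root")
                .add(new Node("subsystem", "metrics"))
                .add(deployment);
    }

    /** Runs one scan from the root plus one scan per deployment. */
    static Map<String, Integer> runBothScans(boolean rootSkipsDeployments) {
        Map<String, Integer> visits = new LinkedHashMap<>();
        Node root = sampleTree();
        scan(root, rootSkipsDeployments, visits);   // subsystem-add style scan
        for (Node child : root.children) {
            if ("deployment".equals(child.type)) {
                scan(child, false, visits);         // DPU style scan
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        System.out.println("root walks everything:  " + runBothScans(false));
        System.out.println("root skips deployments: " + runBothScans(true));
    }
}
```

In this toy tree, the deployment and its child are visited twice when the root walk recurses everywhere, and exactly once when the root walk skips deployment children. Whether the real subsystem add scan can be restricted this way is the part that still needs confirming.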
I'm converting this ticket to a Bug to track the changes.
Multiple metrics collections
----------------------------
Key: WFLY-14372
URL: https://issues.redhat.com/browse/WFLY-14372
Project: WildFly
Issue Type: Task
Components: MP Metrics
Reporter: Brian Stansberry
Assignee: Jason Lee
Priority: Critical
See discussion on
https://github.com/wildfly/wildfly/pull/13871
Do we have MetricsCollector collecting the container metrics multiple times?
I haven't thought hard about this, but doesn't the Stage.VERIFY collection in
MetricsSubsystemAdd end up re-collecting all the deployment=* subtree metrics already
collected in Stage.RUNTIME via DeploymentMetricProcessor/DeploymentMetricService? It walks
the whole resource tree from the root.
If the MP Metrics subsystem is installed, aren't MicroProfileMetricsSubsystemAdd and
that subsystem's DeploymentMetricProcessor/DeploymentMetricService also collecting the
same set of metrics?
I'm filing this as a Task because maybe all that's needed is to investigate and
answer those questions, reporting that all is well. But if all isn't well, this should
be converted to a Bug.
Also, as discussed on PR #13871,
https://github.com/wildfly/wildfly/blob/22.0.0.Final/metrics/src/main/jav...
is probably not the best idiom given the code is iterating over runtime-only resources,
where the cost of hasChildren can be high.
--
This message was sent by Atlassian Jira
(v8.13.1#813001)