[JBoss JIRA] (JGRP-1957) S3_PING: Nodes never removed from .list file
by Nick Sawadsky (JIRA)
[ https://issues.jboss.org/browse/JGRP-1957?page=com.atlassian.jira.plugin.... ]
Nick Sawadsky updated JGRP-1957:
--------------------------------
Description:
I'm not 100% sure, but it seems like there might be a defect here.
I'm using TCP, S3_PING, and MERGE3.
I've set logical_addr_cache_max_size to 2 for testing purposes, although I don't think the value of this setting affects my test results.
I start a single node, node A. Then I start a second node, node B.
I then repeatedly shut down and restart node B.
Each time node B starts, a new row is added to the .list file stored in S3.
But even if I continue this process for 15 minutes, old rows are never removed from the .list file, so it continues to grow in size.
I've read the docs and mailing list threads, so I'm aware that the list is not immediately updated as soon as a member leaves. But I was expecting that when a view change occurs, nodes no longer in the view would be marked for removal (line 2193 of TP.java) and then after the logical_addr_cache_expiration has been reached and the reaper kicks in, once a new node joins, the expired cache entries would be purged from the file.
I dug into the code a bit, and what seems to be happening is that the MERGE3 protocol periodically generates a FIND_MBRS event. S3_PING retrieves the membership from the .list file, which includes expired nodes. And then all of these members are re-added to the logical address cache (line 157 of S3_PING.java, line 533 of Discovery.java, line 2263 of TP.java).
So expired nodes are continually re-added to the logical address cache, preventing them from ever being reaped.
was:
I'm not 100% sure, but it seems like there might be a defect here.
I'm using TCP, S3_PING, and MERGE3.
I've set logical_addr_cache_max_size to 2 for testing purposes.
I start a single node, node A. Then I start a second node, node B.
I then repeatedly shut down and restart node B.
Each time node B starts, a new row is added to the .list file stored in S3.
But even if I continue this process for 15 minutes, old rows are never removed from the .list file, so it continues to grow in size.
I've read the docs and mailing list threads, so I'm aware that the list is not immediately updated as soon as a member leaves. But I was expecting that when a view change occurs, nodes no longer in the view would be marked for removal (line 2193 of TP.java) and then after the logical_addr_cache_expiration has been reached and the reaper kicks in, once a new node joins, the expired cache entries would be purged from the file.
I dug into the code a bit, and what seems to be happening is that the MERGE3 protocol periodically generates a FIND_MBRS event. S3_PING retrieves the membership from the .list file, which includes expired nodes. And then all of these members are re-added to the logical address cache (line 157 of S3_PING.java, line 533 of Discovery.java, line 2263 of TP.java).
So expired nodes are continually re-added to the logical address cache, preventing them from ever being reaped.
> S3_PING: Nodes never removed from .list file
> --------------------------------------------
>
> Key: JGRP-1957
> URL: https://issues.jboss.org/browse/JGRP-1957
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.4
> Environment: JGroups client running on Mac OS X - Yosemite
> JDK 1.7.71
> Reporter: Nick Sawadsky
> Assignee: Bela Ban
> Priority: Minor
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
9 years, 4 months
[JBoss JIRA] (JGRP-1957) S3_PING: Nodes never removed from .list file
by Nick Sawadsky (JIRA)
[ https://issues.jboss.org/browse/JGRP-1957?page=com.atlassian.jira.plugin.... ]
Nick Sawadsky updated JGRP-1957:
--------------------------------
Workaround Description:
- reduce the logical_addr_cache_expiration to 1 second
- reduce the logical_addr_cache_reaper_interval to 10 seconds
- increase the min_interval and max_interval for MERGE3 to 30 and 60 seconds, respectively
- set remove_all_files_on_view_change to true
With these settings, the expired nodes do seem to get removed from the file as expected. The drawback is that these settings will result in increased frequency of updates to the backend store.
The effect of these settings is to ensure that cache entries expire and are reaped more quickly, and that the info writer thread kicks in to write the updated cache before the MERGE3 protocol is able to send a new FIND_MBRS event.
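The timing argument behind these settings can be sketched as a small calculation. This is only an illustration under stated assumptions: the class and method names are hypothetical, the worst case is assumed to be "entry expires, then waits up to one full reaper interval", and the time the info writer thread needs to rewrite the file is ignored. It is not JGroups code.

```java
public class WorkaroundTimingSketch {

    /** Assumed worst case: the entry expires, then waits one full reaper interval. */
    static long worstCaseReapDelayMs(long expirationMs, long reaperIntervalMs) {
        return expirationMs + reaperIntervalMs;
    }

    /** True if an expired entry is reaped before MERGE3 can trigger the next FIND_MBRS. */
    static boolean reapedBeforeNextDiscovery(long expirationMs,
                                             long reaperIntervalMs,
                                             long merge3MinIntervalMs) {
        return worstCaseReapDelayMs(expirationMs, reaperIntervalMs) < merge3MinIntervalMs;
    }

    public static void main(String[] args) {
        // Values from the workaround: 1 s expiration, 10 s reaper interval, 30 s MERGE3 min_interval.
        System.out.println(reapedBeforeNextDiscovery(1_000, 10_000, 30_000));

        // Hypothetical counterexample (not claimed to be JGroups defaults):
        // slow expiry and reaping combined with frequent MERGE3 discovery.
        System.out.println(reapedBeforeNextDiscovery(120_000, 60_000, 1_000));
    }
}
```

With the workaround values the worst-case reap delay is 11 seconds, well inside the 30-second minimum gap between MERGE3 discovery rounds, which is consistent with the observation that expired nodes are then removed from the file.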
was:
- reduce the logical_addr_cache_expiration to 1 second
- reduce the logical_addr_cache_reaper_interval to 10 seconds
- increase the min_interval and max_interval for MERGE3 to 30 and 60 seconds, respectively
- set remove_all_files_on_view_change to true
- set info_writer_sleep_time to 20 seconds
With these settings, the expired nodes do seem to get removed from the file as expected. The drawback is that these settings will result in increased frequency of updates to the backend store.
The effect of these settings is to ensure that cache entries expire and are reaped more quickly, and that the info writer thread kicks in to write the updated cache before the MERGE3 protocol is able to send a new FIND_MBRS event.
> S3_PING: Nodes never removed from .list file
> --------------------------------------------
>
> Key: JGRP-1957
> URL: https://issues.jboss.org/browse/JGRP-1957
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.4
> Environment: JGroups client running on Mac OS X - Yosemite
> JDK 1.7.71
> Reporter: Nick Sawadsky
> Assignee: Bela Ban
> Priority: Minor
--
[JBoss JIRA] (JGRP-1957) S3_PING: Nodes never removed from .list file
by Nick Sawadsky (JIRA)
Nick Sawadsky created JGRP-1957:
-----------------------------------
Summary: S3_PING: Nodes never removed from .list file
Key: JGRP-1957
URL: https://issues.jboss.org/browse/JGRP-1957
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.4
Environment: JGroups client running on Mac OS X - Yosemite
JDK 1.7.71
Reporter: Nick Sawadsky
Assignee: Bela Ban
Priority: Minor
I'm not 100% sure, but it seems like there might be a defect here.
I'm using TCP, S3_PING, and MERGE3.
I've set logical_addr_cache_max_size to 2 for testing purposes.
I start a single node, node A. Then I start a second node, node B.
I then repeatedly shut down and restart node B.
Each time node B starts, a new row is added to the .list file stored in S3.
But even if I continue this process for 15 minutes, old rows are never removed from the .list file, so it continues to grow in size.
I've read the docs and mailing list threads, so I'm aware that the list is not immediately updated as soon as a member leaves. But I was expecting that when a view change occurs, nodes no longer in the view would be marked for removal (line 2193 of TP.java) and then after the logical_addr_cache_expiration has been reached and the reaper kicks in, once a new node joins, the expired cache entries would be purged from the file.
I dug into the code a bit, and what seems to be happening is that the MERGE3 protocol periodically generates a FIND_MBRS event. S3_PING retrieves the membership from the .list file, which includes expired nodes. And then all of these members are re-added to the logical address cache (line 157 of S3_PING.java, line 533 of Discovery.java, line 2263 of TP.java).
So expired nodes are continually re-added to the logical address cache, preventing them from ever being reaped.
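The suspected interaction can be reproduced in a toy model. The sketch below is not JGroups code: ToyCache, its methods, and all intervals (5 s expiration, 10 s reaper, 3 s discovery, 15 s view change) are hypothetical stand-ins, and it assumes that re-adding an entry clears its removal mark. It only illustrates why entries that are periodically re-added from the .list file would never become old enough to reap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReaddedEntriesSketch {

    /** Toy cache: member -> time it was marked removable (null means not marked). */
    static final class ToyCache {
        private final Map<String, Long> markedAt = new HashMap<>();
        private final long expirationMs;

        ToyCache(long expirationMs) { this.expirationMs = expirationMs; }

        /** Assumption: adding or re-adding an entry clears any removal mark. */
        void add(String member) { markedAt.put(member, null); }

        /** Simulates a view change: every unmarked entry except the live member is marked. */
        void markAllExcept(String liveMember, long now) {
            for (Map.Entry<String, Long> e : markedAt.entrySet()) {
                if (!e.getKey().equals(liveMember) && e.getValue() == null) {
                    e.setValue(now);
                }
            }
        }

        /** Reaper: drops entries that have been marked for at least the expiration time. */
        void reap(long now) {
            markedAt.entrySet().removeIf(
                e -> e.getValue() != null && now - e.getValue() >= expirationMs);
        }

        int size() { return markedAt.size(); }
    }

    /** Runs 60 simulated seconds and returns how many entries remain in the cache. */
    static int simulate(boolean periodicDiscovery) {
        // Stand-in for the .list file: one live node and three stale rows.
        List<String> listFile = Arrays.asList("A", "B-old-1", "B-old-2", "B-old-3");
        ToyCache cache = new ToyCache(5_000);
        for (String m : listFile) cache.add(m);
        cache.markAllExcept("A", 0);

        for (long t = 1_000; t <= 60_000; t += 1_000) {
            if (periodicDiscovery && t % 3_000 == 0) {
                for (String m : listFile) cache.add(m);   // discovery re-adds every row
            }
            if (t % 15_000 == 0) cache.markAllExcept("A", t); // periodic view change
            if (t % 10_000 == 0) cache.reap(t);               // periodic reaper
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("with periodic discovery:    " + simulate(true));
        System.out.println("without periodic discovery: " + simulate(false));
    }
}
```

In this model the stale rows survive whenever discovery runs more often than the expiration time, and are reaped on the first reaper pass when it does not.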
--
[JBoss JIRA] (WFLY-5241) Update messaging subsystem descriptions to properly report path cardinality
by Brian Stansberry (JIRA)
Brian Stansberry created WFLY-5241:
--------------------------------------
Summary: Update messaging subsystem descriptions to properly report path cardinality
Key: WFLY-5241
URL: https://issues.jboss.org/browse/WFLY-5241
Project: WildFly
Issue Type: Task
Components: JMS
Reporter: Brian Stansberry
Assignee: Brian Stansberry
Fix For: 10.0.0.CR1
https://docs.jboss.org/author/display/WFLY10/Description+of+the+Managemen... has been updated to more intelligently report cardinality information about resource registrations, with each reg reporting its own cardinality instead of having the parent report data about the "key" portion of potentially unrelated children.
The two messaging subsystems are actually trying to use the old way for "path" children, so this JIRA is to move to the new way.
Instead of the parent resource reporting that there must be exactly 4 children with key "path", each of the 4 valid path child types will now report that the min-occurs of that type is 1. (The max-occurs is also 1, but it will not be reported because, per the spec, that is the default for a 'singleton' type, and each of the 4 path types is a singleton.)
--
[JBoss JIRA] (WFCORE-621) Support legacy slaves when invoking wildcard reads in a domain
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-621?page=com.atlassian.jira.plugin... ]
Brian Stansberry updated WFCORE-621:
------------------------------------
Description:
This is a follow-on to WFCORE-282.
The WFCORE-282 solution will not work for requests with address patterns {code}/host=*/server=*{code} or {code}/host=somename/server=*{code} if the host named 'somename' is running a WFCORE version prior to 1.0.0.CR1 (or whatever release first has WFCORE-282 introduced.)
In particular, it won't work with slaves running EAP 6.x.
The problem is that with either of those address patterns, the DC will send a request addressed to {code}/host=somename/server=*{code} to the slave, and the slave will not be able to handle it, as it won't have the WFCORE-282 logic that lets it identify the relevant servers and send requests on to them.
Potentially this could be fixed by having the DC detect these patterns and not call {code}/host=somename/server=*{code}, instead adding a step to read the server child names from /host=somename and then call /host=somename/server=a, /host=somename/server=b, etc.
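A rough sketch of that idea, using plain strings in place of real management addresses. The class and method names are hypothetical and this is not the WildFly Core API; the list of server names is assumed to come from a prior read of the server child names under /host=somename.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WildcardServerExpansionSketch {

    /** True if the address ends in a wildcard server element, e.g. /host=somename/server=*. */
    static boolean isServerWildcard(String address) {
        return address.endsWith("/server=*");
    }

    /**
     * Replaces the wildcard with one concrete address per known server, so a
     * legacy slave never receives a /server=* request it cannot handle.
     */
    static List<String> expand(String host, List<String> serverNames) {
        List<String> concrete = new ArrayList<>();
        for (String server : serverNames) {
            concrete.add("/host=" + host + "/server=" + server);
        }
        return concrete;
    }

    public static void main(String[] args) {
        String requested = "/host=somename/server=*";
        if (isServerWildcard(requested)) {
            // Server names "a" and "b" are placeholders for the result of the child-names read.
            for (String address : expand("somename", Arrays.asList("a", "b"))) {
                System.out.println(address);
            }
        }
    }
}
```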
was:
This is a follow-on to WFCORE-282.
The WFCORE-282 will not work for requests with address patterns /host=*/server=* or /host=somename/server=* if the host named 'somename' is running a WFCORE version prior to 1.0.0.CR1 (or whatever release first has WFCORE-282 introduced.)
In particular, it won't work with slaves running EAP 6.x.
The problem is that with either of those address patterns, the DC will send a request addressed to /host=somename/server=* to the slave, and the slave will not be able to handle it, as it won't have the WFCORE-282 logic that lets it identify the relevant servers and send requests on to them.
Potentially this could be fixed by having the DC detect these patterns and not call /host=somename/server=*, instead adding a step to read the server child names from /host=somename and then call /host=somename/server=a, /host=somename/server=b, etc.
> Support legacy slaves when invoking wildcard reads in a domain
> --------------------------------------------------------------
>
> Key: WFCORE-621
> URL: https://issues.jboss.org/browse/WFCORE-621
> Project: WildFly Core
> Issue Type: Enhancement
> Components: Domain Management
> Reporter: Brian Stansberry
> Priority: Critical
> Fix For: 2.0.0.CR1
--
[JBoss JIRA] (WFCORE-395) Improve reporting during deployment hang
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-395?page=com.atlassian.jira.plugin... ]
Brian Stansberry updated WFCORE-395:
------------------------------------
Fix Version/s: (was: 2.0.0.CR1)
> Improve reporting during deployment hang
> ----------------------------------------
>
> Key: WFCORE-395
> URL: https://issues.jboss.org/browse/WFCORE-395
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Environment: http://java.net/jira/browse/EJB_SPEC-60
> java version "1.7.0_09"
> OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.10.1)
> OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
> Ubuntu 12.10
> Reporter: Carlo de Wolf
> Assignee: Brian Stansberry
> Attachments: deployment-hang-20130205.txt, server.log
>
>
> The management thread waits indefinitely for what seems to be a finished operation.
> {noformat}
> "management-handler-thread - 2" prio=10 tid=0x00007fa1380d0000 nid=0x7683 in Object.wait() [0x00007fa136deb000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000e04ae778> (a org.jboss.as.controller.ContainerStateMonitor)
> at java.lang.Object.wait(Object.java:503)
> at org.jboss.as.controller.ContainerStateMonitor.awaitContainerStateChangeReport(ContainerStateMonitor.java:158)
> - locked <0x00000000e04ae778> (a org.jboss.as.controller.ContainerStateMonitor)
> at org.jboss.as.controller.ModelControllerImpl.awaitContainerStateChangeReport(ModelControllerImpl.java:464)
> at org.jboss.as.controller.OperationContextImpl.awaitModelControllerContainerMonitor(OperationContextImpl.java:148)
> at org.jboss.as.controller.AbstractOperationContext.doCompleteStep(AbstractOperationContext.java:299)
> at org.jboss.as.controller.AbstractOperationContext.completeStepInternal(AbstractOperationContext.java:229)
> at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:224)
> at org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:142)
> at org.jboss.as.controller.ModelControllerImpl.execute(ModelControllerImpl.java:112)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler.doExecute(ModelControllerClientOperationHandler.java:139)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1.execute(ModelControllerClientOperationHandler.java:108)
> at org.jboss.as.protocol.mgmt.AbstractMessageHandler$2$1.doExecute(AbstractMessageHandler.java:296)
> at org.jboss.as.protocol.mgmt.AbstractMessageHandler$AsyncTaskRunner.run(AbstractMessageHandler.java:518)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> at org.jboss.threads.JBossThread.run(JBossThread.java:122)
> {noformat}
--
[JBoss JIRA] (WFCORE-395) Improve reporting during deployment hang
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-395?page=com.atlassian.jira.plugin... ]
Brian Stansberry reassigned WFCORE-395:
---------------------------------------
Assignee: (was: Brian Stansberry)
> Improve reporting during deployment hang
> ----------------------------------------
>
> Key: WFCORE-395
> URL: https://issues.jboss.org/browse/WFCORE-395
> Project: WildFly Core
> Issue Type: Feature Request
> Components: Domain Management
> Environment: http://java.net/jira/browse/EJB_SPEC-60
> java version "1.7.0_09"
> OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.10.1)
> OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
> Ubuntu 12.10
> Reporter: Carlo de Wolf
> Attachments: deployment-hang-20130205.txt, server.log
--