[JBoss JIRA] (ISPN-6799) OOB thread pool fills with threads trying to send remote get responses
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6799?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6799:
-------------------------------
Fix Version/s: 8.2.5.Final
> OOB thread pool fills with threads trying to send remote get responses
> ----------------------------------------------------------------------
>
> Key: ISPN-6799
> URL: https://issues.jboss.org/browse/ISPN-6799
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha2, 8.2.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.0.0.Alpha3, 8.2.5.Final
>
>
> Note: This is a scenario that happens in the stress tests, with 4 nodes in dist mode, and 200+ threads per node doing only reads. I have not been able to reproduce it locally, even with a much lower OOB thread pool size and UFC.max_credits.
> We don't use the {{NO_FC}} flag, so both the threads sending requests and the threads sending responses can block in UFC/MFC. Remote gets are executed directly on the OOB thread, so when we run out of credits for one node, the OOB pool can quickly fill up with threads waiting to send a remote get response to that node.
> While we can't send responses to that node, we won't send credits to it, either, as credits are only sent *after* the message has been processed by the application. That means OOB threads on all nodes will start blocking, trying to send remote get responses to us.
> This is made worse by our staggering of remote gets. As remote get responses block, the stagger timeout kicks in and we send even more remote gets, making it even harder for the system to recover.
> UFC/MFC can send a {{CREDIT_REQUEST}} message to ask for more credits. The {{REPLENISH}} messages are handled on JGroups' internal thread pool, so they are not blocked. However, a {{CREDIT_REQUEST}} can be sent at most once every {{UFC.max_block_time}} ms, so it cannot be relied on to supply enough credits. With the default settings, the resulting throughput would be {{max_credits / max_block_time == 2mb / 0.5s == 4mb/s}}, which is very small compared to regular throughput.
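As a back-of-envelope check of that bound (a hypothetical sketch; the 2 MB and 500 ms figures are the defaults quoted above):

```java
// Upper bound on replenish-driven throughput: at most one
// CREDIT_REQUEST/REPLENISH round per max_block_time, each worth
// at most max_credits bytes.
public class CreditThroughput {
    public static void main(String[] args) {
        long maxCreditsBytes = 2L * 1024 * 1024; // UFC.max_credits = 2mb
        double maxBlockTimeSec = 0.5;            // UFC.max_block_time = 500 ms
        double mbPerSec = (maxCreditsBytes / maxBlockTimeSec) / (1024 * 1024);
        System.out.println(mbPerSec + " MB/s"); // prints "4.0 MB/s"
    }
}
```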
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years, 1 month
[JBoss JIRA] (ISPN-6799) OOB thread pool fills with threads trying to send remote get responses
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6799?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-6799:
-------------------------------
Git Pull Request: https://github.com/infinispan/infinispan/pull/4441, https://github.com/infinispan/infinispan/pull/4664 (was: https://github.com/infinispan/infinispan/pull/4441)
[JBoss JIRA] (ISPN-7101) Backports for 8.2.5.Final
by Radoslav Husar (JIRA)
[ https://issues.jboss.org/browse/ISPN-7101?page=com.atlassian.jira.plugin.... ]
Radoslav Husar edited comment on ISPN-7101 at 11/14/16 11:05 AM:
-----------------------------------------------------------------
Adding ISPN-6799 to the list. Everything else on this list is already ready in the 8.2.x branch.
was (Author: rhusar):
Adding ISPN-6799 to the list.
> Backports for 8.2.5.Final
> --------------------------
>
> Key: ISPN-7101
> URL: https://issues.jboss.org/browse/ISPN-7101
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 8.2.4.Final
> Reporter: Radoslav Husar
> Assignee: Tristan Tarrant
> Fix For: 8.2.5.Final
>
>
> Tracking Jira for issues to be backported to 8.2.5.Final.
[JBoss JIRA] (ISPN-6925) Race condition in staggered gets
by Radoslav Husar (JIRA)
[ https://issues.jboss.org/browse/ISPN-6925?page=com.atlassian.jira.plugin.... ]
Radoslav Husar updated ISPN-6925:
---------------------------------
Fix Version/s: 8.2.5.Final
> Race condition in staggered gets
> --------------------------------
>
> Key: ISPN-6925
> URL: https://issues.jboss.org/browse/ISPN-6925
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha3, 8.2.3.Final
> Reporter: Radim Vansa
> Assignee: Radim Vansa
> Priority: Critical
> Fix For: 9.0.0.Beta1, 8.2.5.Final
>
> Attachments: server.log.node1, server.log.node2, server.log.node3
>
>
> There's a race condition in {{CommandAwareRpcDispatcher}} when we do staggered gets. When the {{RspList}} is prepared and the {{Rsp}} is then filled in from {{processCallsStaggered$lambda}}, both threads can set their {{Rsp}} as received but later see that the other response was not received yet, because there is no memory barrier between the {{setValue}}/{{setException}} call and the {{wasReceived}} check.
> The race above happens when two responses arrive but neither is accepted by the filter. There is a second race in {{JGroupsTransport}}, when the first response is accepted but then another one arrives: in {{JGroupsTransport.invokeRemotelyAsync}}, the lambda passed to {{rspListFuture.thenApply}} may see another thread concurrently modifying the rsps. For example, in {{checkRsp}} you may find that the concurrently written response was received and is not an exception according to the flags, yet its value is still null, so you return null even though the other {{Rsp}} may hold a valid response.
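The visibility problem can be sketched with hypothetical flags standing in for the two {{Rsp}} objects. The interleaving below is one the Java Memory Model permits when the fields are plain (non-volatile) and there is no barrier between the write and the check; for determinism the bad interleaving is replayed in a single thread:

```java
// Each responder thread sets its own "received" flag, then checks the
// other's. With no happens-before edge, both checks can observe the
// other flag as still false -- so neither thread treats the RspList
// as complete, which is the hang described above.
public class StaggeredGetRace {
    static boolean rsp1Received, rsp2Received; // plain fields, no barrier

    public static void main(String[] args) {
        boolean t1SawOther = rsp2Received; // thread 1's check happens first...
        boolean t2SawOther = rsp1Received; // ...and so does thread 2's
        rsp1Received = true;               // both writes land afterwards
        rsp2Received = true;
        System.out.println(t1SawOther || t2SawOther); // prints "false"
    }
}
```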
[JBoss JIRA] (ISPN-6445) PreloadingWithWriteBehindTest.testIfCanLoadKeysConcurrently random failures
by Radoslav Husar (JIRA)
[ https://issues.jboss.org/browse/ISPN-6445?page=com.atlassian.jira.plugin.... ]
Radoslav Husar updated ISPN-6445:
---------------------------------
Fix Version/s: 8.2.5.Final
> PreloadingWithWriteBehindTest.testIfCanLoadKeysConcurrently random failures
> ---------------------------------------------------------------------------
>
> Key: ISPN-6445
> URL: https://issues.jboss.org/browse/ISPN-6445
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Affects Versions: 8.2.0.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.0.0.Alpha1, 9.0.0.Final, 8.2.5.Final
>
>
> The test assumes that the {{AdvancedAsyncCacheLoader.size()}} is accurate, but the implementation actually ignores the in-memory changes and just delegates to the real loader's {{size()}}.
> Also, the method doesn't really belong in this test, because it doesn't use preloading.
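A minimal sketch of the mismatch (hypothetical names; the real {{AdvancedAsyncCacheLoader}} is more involved): the async wrapper buffers modifications in memory, but {{size()}} bypasses that buffer and delegates to the underlying store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical write-behind wrapper: writes sit in an in-memory queue
// until flushed, while size() only consults the real store underneath.
public class AsyncLoaderSketch {
    static final Map<String, String> pending = new HashMap<>(); // not yet flushed
    static final Map<String, String> realStore = new HashMap<>();

    static void writeBehind(String k, String v) { pending.put(k, v); }
    static int size() { return realStore.size(); } // ignores pending writes

    public static void main(String[] args) {
        writeBehind("k1", "v1");
        writeBehind("k2", "v2");
        // The test's assumption that size() is accurate fails here:
        System.out.println(size()); // prints 0, not 2
    }
}
```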
[JBoss JIRA] (ISPN-6911) Add a MANUAL strategy to eviction
by Radoslav Husar (JIRA)
[ https://issues.jboss.org/browse/ISPN-6911?page=com.atlassian.jira.plugin.... ]
Radoslav Husar updated ISPN-6911:
---------------------------------
Fix Version/s: 8.2.5.Final
> Add a MANUAL strategy to eviction
> ---------------------------------
>
> Key: ISPN-6911
> URL: https://issues.jboss.org/browse/ISPN-6911
> Project: Infinispan
> Issue Type: Enhancement
> Reporter: Tristan Tarrant
> Assignee: Tristan Tarrant
> Fix For: 9.0.0.Alpha4, 9.0.0.Final, 8.2.5.Final
>
>
> Currently the eviction configuration validation logs a warning when passivation is enabled without an eviction strategy (common with WildFly, where eviction is performed manually).
> To silence an otherwise misleading error message when the user is fully aware of the behaviour, we can introduce a MANUAL strategy.
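With such a strategy, a configuration along these lines (a sketch only; the element layout follows the 8.x schema, and MANUAL is the value this issue proposes) would declare the intent explicitly instead of omitting eviction and triggering the warning:

```xml
<!-- Passivation with manually-driven eviction: no automatic policy,
     but stated explicitly rather than left unconfigured. -->
<local-cache name="example">
    <eviction strategy="MANUAL"/>
    <persistence passivation="true">
        <file-store path="example-store"/>
    </persistence>
</local-cache>
```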
[JBoss JIRA] (ISPN-6918) Monitoring support for server running in Standalone mode
by Vladimir Blagojevic (JIRA)
[ https://issues.jboss.org/browse/ISPN-6918?page=com.atlassian.jira.plugin.... ]
Vladimir Blagojevic updated ISPN-6918:
--------------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 9.0.0.Beta1
Resolution: Done
> Monitoring support for server running in Standalone mode
> ---------------------------------------------------------
>
> Key: ISPN-6918
> URL: https://issues.jboss.org/browse/ISPN-6918
> Project: Infinispan
> Issue Type: Feature Request
> Components: Console
> Reporter: Pedro Zapata
> Assignee: Vladimir Blagojevic
> Fix For: 9.0.0.Beta1
>
>
> Implement monitoring of Infinispan servers running in standalone mode. No cluster management.
> - Detect standalone mode
> - Disable UI bits that don't make sense in standalone mode - i.e. change configuration
> Query the appropriate server's DMR, instead of the coordinator