[JBoss JIRA] (ISPN-6799) OOB thread pool fills with threads trying to send remote get responses
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-6799?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-6799:
------------------------------------
There are 2 ways we can prevent the OOB threads from blocking:
1. Use the NO_FC flag.
I have already tried using the NO_FC flag for all sync RPCs (and implicitly their responses as well), but throughput went down a lot - so it looks like flow control is helpful.
However, I have had promising results on my machine by setting the NO_FC flag only for RPC _responses_ instead. It could be a good middle ground, but we need to test it on a real network.
2. Execute the remote gets on the remote-executor thread pool.
Here too, the first approach wasn't good enough. I have tried moving all remote gets to the remote-executor pool, and performance was much worse.
But we can make that decision dynamically, based on the state of the OOB thread pool. I have had very good results on my machine by sending the remote get commands to the remote-executor pool when the OOB pool is at least 3/4 full. This doesn't work very well when the OOB pool has a queue: either {{min_threads > 3/4 * max_threads}} and the queue is never used, or {{min_threads <= 3/4 * max_threads}} and we execute all remote gets on the OOB pool.
If this turns out to work well, we can try expanding it to other commands, in order to avoid a context switch and to improve latency.
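A minimal sketch of the dynamic hand-off described in option 2, assuming the OOB pool is reachable as a plain java.util.concurrent.ThreadPoolExecutor. The class name RemoteGetDispatcher and the methods shouldOffload and dispatch are hypothetical names used only for illustration; they are not Infinispan or JGroups APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical helper: decides per command whether a remote get runs on the
// calling (OOB) thread or is handed to the remote-executor pool.
final class RemoteGetDispatcher {
    private final ThreadPoolExecutor oobPool;       // assumed handle to the OOB pool
    private final ExecutorService remoteExecutor;   // assumed handle to the remote-executor pool

    public RemoteGetDispatcher(ThreadPoolExecutor oobPool, ExecutorService remoteExecutor) {
        this.oobPool = oobPool;
        this.remoteExecutor = remoteExecutor;
    }

    // True when at least 3/4 of the pool's maximum threads are busy.
    // Integer arithmetic avoids rounding surprises at the threshold.
    public static boolean shouldOffload(int activeThreads, int maxThreads) {
        return 4L * activeThreads >= 3L * maxThreads;
    }

    // Runs the remote get inline while the OOB pool has headroom, otherwise
    // submits it to the remote-executor pool. Returns true if it was offloaded.
    public boolean dispatch(Runnable remoteGet) {
        if (shouldOffload(oobPool.getActiveCount(), oobPool.getMaximumPoolSize())) {
            remoteExecutor.execute(remoteGet);
            return true;
        }
        remoteGet.run();
        return false;
    }
}
```

ThreadPoolExecutor.getActiveCount() only returns an approximate value, so the 3/4 threshold here is a heuristic rather than a hard limit.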
> OOB thread pool fills with threads trying to send remote get responses
> ----------------------------------------------------------------------
>
> Key: ISPN-6799
> URL: https://issues.jboss.org/browse/ISPN-6799
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 9.0.0.Alpha2, 8.2.2.Final
> Reporter: Dan Berindei
> Fix For: 9.0.0.Alpha3
>
>
> Note: This is a scenario that happens in the stress tests, with 4 nodes in dist mode, and 200+ threads per node doing only reads. I have not been able to reproduce it locally, even with a much lower OOB thread pool size and UFC.max_credits.
> We don't use the {{NO_FC}} flag, so threads sending both requests and responses can block in UFC/MFC. Remote gets are executed directly on the OOB thread, so when we run out of credits for one node, the OOB pool can quickly become full with threads waiting to send a remote get response to that node.
> While we can't send responses to that node, we won't send credits to it, either, as credits are only sent *after* the message has been processed by the application. That means OOB threads on all nodes will start blocking, trying to send remote get responses to us.
> This is made worse by our staggering of remote gets. As remote get responses block, the stagger timeout kicks in and we send even more remote gets, making it even harder for the system to recover.
> UFC/MFC can send a {{CREDIT_REQUEST}} message to ask for more credits. The {{REPLENISH}} messages are handled on JGroups' internal thread pool, so they are not blocked. However, a {{CREDIT_REQUEST}} can be sent at most once every {{UFC.max_block_time}} ms, so it can't be relied on to provide enough credits. With the default settings, the throughput would be {{max_credits / max_block_time == 2MB / 0.5s == 4MB/s}}, which is really small compared to regular throughput.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-6733) XML Serializer does not serialize attributes for org.infinispan.persistence.cluster.ClusterLoader
by Anna Manukyan (JIRA)
[ https://issues.jboss.org/browse/ISPN-6733?page=com.atlassian.jira.plugin.... ]
Anna Manukyan reassigned ISPN-6733:
-----------------------------------
Assignee: Anna Manukyan
> XML Serializer does not serialize attributes for org.infinispan.persistence.cluster.ClusterLoader
> -------------------------------------------------------------------------------------------------
>
> Key: ISPN-6733
> URL: https://issues.jboss.org/browse/ISPN-6733
> Project: Infinispan
> Issue Type: Bug
> Components: Configuration
> Reporter: Anna Manukyan
> Assignee: Anna Manukyan
>
> When the ClusterLoader is configured using the {{store}} tag, the specified attributes are not carried over to the new-version XML.
> The following xml:
> {code}
> .................................................
> <namedCache name="withClusterLoader1">
>   <persistence>
>     <store class="org.infinispan.persistence.cluster.ClusterLoader" preload="true" fetchPersistentState="true" ignoreModifications="true" purgeOnStartup="true" shared="true">
>       <properties>
>         <property name="remoteCallTimeout" value="15000" />
>       </properties>
>     </store>
>   </persistence>
> </namedCache>
> .....................
> {code}
> is converted to:
> {code}
> ...............................
> <local-cache name="withClusterLoader1" statistics="false">
>   <persistence>
>     <cluster-loader remote-timeout="15000">
>       <property name="remoteCallTimeout">
>         15000
>       </property>
>     </cluster-loader>
>   </persistence>
> </local-cache>
> .............................
> {code}
[JBoss JIRA] (ISPN-6733) XML Serializer does not serialize attributes for org.infinispan.persistence.cluster.ClusterLoader
by Anna Manukyan (JIRA)
[ https://issues.jboss.org/browse/ISPN-6733?page=com.atlassian.jira.plugin.... ]
Anna Manukyan updated ISPN-6733:
--------------------------------
Status: Open (was: New)
[JBoss JIRA] (ISPN-6799) OOB thread pool fills with threads trying to send remote get responses
by Dan Berindei (JIRA)
Dan Berindei created ISPN-6799:
----------------------------------
Summary: OOB thread pool fills with threads trying to send remote get responses
Key: ISPN-6799
URL: https://issues.jboss.org/browse/ISPN-6799
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 8.2.2.Final, 9.0.0.Alpha2
Reporter: Dan Berindei
Fix For: 9.0.0.Alpha3
Note: This is a scenario that happens in the stress tests, with 4 nodes in dist mode, and 200+ threads per node doing only reads. I have not been able to reproduce it locally, even with a much lower OOB thread pool size and UFC.max_credits.
We don't use the {{NO_FC}} flag, so threads sending both requests and responses can block in UFC/MFC. Remote gets are executed directly on the OOB thread, so when we run out of credits for one node, the OOB pool can quickly become full with threads waiting to send a remote get response to that node.
While we can't send responses to that node, we won't send credits to it, either, as credits are only sent *after* the message has been processed by the application. That means OOB threads on all nodes will start blocking, trying to send remote get responses to us.
This is made worse by our staggering of remote gets. As remote get responses block, the stagger timeout kicks in and we send even more remote gets, making it even harder for the system to recover.
UFC/MFC can send a {{CREDIT_REQUEST}} message to ask for more credits. The {{REPLENISH}} messages are handled on JGroups' internal thread pool, so they are not blocked. However, a {{CREDIT_REQUEST}} can be sent at most once every {{UFC.max_block_time}} ms, so it can't be relied on to provide enough credits. With the default settings, the throughput would be {{max_credits / max_block_time == 2MB / 0.5s == 4MB/s}}, which is really small compared to regular throughput.
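To make the arithmetic above concrete, here is a small sketch of the upper bound on throughput when credits only arrive via one credit request per {{max_block_time}}. The class and method names are hypothetical, and the 2 MB figure from the description is read here as 2,000,000 bytes.

```java
// Hypothetical helper illustrating the bound max_credits / max_block_time.
final class CreditRequestThroughput {
    // Bytes per second a sender can push if every batch of maxCredits bytes
    // has to wait a full maxBlockTimeMillis for the next replenishment.
    public static double bytesPerSecond(long maxCredits, long maxBlockTimeMillis) {
        return maxCredits * 1000.0 / maxBlockTimeMillis;
    }

    public static void main(String[] args) {
        // Figures quoted in the issue description: 2 MB of credits, 0.5 s block time.
        double bound = bytesPerSecond(2_000_000L, 500L);
        System.out.println(bound / 1_000_000 + " MB/s");
    }
}
```

Raising {{max_credits}} or lowering {{max_block_time}} moves this bound linearly, which is why it stays far below regular throughput with the values above.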
[JBoss JIRA] (ISPN-6798) Examples of JDBC CacheStore configuration should remind about the shared attribute
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-6798?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero updated ISPN-6798:
----------------------------------
Description:
The "shared" configuration attribute defaults to _false_ even for database-based CacheStore instances, because the attribute is defined in the parent of all {{CacheStore}} types.
The attribute isn't very visible either, as it's not explicitly listed among the available attributes (because it's inherited). I think this is quite confusing: while a non-shared configuration could be a deliberate choice, there's a strong assumption and expectation that these stores will point to a shared relational database.
I'd suggest that we:
- add a highlighted reminder about this
- set the attribute explicitly in all example configurations
See also:
- http://stackoverflow.com/questions/37949448/duplicate-record-errors-in-hi...
was:
The "shared" configuration attribute defaults to _false_ even for database-based CacheStore instances, because the attribute is defined in the parent of all {{CacheStore}} types.
The attribute isn't very visible either, as it's not explicitly listed among the available attributes (because it's inherited). I think this is quite confusing: while a non-shared configuration could be a deliberate choice, there's a strong assumption and expectation that these stores will point to a shared relational database.
See also:
- http://stackoverflow.com/questions/37949448/duplicate-record-errors-in-hi...
> Examples of JDBC CacheStore configuration should remind about the shared attribute
> ----------------------------------------------------------------------------------
>
> Key: ISPN-6798
> URL: https://issues.jboss.org/browse/ISPN-6798
> Project: Infinispan
> Issue Type: Enhancement
> Components: Configuration, Documentation-Core
> Reporter: Sanne Grinovero
> Priority: Minor
> Fix For: 9.0.0.Alpha4
>
>
> The "shared" configuration attribute defaults to _false_ even for database-based CacheStore instances, because the attribute is defined in the parent of all {{CacheStore}} types.
> The attribute isn't very visible either, as it's not explicitly listed among the available attributes (because it's inherited). I think this is quite confusing: while a non-shared configuration could be a deliberate choice, there's a strong assumption and expectation that these stores will point to a shared relational database.
> I'd suggest that we:
> - add a highlighted reminder about this
> - set the attribute explicitly in all example configurations
> See also:
> - http://stackoverflow.com/questions/37949448/duplicate-record-errors-in-hi...
[JBoss JIRA] (ISPN-6798) Examples of JDBC CacheStore configuration should remind about the shared attribute
by Sanne Grinovero (JIRA)
Sanne Grinovero created ISPN-6798:
-------------------------------------
Summary: Examples of JDBC CacheStore configuration should remind about the shared attribute
Key: ISPN-6798
URL: https://issues.jboss.org/browse/ISPN-6798
Project: Infinispan
Issue Type: Enhancement
Components: Configuration, Documentation-Core
Reporter: Sanne Grinovero
Priority: Minor
Fix For: 9.0.0.Alpha4
The "shared" configuration attribute defaults to _false_ even for database-based CacheStore instances, because the attribute is defined in the parent of all {{CacheStore}} types.
The attribute isn't very visible either, as it's not explicitly listed among the available attributes (because it's inherited). I think this is quite confusing: while a non-shared configuration could be a deliberate choice, there's a strong assumption and expectation that these stores will point to a shared relational database.
See also:
- http://stackoverflow.com/questions/37949448/duplicate-record-errors-in-hi...