[JBoss JIRA] (ISPN-4126) Memory is pre-allocated when enable eviction, and it may cause OOM even if there is no entry stored in cache.
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4126?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4126:
-------------------------------
Fix Version/s: 5.2.11.Final
7.2.0.Final
> Memory is pre-allocated when enable eviction, and it may cause OOM even if there is no entry stored in cache.
> -------------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4126
> URL: https://issues.jboss.org/browse/ISPN-4126
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Reporter: tina tian
> Assignee: William Burns
> Priority: Critical
> Fix For: 7.2.0.Final, 5.2.11.Final
>
>
> I am considering to use Infinispan as a single cache in my project, and I met a memory problem when I enabled eviction. Memory is pre-allocated according to the maxEntries you set even if no entry is stored in that cache!
> This is really a critical issue, since most time you are not sure how many entries you are going to store in the cache, you set a maxEntries to avoid OOME, but now, when you set maxEntries, memory will be pre-allocated no matter you are going to use it or not!
> Ehcache used to have this kind of problem too, but they solved it DEV-9437.
> Step to reproduce:
> 1) Enable eviction on config
> 2) Set maxEntries to a big number, like 100000
> 3) Create 100 caches using the default config
> You will see OOM or big usage of memory.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months
[JBoss JIRA] (ISPN-6588) Query DSL like queries should only accept % and _ as the wildcards
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-6588?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-6588:
--------------------------------
Status: Pull Request Sent (was: Coding In Progress)
Git Pull Request: https://github.com/infinispan/infinispan/pull/4329
> Query DSL like queries should only accept % and _ as the wildcards
> ------------------------------------------------------------------
>
> Key: ISPN-6588
> URL: https://issues.jboss.org/browse/ISPN-6588
> Project: Infinispan
> Issue Type: Bug
> Components: Embedded Querying, Remote Querying
> Reporter: Adrian Nistor
> Assignee: Adrian Nistor
> Fix For: 9.0.0.Alpha2, 8.2.2.Final
>
>
> The official wildcards are % and _. But users can mistakenly use * and ? and those will usually work too. This works because the * and ? are valid wildcards in the underlying Lucene query language. But this breaks if the query is supposed to be run un-indexed for whatever reason.
> To fix this we need to escape * and ? from user's input so they are no longer internally handled by Lucene as wildcards. These chars should cause an exact match instead. The actual recognized wildcards should be only % and _.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months
[JBoss JIRA] (ISPN-3702) Too many threads for cleaning up infinispan transactions
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3702?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-3702:
-------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Too many threads for cleaning up infinispan transactions
> --------------------------------------------------------
>
> Key: ISPN-3702
> URL: https://issues.jboss.org/browse/ISPN-3702
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Transactions
> Affects Versions: 5.3.0.Final
> Environment: Mac and Linux
> Reporter: Prasanth Pallamreddy
> Assignee: Pedro Ruivo
> Fix For: 9.0.0.Alpha2, 8.2.2.Final
>
>
> When using multiple transactional caches, we are seeing that each cache has a dedicated cleanup thread. While this is not an issue for small number of caches, when the number of caches is high as in our case (~100), we see around a 100 threads dedicated for cleanup like the following.
> "TxCleanupService,{default}_{XXX},user-mac-54275" daemon prio=5 tid=0x00007fa0f50d3800 nid=0x10f03 waiting on condition [0x00000001a5a5d000]
> "TxCleanupService,{default}_{XXX},user-mac-54275" daemon prio=5 tid=0x00007fa0f507e800 nid=0x10e03 waiting on condition [0x00000001a595a000]
> "TxCleanupService,{default}_{XXX},user-mac-54275" daemon prio=5 tid=0x00007fa0f507e000 nid=0x10d03 waiting on condition [0x00000001a5857000]
> "TxCleanupService,{default}_{XXX},user-mac-54275" daemon prio=5 tid=0x00007fa0f5817800 nid=0x10c03 waiting on condition [0x00000001a5754000]
> ...
> Looking at the source code for
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/o...
> if (!totalOrder) {
> // Periodically run a task to cleanup the transaction table from completed transactions.
> ThreadFactory tf = new ThreadFactory() {
> @Override
> public Thread newThread(Runnable r) {
> String address = rpcManager != null ? rpcManager.getTransport().getAddress().toString() : "local";
> Thread th = new Thread(r, "TxCleanupService," + cacheName + "," + address);
> th.setDaemon(true);
> return th;
> }
> };
> executorService = Executors.newSingleThreadScheduledExecutor(tf);
> This code can benefit from drawing the threads from a dedicated pool which is bounded.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months
[JBoss JIRA] (ISPN-5883) Node can apply new topology after sending status response
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5883?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5883:
-------------------------------
Git Pull Request: https://github.com/infinispan/infinispan/pull/3782, https://github.com/infinispan/infinispan/pull/3877, https://github.com/infinispan/infinispan/pull/4321, https://github.com/infinispan/infinispan/pull/4327, https://github.com/infinispan/infinispan/pull/4328 (was: https://github.com/infinispan/infinispan/pull/3782, https://github.com/infinispan/infinispan/pull/3877, https://github.com/infinispan/infinispan/pull/4321)
> Node can apply new topology after sending status response
> ---------------------------------------------------------
>
> Key: ISPN-5883
> URL: https://issues.jboss.org/browse/ISPN-5883
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 8.0.1.Final, 7.2.5.Final, 8.1.0.Alpha2, 9.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.0.0.Alpha2, 8.1.5.Final, 8.2.2.Final
>
>
> {{LocalTopologyManagerImpl}} is responsible for sending the {{ClusterTopologyControlCommand(GET_STATUS)}} response, and when it sends the response it doesn't check the current view id against the new coordinator's view id. If the old coordinator already sent a topology update before the merge, that topology update might be processed after sending the status response. The new coordinator will send a topology update with a topology id of {{max(status response topology ids) + 1}}. The node will then process the topology update from the old coordinator, but it will ignore the topology update from the new coordinator with the same topology id.
> This is extra common in the partition handling tests, e.g. {{BasePessimisticTxPartitionAndMergeTest}} subclasses, because the test "injects" the JGroups view on each node serially, and often the 4th node sends the status response before it gets the new view.
> {noformat}
> 22:16:37,776 DEBUG (remote-thread-NodeD-p26-t6:[]) [LocalTopologyManagerImpl] Sending cluster status response for view 10
> // Topology from NodeC
> 22:16:37,778 DEBUG (transport-thread-NodeD-p28-t2:[]) [LocalTopologyManagerImpl] Updating local topology for cache pes-cache: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeC-46467, NodeD-30486]}
> // Later, topology from NodeA
> 22:16:37,827 DEBUG (transport-thread-NodeD-p28-t1:[]) [LocalTopologyManagerImpl] Ignoring late consistent hash update for cache pes-cache, current topology is 8: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeA-37631, NodeB-47846, NodeC-46467, NodeD-30486]}
> {noformat}
> As a solution, we can delay sending the status response until we have the same view as the coordinator (or a later one). We already check that the sender is the current coordinator before applying a topology update, so this will guarantee that the we don't apply other topology updates from the old coordinator. Since the status request is only sent after the new view was installed, this will not introduce any delays in the vast majority of cases.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months
[JBoss JIRA] (ISPN-5883) Node can apply new topology after sending status response
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5883?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5883:
-------------------------------
Fix Version/s: 9.0.0.Alpha2
8.1.5.Final
8.2.2.Final
(was: 8.2.0.Beta1)
(was: 8.1.4.Final)
> Node can apply new topology after sending status response
> ---------------------------------------------------------
>
> Key: ISPN-5883
> URL: https://issues.jboss.org/browse/ISPN-5883
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 8.0.1.Final, 7.2.5.Final, 8.1.0.Alpha2, 9.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.0.0.Alpha2, 8.1.5.Final, 8.2.2.Final
>
>
> {{LocalTopologyManagerImpl}} is responsible for sending the {{ClusterTopologyControlCommand(GET_STATUS)}} response, and when it sends the response it doesn't check the current view id against the new coordinator's view id. If the old coordinator already sent a topology update before the merge, that topology update might be processed after sending the status response. The new coordinator will send a topology update with a topology id of {{max(status response topology ids) + 1}}. The node will then process the topology update from the old coordinator, but it will ignore the topology update from the new coordinator with the same topology id.
> This is extra common in the partition handling tests, e.g. {{BasePessimisticTxPartitionAndMergeTest}} subclasses, because the test "injects" the JGroups view on each node serially, and often the 4th node sends the status response before it gets the new view.
> {noformat}
> 22:16:37,776 DEBUG (remote-thread-NodeD-p26-t6:[]) [LocalTopologyManagerImpl] Sending cluster status response for view 10
> // Topology from NodeC
> 22:16:37,778 DEBUG (transport-thread-NodeD-p28-t2:[]) [LocalTopologyManagerImpl] Updating local topology for cache pes-cache: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeC-46467, NodeD-30486]}
> // Later, topology from NodeA
> 22:16:37,827 DEBUG (transport-thread-NodeD-p28-t1:[]) [LocalTopologyManagerImpl] Ignoring late consistent hash update for cache pes-cache, current topology is 8: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeA-37631, NodeB-47846, NodeC-46467, NodeD-30486]}
> {noformat}
> As a solution, we can delay sending the status response until we have the same view as the coordinator (or a later one). We already check that the sender is the current coordinator before applying a topology update, so this will guarantee that the we don't apply other topology updates from the old coordinator. Since the status request is only sent after the new view was installed, this will not introduce any delays in the vast majority of cases.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months
[JBoss JIRA] (ISPN-5883) Node can apply new topology after sending status response
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-5883?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-5883:
-------------------------------
Affects Version/s: 9.0.0.Alpha1
> Node can apply new topology after sending status response
> ---------------------------------------------------------
>
> Key: ISPN-5883
> URL: https://issues.jboss.org/browse/ISPN-5883
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Test Suite - Core
> Affects Versions: 8.0.1.Final, 7.2.5.Final, 8.1.0.Alpha2, 9.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Labels: testsuite_stability
> Fix For: 9.0.0.Alpha2, 8.1.5.Final, 8.2.2.Final
>
>
> {{LocalTopologyManagerImpl}} is responsible for sending the {{ClusterTopologyControlCommand(GET_STATUS)}} response, and when it sends the response it doesn't check the current view id against the new coordinator's view id. If the old coordinator already sent a topology update before the merge, that topology update might be processed after sending the status response. The new coordinator will send a topology update with a topology id of {{max(status response topology ids) + 1}}. The node will then process the topology update from the old coordinator, but it will ignore the topology update from the new coordinator with the same topology id.
> This is extra common in the partition handling tests, e.g. {{BasePessimisticTxPartitionAndMergeTest}} subclasses, because the test "injects" the JGroups view on each node serially, and often the 4th node sends the status response before it gets the new view.
> {noformat}
> 22:16:37,776 DEBUG (remote-thread-NodeD-p26-t6:[]) [LocalTopologyManagerImpl] Sending cluster status response for view 10
> // Topology from NodeC
> 22:16:37,778 DEBUG (transport-thread-NodeD-p28-t2:[]) [LocalTopologyManagerImpl] Updating local topology for cache pes-cache: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeC-46467, NodeD-30486]}
> // Later, topology from NodeA
> 22:16:37,827 DEBUG (transport-thread-NodeD-p28-t1:[]) [LocalTopologyManagerImpl] Ignoring late consistent hash update for cache pes-cache, current topology is 8: CacheTopology{id=8, rebalanceId=3, currentCH=DefaultConsistentHash{ns=60, owners = (4)[NodeA-37631: 15+15, NodeB-47846: 15+15, NodeC-46467: 15+15, NodeD-30486: 15+15]}, pendingCH=null, unionCH=null, actualMembers=[NodeA-37631, NodeB-47846, NodeC-46467, NodeD-30486]}
> {noformat}
> As a solution, we can delay sending the status response until we have the same view as the coordinator (or a later one). We already check that the sender is the current coordinator before applying a topology update, so this will guarantee that the we don't apply other topology updates from the old coordinator. Since the status request is only sent after the new view was installed, this will not introduce any delays in the vast majority of cases.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-6580:
------------------------------------
Status: Open (was: New)
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducable locally with a single server instance, reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings (attached, captured is only the part of the test with gets) for db0890270 show a lot of time is spent in HotRodDecoder#resetNow(), and also the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months
[JBoss JIRA] (ISPN-6580) Hotrod performance regressions after ISPN-5342 ISPN-6545
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-6580?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-6580:
------------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/4326
> Hotrod performance regressions after ISPN-5342 ISPN-6545
> --------------------------------------------------------
>
> Key: ISPN-6580
> URL: https://issues.jboss.org/browse/ISPN-6580
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols, Server
> Reporter: Jakub Markos
> Assignee: William Burns
> Attachments: jfr_recordings.zip, pom.xml, Reproducer.java
>
>
> There were 2 recent regressions in hotrod performance, one between commits dd5501c5e and 628819461 and the second one between 628819461 and db0890270. I didn't look for the exact commits, so the name of the issue might not be 100% exact...
> It is easily reproducable locally with a single server instance, reproducer attached.
> The numbers on my machine:
> ||Build commit||Puts time||Gets time||
> |dd5501c5e|21|74|
> |628819461|26|102|
> |db0890270|48|224|
> The JFR recordings (attached, captured is only the part of the test with gets) for db0890270 show a lot of time is spent in HotRodDecoder#resetNow(), and also the allocation rate goes from 100MB/s for dd5501c5e to over 1GB/s for db0890270. There are no glaring differences between dd5501c5e and 628819461.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
9 years, 11 months