[JBoss JIRA] (ISPN-4979) CacheStatusResponse map uses too much memory
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-4979?page=com.atlassian.jira.plugin.... ]
William Burns edited comment on ISPN-4979 at 11/18/14 9:39 AM:
---------------------------------------------------------------
The implementation I am working on has an optimization: when the JGroupsAddress is not composed, it only does the ThreadLocal (TL) lookup and nothing else. Compared to the network overhead, this should be a trivial cost for what can be gained in the case of multiples.
The current approach would basically cut it down to N copies, where N is the number of nodes, instead of N * M * 3 (M being the number of caches, and 3 because we send 2 topologies, one of which has 2 member lists). The change I have currently is rather unobtrusive; however, if this is not sufficient, we can look into ones that change more APIs.
I will look into the other suggestions as well, though.
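To put rough numbers on the estimate above, here is a minimal sketch. It assumes the 300 nodes x 3000 caches figure mentioned elsewhere on this issue, and it assumes every node is a member of every cache. The class and method names are hypothetical and only illustrate the arithmetic; they are not part of Infinispan.

```java
final class AddressCopyEstimate {

    /** Copies of one address when each of the 3 member lists per cache is deserialized independently. */
    static long copiesWithoutSharing(long nodes, long caches) {
        // N responses, each carrying M caches, each cache carrying 3 member lists.
        return nodes * caches * 3;
    }

    /** Copies of one address when each response shares a single instance internally. */
    static long copiesWithPerResponseSharing(long nodes) {
        // One instance per response, and there are N responses.
        return nodes;
    }

    public static void main(String[] args) {
        long nodes = 300;   // assumed cluster size
        long caches = 3000; // assumed number of caches
        System.out.println(copiesWithoutSharing(nodes, caches));  // 2700000
        System.out.println(copiesWithPerResponseSharing(nodes));  // 300
    }
}
```

Under these assumptions, sharing within a response reduces the copies of each address on the coordinator by a factor of M * 3.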
was (Author: william.burns):
The implementation I am working on has an optimization: when the JGroupsAddress is not composed, it only does the ThreadLocal (TL) lookup and nothing else. Compared to the network overhead, this should be a trivial cost for what can be gained in the case of multiples.
I will look into the other suggestions as well, though.
> CacheStatusResponse map uses too much memory
> --------------------------------------------
>
> Key: ISPN-4979
> URL: https://issues.jboss.org/browse/ISPN-4979
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Final
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Critical
> Fix For: 7.1.0.Final
>
>
> When the cluster is large and there are a lot of caches, the {{CacheStatusResponse}} map on the new coordinator can get quite large. One of the problems seems to be that the addresses in {{DefaultConsistentHash}} are duplicated on serialization, so the deserialized version occupies more memory.
> We need to investigate why the objects are not "shared" by the River marshaller, and maybe work around the problem by de-duplicating the addresses in the externalizer.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
[JBoss JIRA] (ISPN-4979) CacheStatusResponse map uses too much memory
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-4979?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-4979:
-------------------------------------
The implementation I am working on has an optimization: when the JGroupsAddress is not composed, it only does the ThreadLocal (TL) lookup and nothing else. Compared to the network overhead, this should be a trivial cost for what can be gained in the case of multiples.
I will look into the other suggestions as well, though.
> CacheStatusResponse map uses too much memory
> --------------------------------------------
>
> Key: ISPN-4979
> URL: https://issues.jboss.org/browse/ISPN-4979
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Final
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Critical
> Fix For: 7.1.0.Final
>
>
> When the cluster is large and there are a lot of caches, the {{CacheStatusResponse}} map on the new coordinator can get quite large. One of the problems seems to be that the addresses in {{DefaultConsistentHash}} are duplicated on serialization, so the deserialized version occupies more memory.
> We need to investigate why the objects are not "shared" by the River marshaller, and maybe work around the problem by de-duplicating the addresses in the externalizer.
[JBoss JIRA] (ISPN-4939) The text is repeated three times in META-INF\services
by Ion Savin (JIRA)
[ https://issues.jboss.org/browse/ISPN-4939?page=com.atlassian.jira.plugin.... ]
Ion Savin updated ISPN-4939:
----------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 7.0.2.Final
Resolution: Done
> The text is repeated three times in META-INF\services
> -----------------------------------------------------
>
> Key: ISPN-4939
> URL: https://issues.jboss.org/browse/ISPN-4939
> Project: Infinispan
> Issue Type: Bug
> Components: Build process
> Affects Versions: 7.0.0.Final
> Reporter: ratking
> Assignee: Tristan Tarrant
> Priority: Blocker
> Fix For: 7.0.2.Final
>
>
> Download infinispan-7.0.0.Final-all.zip or infinispan-7.0.1.Final-all.zip from http://infinispan.org/download/ and unzip the file.
> {quote}
> infinispan-embedded-7.0.0.Final.jar\META-INF\beans.xml (Has been fixed in 7.0.1)
> infinispan-embedded-7.0.0.Final.jar\META-INF\services\
> infinispan-embedded-query-7.0.0.Final.jar\META-INF\services\
> infinispan-remote-7.0.0.Final.jar\META-INF\services\
> {quote}
> Open these files with a text editor and you will find that the text is *repeated three times*.
> disastrous
[JBoss JIRA] (ISPN-4979) CacheStatusResponse map uses too much memory
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4979?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4979:
------------------------------------
[~sannegrinovero], replacing all address collections with masks over Views was always going to be a challenge because not all members have all the views. E.g. a joiner might receive an old message referencing a view that it doesn't know about. It has become even trickier with partition handling, because we now have to track both nodes that are in the current consistent hash but not in the current JGroups view, and at the same time nodes that are in the current JGroups view but not in the JGroups view in which the CH was created.
[~william.burns], I didn't expect to reuse addresses (or anything else) across different CacheStatusResponse instances. But since we're talking about 300 nodes x 3000 caches (potentially), I think the savings from intra-CSR reuse would be big enough. In fact, my first thought was to reuse the address just inside a {{DefaultConsistentHash}}, writing the members list first and referencing the members via their indices in the other collections.
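The index-based idea in the paragraph above could look roughly like the following. This is a minimal sketch, not the actual {{DefaultConsistentHash}} externalizer: it uses plain strings as stand-ins for addresses, {{DataOutputStream}} instead of the marshaller, and a hypothetical class name. The member list is written once, and each segment's owners are written as indices into it, so the reader hands out the same instance every time a member is referenced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class IndexedOwnersCodec {

    /** Decoded form: one member list plus per-segment owner lists that reuse its elements. */
    static final class Decoded {
        final List<String> members;
        final List<List<String>> segmentOwners;

        Decoded(List<String> members, List<List<String>> segmentOwners) {
            this.members = members;
            this.segmentOwners = segmentOwners;
        }
    }

    static byte[] write(List<String> members, List<List<String>> segmentOwners) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Write every member exactly once.
            out.writeInt(members.size());
            for (String member : members) {
                out.writeUTF(member);
            }
            // Write owners as indices into the member list.
            out.writeInt(segmentOwners.size());
            for (List<String> owners : segmentOwners) {
                out.writeInt(owners.size());
                for (String owner : owners) {
                    // Linear lookup keeps the sketch short; a map would suit large member lists.
                    int index = members.indexOf(owner);
                    if (index < 0) {
                        throw new IllegalArgumentException("Owner is not a member: " + owner);
                    }
                    out.writeInt(index);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Decoded read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int memberCount = in.readInt();
            List<String> members = new ArrayList<>(memberCount);
            for (int i = 0; i < memberCount; i++) {
                members.add(in.readUTF());
            }
            int segmentCount = in.readInt();
            List<List<String>> segmentOwners = new ArrayList<>(segmentCount);
            for (int s = 0; s < segmentCount; s++) {
                int ownerCount = in.readInt();
                List<String> owners = new ArrayList<>(ownerCount);
                for (int o = 0; o < ownerCount; o++) {
                    // Reuse the instance already held in the member list.
                    owners.add(members.get(in.readInt()));
                }
                segmentOwners.add(owners);
            }
            return new Decoded(members, segmentOwners);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a round trip through {{write}} and {{read}}, every owner entry is reference-equal to the corresponding entry in the decoded member list, which is the intra-CSR reuse described above.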
I have one small concern about your approach, as the additional {{ThreadLocal}} and {{IdentityHashMap}} accesses might slow down the more common case of having just one address in the command. E.g. in a PrepareCommand, it's just the address of the originator in the {{GlobalTransaction}} object.
Since the number of addresses is quite limited and doesn't change from cache to cache (or even from manager to manager, with FORK), I guess we could also add a static map in {{JGroupsAddress}}, limited to {{$\{jgroups.uuid_cache.max_elements\}}} entries (similar to JGroups' UUID cache), and keeping the wire format as it is.
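A bounded map of canonical instances, as suggested in the paragraph above, might be sketched like this. The class name, the LRU eviction policy, and the coarse {{synchronized}} locking are all assumptions made for illustration; this is not how {{JGroupsAddress}} or the JGroups UUID cache is implemented. In the suggestion above, a single static instance would be consulted whenever an address is deserialized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedInterner<T> {

    private final Map<T, T> canonical;

    BoundedInterner(final int maxEntries) {
        // Access-ordered LinkedHashMap that drops the least recently used entry once the limit is exceeded.
        this.canonical = new LinkedHashMap<T, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<T, T> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached instance equal to {@code value}, caching {@code value} itself if none exists. */
    synchronized T intern(T value) {
        T existing = canonical.get(value);
        if (existing != null) {
            return existing;
        }
        canonical.put(value, value);
        return value;
    }

    synchronized int entryCount() {
        return canonical.size();
    }
}
```

Because only the deserialized instance changes, the wire format stays as it is. The lock in this sketch would add overhead in the common single-address case, so a real version would probably need a cheaper concurrent structure.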
If necessary, we could try reusing the entire CH across CSRs. But that might be easier to do with a {{ResponseFilter}} in the {{GET_STATUS}} invocation, again keeping the wire format as it is.
> CacheStatusResponse map uses too much memory
> --------------------------------------------
>
> Key: ISPN-4979
> URL: https://issues.jboss.org/browse/ISPN-4979
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Final
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Critical
> Fix For: 7.1.0.Final
>
>
> When the cluster is large and there are a lot of caches, the {{CacheStatusResponse}} map on the new coordinator can get quite large. One of the problems seems to be that the addresses in {{DefaultConsistentHash}} are duplicated on serialization, so the deserialized version occupies more memory.
> We need to investigate why the objects are not "shared" by the River marshaller, and maybe work around the problem by de-duplicating the addresses in the externalizer.
[JBoss JIRA] (ISPN-4841) TopologyAwareConsistentHashFactory is slow for large cluster
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4841?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4841:
------------------------------------
SyncConsistentHashFactory seems to be much better in this regard, so ISPN-4851 should help. SyncConsistentHashFactory results could also be cached, since they only depend on the hash codes of the members.
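The caching idea could be sketched as a memoizing wrapper keyed on the members' hash codes. This is only an illustration under stated assumptions: the class name is hypothetical, the expensive computation is passed in as a plain function rather than being the real SyncConsistentHashFactory, and a real key would presumably also have to include any other inputs of the computation, such as the number of segments and owners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class MemberHashKeyedCache<R> {

    private final Map<List<Integer>, R> cache = new ConcurrentHashMap<>();
    private final Function<List<Integer>, R> compute;
    private final AtomicInteger computations = new AtomicInteger();

    MemberHashKeyedCache(Function<List<Integer>, R> compute) {
        this.compute = compute;
    }

    /** Returns the cached result for these members, computing it only on the first request. */
    R get(List<?> members) {
        // The key is the ordered list of member hash codes, so equal member lists share one result.
        List<Integer> key = new ArrayList<>(members.size());
        for (Object member : members) {
            key.add(member.hashCode());
        }
        return cache.computeIfAbsent(key, k -> {
            computations.incrementAndGet();
            return compute.apply(k);
        });
    }

    /** Number of times the underlying computation actually ran. */
    int computations() {
        return computations.get();
    }
}
```

With many caches sharing the same member list, the computation would then run once and the remaining caches would reuse the result. The sketch keeps member order in the key as a conservative choice.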
> TopologyAwareConsistentHashFactory is slow for large cluster
> ------------------------------------------------------------
>
> Key: ISPN-4841
> URL: https://issues.jboss.org/browse/ISPN-4841
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.CR1
> Reporter: Takayoshi Kimura
>
> A user observed 100% CPU usage for a long time on the coordinator node when booting 500 nodes with 500 caches defined.
> It looks like TopologyAwareConsistentHashFactory is O(n^2): it has a double loop over all machines. It takes 50 sec to compute a rebalance with 1 cache and 500 nodes. This calculation is performed for every cache, so it eats 25000 sec of CPU time with 500 nodes and 500 caches.
> The hprof output shows that 90% of the time is consumed in TopologyInfo.computeMaxSegmentsForMachine().
[JBoss JIRA] (ISPN-4841) TopologyAwareConsistentHashFactory is slow for large cluster
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4841?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4841:
-------------------------------
Status: Open (was: New)
> TopologyAwareConsistentHashFactory is slow for large cluster
> ------------------------------------------------------------
>
> Key: ISPN-4841
> URL: https://issues.jboss.org/browse/ISPN-4841
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core
> Affects Versions: 7.0.0.CR1
> Reporter: Takayoshi Kimura
>
> A user observed 100% CPU usage for a long time on the coordinator node when booting 500 nodes with 500 caches defined.
> It looks like TopologyAwareConsistentHashFactory is O(n^2): it has a double loop over all machines. It takes 50 sec to compute a rebalance with 1 cache and 500 nodes. This calculation is performed for every cache, so it eats 25000 sec of CPU time with 500 nodes and 500 caches.
> The hprof output shows that 90% of the time is consumed in TopologyInfo.computeMaxSegmentsForMachine().
[JBoss JIRA] (ISPN-4939) The text is repeated three times in META-INF\services
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4939?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant commented on ISPN-4939:
---------------------------------------
The release script is indeed to blame, since it runs install multiple times without cleaning and the Maven Shade plugin gets confused.
> The text is repeated three times in META-INF\services
> -----------------------------------------------------
>
> Key: ISPN-4939
> URL: https://issues.jboss.org/browse/ISPN-4939
> Project: Infinispan
> Issue Type: Bug
> Components: Build process
> Affects Versions: 7.0.0.Final
> Reporter: ratking
> Assignee: Tristan Tarrant
> Priority: Blocker
>
> Download infinispan-7.0.0.Final-all.zip or infinispan-7.0.1.Final-all.zip from http://infinispan.org/download/ and unzip the file.
> {quote}
> infinispan-embedded-7.0.0.Final.jar\META-INF\beans.xml (Has been fixed in 7.0.1)
> infinispan-embedded-7.0.0.Final.jar\META-INF\services\
> infinispan-embedded-query-7.0.0.Final.jar\META-INF\services\
> infinispan-remote-7.0.0.Final.jar\META-INF\services\
> {quote}
> Open these files with a text editor and you will find that the text is *repeated three times*.
> disastrous
[JBoss JIRA] (ISPN-4939) The text is repeated three times in META-INF\services
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-4939?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant commented on ISPN-4939:
---------------------------------------
Seems like a problem on the CI machine from which releases are done. Building locally does not show the issue.
> The text is repeated three times in META-INF\services
> -----------------------------------------------------
>
> Key: ISPN-4939
> URL: https://issues.jboss.org/browse/ISPN-4939
> Project: Infinispan
> Issue Type: Bug
> Components: Build process
> Affects Versions: 7.0.0.Final
> Reporter: ratking
> Assignee: Tristan Tarrant
> Priority: Blocker
>
> Download infinispan-7.0.0.Final-all.zip or infinispan-7.0.1.Final-all.zip from http://infinispan.org/download/ and unzip the file.
> {quote}
> infinispan-embedded-7.0.0.Final.jar\META-INF\beans.xml (Has been fixed in 7.0.1)
> infinispan-embedded-7.0.0.Final.jar\META-INF\services\
> infinispan-embedded-query-7.0.0.Final.jar\META-INF\services\
> infinispan-remote-7.0.0.Final.jar\META-INF\services\
> {quote}
> Open these files with a text editor and you will find that the text is *repeated three times*.
> disastrous