[infinispan-issues] [JBoss JIRA] (ISPN-4979) CacheStatusResponse map uses too much memory
Sanne Grinovero (JIRA)
issues at jboss.org
Mon Nov 17 20:13:39 EST 2014
[ https://issues.jboss.org/browse/ISPN-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020624#comment-13020624 ]
Sanne Grinovero edited comment on ISPN-4979 at 11/17/14 8:13 PM:
-----------------------------------------------------------------
That's not exactly what I meant, sorry I've been too short as I simply meant to refer to "old discussions". We once discussed the need to not allocate new Address instance copies so often - I had diagnosed this as an allocation hot spot 2-3 years ago - and the discussion was about what kind of data structure we should use to represent a "View", and to use the View as a sort of root for most other collections we normally need for sets of addresses as these could be seen as a "mask" on top of the View they were linked to. I actually coded much of that but it was breaking too many SPIs as I was replaceing all instances of Collection<Address>, List<Address>, Set<Address>.. so the task was postponed and I think it would be a nightmare to try rebasing my old patch but I still think something in that league is eventually going to be necessary.
The basic principle is that - besides some exceptional cases - a mask of the View is good enough to represent any such Set<Address> as even when you need ordering like in List<Address>, the ordering is always going to be the same and all lists also need the property of no-repeated instanced provided by the Set. So you need a custom data structure, which we can make very efficient as our requirements are simple. Also there are plenty of points in the code in which defensive collections are created, the representation I was experimenting with would be an immutable collection, which you could mask recursively.. so with no mutations, you could guarantee that no copy would be done unless there was an actual need for it, and even then each copy would be cheap as it's again just a mask and a pointer to the inner set.
This wouldn't just be a custom collection, but you'd also have the "ViewManagementService" to be "the" only place where Address instances are maintained. If you had a de-serialization context, and the Unmarshaller had access to it, it could use it to incarnate references from the pool (as it is the pool). This would also make sure that you don't need to worry about cleanup of unused Address instances in the Unmarshaller cache, as nodes leaving would imply a View change and hence a cleanup from the root structure.
Occasionally you might still need to unmarshall an Address instance which isn't in the pool (like an unknown address such as what you might receive from a node having a different View installed); in such case you would still need to allocate one but that would be a fairly limited case.
was (Author: sannegrinovero):
That's not exactly what I meant, sorry I've been too short as I simply meant to refer to "old discussions". We once discussed the need to not allocate new Address instance copies so often - I had diagnosed this as an allocation hot spot 2-3 years ago - and the discussion was about what kind of data structure we should use to represent a "View", and to use the View as a sort of root for most other collections we normally need for sets of addresses as these could be seen as a "mask" on top of the View they were linked to. I actually coded much of that but it was breaking too many SPIs as I was replaceing all instances of Collection<Address>, List<Address>, Set<Address>.. so the task was postponed and I think it would be a nightmare to try rebasing my old patch but I still think something in that league is eventually going to be necessary.
The basic principle is that - besides some exceptional cases - a mask of the View is good enough to represent any such Set<Address> as even when you need ordering like in List<Address>, the ordering is always going to be the same and all lists also need the property of no-repeated instanced provided by the Set. So you need a custom data structure, which we can make very efficient as our requirements are simple. Also there are plenty of points in the code in which defensive collections are created, the representation I was experimenting with would be an immutable collection, which you could mask recursively.. so with no mutations, you could guarantee that no copy would be done unless there was an actual need for it, and even then each copy would be cheap as it's again just a mask and a pointer to the inner set.
This wouldn't just be a custom collection, but you'd also have the "ViewManagementService" to be "the" only place where Address instances are maintained. If you had a de-serialization context, and the Unmarshaller had access to it, it could use it to incarnate references from the pool (as it is the pool). This would also make sure that you don't need to worry about cleanup of unused Address instances in the Unmarshaller cache, as nodes leaving would imply a View change and hence a cleanup from the root structure.
> CacheStatusResponse map uses too much memory
> --------------------------------------------
>
> Key: ISPN-4979
> URL: https://issues.jboss.org/browse/ISPN-4979
> Project: Infinispan
> Issue Type: Bug
> Components: Core, State Transfer
> Affects Versions: 7.0.0.Final
> Reporter: Dan Berindei
> Assignee: William Burns
> Priority: Critical
> Fix For: 7.1.0.Final
>
>
> When the cluster is large and there are a log of caches, the {{CacheStatusResponse}} map on the new coordinator can get quite large. One of the problems that seems to be that the addresses in {{DefaultConsistentHash}} are duplicated on serialization, so the deserialized version occupies more memory.
> We need to investigate why the objects are not "shared" by the River marshaller, and maybe work around the problem by de-duplicating the addresses in the externalizer.
--
This message was sent by Atlassian JIRA
(v6.3.8#6338)
More information about the infinispan-issues
mailing list