[jbosscache-dev] Optimising node storage in memory

Manik Surtani manik at jboss.org
Sat Feb 3 06:08:11 EST 2007


On 2 Feb 2007, at 18:22, Elias Ross wrote:

>
> --- Manik Surtani <manik at jboss.org> wrote:
>
>> Guys,
>>
>> I've been playing around, looking at how efficient (or not) it is to
>> store nodes as children of other nodes.  While this makes logical
>> sense, on an implementation level walking the tree sucks - O(n) where
>> n is the depth of the tree.  This works great for small trees, but as
>> depth increases, so does the lookup time.
>
> I spent some time looking at storage issues for maintaining carrier
> information on phone numbers.  We could only identify the possible use
> of depth 3 (country code, area code, number).
>
> Even classifying every species on the planet only really requires
> eight levels.
>
> The reason you don't usually need much depth is that with 10 children
> at each level, the number of nodes you would create with a depth of 10
> is 10^9, or 1,000,000,000 leaf nodes, with about half as many
> intermediate nodes.

That is only provided you have as many as 10 children at each level.
And even then, given the application at hand, 1.5 Bn nodes may not be
infeasible (I'm talking about grid applications for financial
institutions performing risk analyses on stock market data).
Hypothetical for now, I know, but if this does not adversely impact the
basic case of trees 3 levels deep, why not build in the capacity to
deal with more complex and larger structures?

>
> And if you kept a map of FQN -> NODE, you still need to maintain some
> structure to enumerate all the children.

Yes, Maps are still maintained per Node, containing child nodes.  I'm
not suggesting we do away with this - just that we add a more efficient
way of retrieving a Node at an arbitrary depth, without traversing the
tree.  In terms of memory, this may increase the footprint somewhat (an
extra Map maintaining these references).
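
Roughly the sort of thing I mean, as a sketch only - Fqn and NodeRef
below are simplified stand-ins for the real org.jboss.cache classes,
and FlatNodeIndex is just a made-up name for the extra map:

   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   // Simplified stand-in for org.jboss.cache.Fqn: a list of path elements.
   final class Fqn {
      private final List<String> elements;   // e.g. ["a", "b", "c"] for /a/b/c

      Fqn(List<String> elements) { this.elements = elements; }

      @Override public boolean equals(Object o) {
         return o instanceof Fqn && elements.equals(((Fqn) o).elements);
      }

      @Override public int hashCode() { return elements.hashCode(); }
   }

   // Simplified stand-in for a cache node: it still keeps its own child map.
   final class NodeRef {
      final Fqn fqn;
      final Map<Object, NodeRef> children = new ConcurrentHashMap<Object, NodeRef>();

      NodeRef(Fqn fqn) { this.fqn = fqn; }
   }

   // The extra, flat map: Fqn -> node, maintained alongside the tree.
   final class FlatNodeIndex {
      private final Map<Fqn, NodeRef> nodesByFqn = new ConcurrentHashMap<Fqn, NodeRef>();

      // called whenever a node is attached to its parent
      void register(NodeRef node) { nodesByFqn.put(node.fqn, node); }

      // called whenever a node (and its subtree) is removed
      void unregister(Fqn fqn) { nodesByFqn.remove(fqn); }

      // one hash lookup regardless of depth, instead of walking from the root
      NodeRef lookup(Fqn fqn) { return nodesByFqn.get(fqn); }
   }

The cost is one map entry per node, and removing a node would also have
to purge every descendant Fqn from the index - that is the footprint
trade-off mentioned above.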

>
> One thing that would be nice (and falls under the subject of the
> thread) would be to optimize the size of common cache objects.  Being
> able to cache more is very important.
>
> Check out the SizeOf program:
>
> class java.util.concurrent.ConcurrentHashMap: 1271
> class java.util.HashMap: 119
> class org.jboss.cache.UnversionedNode: 159 (no children)
> class org.jboss.cache.UnversionedNode: 1682 (when children() called)
> class org.jboss.cache.UnversionedNode: 39 (no "new HashMap()")
>
> ConcurrentHashMap is faster, but uses about 10 times more memory...

I agree.  And in quite a few cases it may be unnecessary, if we assume
proper and robust locking at higher levels.  This does need review.
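
To illustrate the direction (a sketch only, not UnversionedNode itself;
LeanNode is a made-up name): allocate the children map lazily, so a
childless node pays nothing for it, and use a plain HashMap on the
assumption that locking at a higher level already serialises structural
changes:

   import java.util.Collections;
   import java.util.HashMap;
   import java.util.Map;

   final class LeanNode {
      // null until the first child is added: keeps a leaf node small,
      // in line with the "no new HashMap()" figure above
      private Map<Object, LeanNode> children;

      // caller is assumed to hold this node's write lock (higher-level locking)
      LeanNode addChild(Object name) {
         if (children == null) {
            children = new HashMap<Object, LeanNode>(4);  // small initial capacity
         }
         LeanNode child = new LeanNode();
         children.put(name, child);
         return child;
      }

      // caller is assumed to hold at least a read lock on this node
      Map<Object, LeanNode> getChildren() {
         return children == null
               ? Collections.<Object, LeanNode>emptyMap()
               : Collections.unmodifiableMap(children);
      }
   }

Whether a plain HashMap is actually safe there depends entirely on the
locking review mentioned above.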




