[infinispan-dev] DataContainer performance review

Dan Berindei dan.berindei at gmail.com
Mon Jul 4 09:26:48 EDT 2011


Hi Vladimir

On Thu, Jun 30, 2011 at 5:31 PM, Vladimir Blagojevic
<vblagoje at redhat.com> wrote:
> On 11-06-30 12:23 PM, Mircea Markus wrote:
>>> Yes, only 32 segments! Much of that performance impact in BCHM comes
>>> from node tracking overhead per segment (queues, lists etc etc) as
>>> well. When we increase segment count this overhead falls
>>> substantially. CHM performs well with both 32 and 512 segments under
>>> these tests. It is just that BCHM basically grinds to a stop for
>>> large caches if we have 32 segments only.
>> Can't we adjust the segment size dynamically?
> I don't think we can ignore user settings for concurrency and max
> capacity. Concurrency essentially equals segment count. We can warn
> the user if the concurrency level is inappropriate for the
> container's max capacity.

I don't think it is a good idea to tie the concurrency level to the
container capacity. ConcurrentHashMap defines the concurrency level
as "the allowed concurrency among update operations", and we should
stick to that definition.

For comparison, I updated your MapStressTest to use a synchronized
java.util.LinkedHashMap with LRU eviction and also a (very incomplete)
ConcurrentHashMap clone using synchronized LinkedHashMaps as segments.
LinkedHashMap uses a neat trick: the hashtable entries are the same as
the LRU queue entries, so moving an element to the head of the queue
is an O(1) operation. This makes its performance much closer to CHM
than to our LRU implementation.
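
For reference, here is a minimal sketch of the kind of synchronized,
access-ordered LinkedHashMap I'm talking about (the class and method
names are just illustrative, this is not the code from the pull
request):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SynchronizedLruMap {
   /**
    * Builds a bounded LRU map: accessOrder=true makes get() move the
    * entry to the most-recently-used end of the internal linked list
    * in O(1), and removeEldestEntry() evicts the least-recently-used
    * entry once the map grows past maxEntries.
    */
   public static <K, V> Map<K, V> create(final int maxEntries) {
      LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
         @Override
         protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
         }
      };
      // A single lock around the whole map, unlike CHM's per-segment locks.
      return Collections.synchronizedMap(lru);
   }
}

The CHM-style clone then just hashes each key to one of N such maps,
each guarded by its own lock.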

I don't think we can use this trick with our BoundedConcurrentHashMap
as it is, because we try to keep the map and the eviction policies
completely separate, and that forces the LRU eviction to look up the
key in a LinkedList on average once for every get(), which is an O(n)
operation. I tried to use a BidirectionalLinkedHashMap instead of a
LinkedList but failed, and it would never be as fast as the
LinkedHashMap implementation anyway.

I'm not sure if the LinkedHashMap trick can be applied to the LIRS
algorithm as well, since it uses both a stack and a queue (at least
that's what their names say). But it also tends to look up keys in
the queue, which is a LinkedList, so I definitely think we should
try it.

The big advantage I see in the LinkedHashMap trick is that with
faster gets we can eliminate the batching logic from the eviction
algorithms, which would offset the cost of having to completely
reimplement the Segment class for each eviction policy.

I also made some changes to the test itself to make it more "realistic":
1. I added a warm-up period
2. I used a Gaussian distribution for the keys (see the sketch after
this list)
3. I made the keys Strings to make the equals() calls more expensive
4. I made each thread start at a different index in the key sequence,
so the threads don't perform exactly the same sequence of operations
5. At the end of the test I also print the standard deviation of the
keys in the map, which should broadly show how effective the eviction
policy is in keeping the most accessed keys in the map.
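
To make item 2 a bit more concrete, here is roughly how Gaussian key
generation with a per-thread offset could look; the class name, key
space size and spread below are made up for illustration and are not
necessarily what the test does:

import java.util.Random;

public class GaussianKeys {
   private final Random random = new Random();
   private final int keySpace;   // total number of distinct keys
   private final int offset;     // per-thread starting index

   public GaussianKeys(int keySpace, int offset) {
      this.keySpace = keySpace;
      this.offset = offset;
   }

   /**
    * Picks an index from a Gaussian centered on the middle of the key
    * space, shifted by the per-thread offset, and turns it into a
    * String key so that equals() is more expensive than for Integers.
    */
   public String nextKey() {
      double g = random.nextGaussian();            // mean 0, stddev 1
      int index = (int) (keySpace / 2 + g * keySpace / 8);
      index = Math.abs(index + offset) % keySpace; // clamp into the key space
      return "key-" + index;
   }
}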

I've created a pull request at
https://github.com/infinispan/infinispan/pull/414 , please have a look
and tell me what you think.

Dan

