Vladimir,
I recently replaced a synchronized ring buffer (sitting behind a ConcurrentHashMap with
weak values) with a variant of the ring buffer that buffers updates in thread-local arrays
and then batches them through the lock. For that specific application (a cache of
frequently resolved dictionary terms), contention went to zero. We use a very similar
design with a ring buffer to hold onto recently touched B+Tree nodes and leaves. This was
a hot spot for concurrent readers, since they had to synchronize when adding a reference
to the ring buffer. I made the same change there, switching to a variant of the ring
buffer with thread-local buffers and batching through the lock, and that contention has
also disappeared completely. I have been using per-thread arrays of 64 to 128 elements,
with threads barging in on the lock once a buffer reaches 50% of its capacity.
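Roughly, the pattern looks like this. This is a minimal sketch, not our actual code; the
names (BatchedRingBuffer, TouchBuffer, touch, drain, flushCurrentThread) are illustrative,
and the shared ring is reduced to a bare array with eviction handling elided:

    import java.util.concurrent.locks.ReentrantLock;

    class BatchedRingBuffer<T> {

        private static final int BUFFER_SIZE = 128;           // per-thread array capacity
        private static final int BARGE_AT = BUFFER_SIZE / 2;  // try the lock at 50% full

        private final ReentrantLock lock = new ReentrantLock();

        // The shared ring buffer, guarded by the lock; eviction handling elided.
        private final Object[] ring = new Object[1024];
        private int head;

        // Each thread accumulates touches locally, with no synchronization at all.
        private final ThreadLocal<TouchBuffer> local =
                ThreadLocal.withInitial(TouchBuffer::new);

        private static final class TouchBuffer {
            final Object[] items = new Object[BUFFER_SIZE];
            int size;
        }

        /** Record a touch; only occasionally pays for the shared lock. */
        public void touch(T item) {
            TouchBuffer buf = local.get();
            buf.items[buf.size++] = item;
            if (buf.size >= BARGE_AT) {
                if (lock.tryLock()) {              // barge: drain only if uncontended
                    try { drain(buf); } finally { lock.unlock(); }
                } else if (buf.size == BUFFER_SIZE) {
                    lock.lock();                   // buffer full: we must wait now
                    try { drain(buf); } finally { lock.unlock(); }
                }
            }
        }

        /** Drain the calling thread's buffer unconditionally (e.g. on pool exit). */
        public void flushCurrentThread() {
            lock.lock();
            try { drain(local.get()); } finally { lock.unlock(); }
            local.remove();                        // detach the buffer from this thread
        }

        // Caller must hold the lock: move buffered touches into the shared ring.
        private void drain(TouchBuffer buf) {
            for (int i = 0; i < buf.size; i++) {
                ring[head] = buf.items[i];
                head = (head + 1) % ring.length;
                buf.items[i] = null;               // drop the hard reference promptly
            }
            buf.size = 0;
        }
    }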
So, overall, this approach is working out very nicely for us in combination with
WeakReferences. One drawback of the approach is that hard references remain in the
thread-local arrays until they are batched through the lock. Depending on how threads are
managed, it is possible for hard references to linger in the thread-local buffers long
past their last touch. For this reason we do not use this technique when the cache
contains expensive resources and we need a stronger guarantee of timely discard once a
resource is no longer in use by the application. It might be possible to address this
concern with a protocol that attaches and detaches the thread-local buffers as threads
enter and leave a thread pool.
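One shape such a protocol might take, again purely a sketch reusing the hypothetical
flushCurrentThread() from above: a pool that drains and detaches a worker's buffer after
each task, so buffered hard references never outlive the task that created them.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    class BufferFlushingExecutor extends ThreadPoolExecutor {

        private final BatchedRingBuffer<?> buffer;

        BufferFlushingExecutor(int threads, BatchedRingBuffer<?> buffer) {
            super(threads, threads, 0L, TimeUnit.MILLISECONDS,
                  new LinkedBlockingQueue<>());
            this.buffer = buffer;
        }

        @Override
        protected void afterExecute(Runnable task, Throwable t) {
            super.afterExecute(task, t);
            // Push any buffered touches through the lock and detach the buffer
            // before this worker thread goes idle in the pool.
            buffer.flushCurrentThread();
        }
    }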
I have not yet tried this approach in our main data record cache (which caches data
records read from disk in memory). We still have a hot spot there, but I am quite certain
that we will be able to address it using similar techniques. I will try out the
Infinispan BufferedConcurrentHashMap in that context soon.
Thanks,
Bryan
________________________________
From: infinispan-dev-bounces@lists.jboss.org
[mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Vladimir Blagojevic
Sent: Wednesday, January 27, 2010 10:30 AM
To: infinispan-dev List
Subject: [infinispan-dev] Lock amortization preliminary performance numbers
Hi all,
As you probably recall, I am working on a LIRS eviction algorithm. But before we get
there, Manik and I agreed that I should first push in the direction of implementing a
ConcurrentHashMap (CHM) variant that does lock amortization and eviction per CHM segment.
So far I have implemented LRU eviction, so that we can verify the feasibility of the
approach itself rather than of any particular eviction algorithm. We are hoping that
eviction precision will not suffer if we evict per segment, and you are all familiar with
the lock-striping benefits of segments in CHM.
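To make the idea concrete, here is a rough sketch of per-segment lock amortization with
LRU. This is not the actual BufferedConcurrentHashMap code; the class and field names and
the batching threshold are all illustrative:

    import java.util.Iterator;
    import java.util.LinkedHashSet;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.locks.ReentrantLock;

    class AmortizedLruSegment<K, V> {

        private static final int BATCH = 32;   // amortization threshold

        private final int maxSize;
        private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
        private final LinkedHashSet<K> lruOrder = new LinkedHashSet<>(); // guarded by lock
        private final ConcurrentLinkedQueue<K> accessQueue = new ConcurrentLinkedQueue<>();
        private final AtomicInteger pending = new AtomicInteger(); // approximate count
        private final ReentrantLock lock = new ReentrantLock();

        AmortizedLruSegment(int maxSize) { this.maxSize = maxSize; }

        public V get(K key) {
            V value = data.get(key);            // lock-free read, as in CHM
            if (value != null) {
                recordAccess(key);
            }
            return value;
        }

        public V put(K key, V value) {
            V old = data.put(key, value);
            recordAccess(key);
            return old;
        }

        private void recordAccess(K key) {
            accessQueue.add(key);
            // Amortization: only roughly every BATCH-th access pays for the lock,
            // and only if the lock happens to be free at that moment.
            if (pending.incrementAndGet() >= BATCH && lock.tryLock()) {
                try {
                    drainAndEvict();
                } finally {
                    lock.unlock();
                }
            }
        }

        // Caller holds the lock: replay buffered accesses onto the LRU order,
        // then evict least-recently-used entries until we are within maxSize.
        private void drainAndEvict() {
            K key;
            while ((key = accessQueue.poll()) != null) {
                pending.decrementAndGet();
                if (data.containsKey(key)) {
                    lruOrder.remove(key);       // move key to the MRU end
                    lruOrder.add(key);
                }
            }
            Iterator<K> eldest = lruOrder.iterator();
            while (data.size() > maxSize && eldest.hasNext()) {
                K victim = eldest.next();
                eldest.remove();
                data.remove(victim);            // evict the LRU entry
            }
        }
    }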
So, I've cobbled together a first test to compare the eviction- and lock-amortization-
enabled CHM (BCHM) with a regular CHM and a synchronized HashMap. The test is based on an
already existing test [1] used to measure the performance of various DataContainers; I
changed it slightly to measure the Map directly instead of a DataContainer. The test
launches a cohort of 48 reader, 4 writer, and 1 remover threads. All operations are
randomized: readers execute map.get(R.nextInt(NUM_KEYS)), writers
map.put(R.nextInt(NUM_KEYS), "value"), and removers map.remove(R.nextInt(NUM_KEYS)).
NUM_KEYS was set to 10K, and each thread performs 1K operations in a loop.
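In outline, the test loop looks roughly like this. This is a reconstruction, not the
actual test code from [1]; per-operation timing is elided for brevity:

    import java.util.Map;
    import java.util.Random;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CountDownLatch;

    class MapStressTest {

        static final int NUM_KEYS = 10_000;
        static final int OPS = 1_000;

        interface Op { void apply(Random r, Map<Integer, String> m); }

        public static void main(String[] args) throws InterruptedException {
            // Swap in BCHM or a synchronized HashMap to compare containers.
            Map<Integer, String> map = new ConcurrentHashMap<>(1024);
            CountDownLatch start = new CountDownLatch(1);
            CountDownLatch done = new CountDownLatch(48 + 4 + 1);

            spawn(48, start, done, map, (r, m) -> m.get(r.nextInt(NUM_KEYS)));         // readers
            spawn(4, start, done, map, (r, m) -> m.put(r.nextInt(NUM_KEYS), "value")); // writers
            spawn(1, start, done, map, (r, m) -> m.remove(r.nextInt(NUM_KEYS)));       // removers

            long t0 = System.nanoTime();
            start.countDown();                  // release all threads at once
            done.await();
            System.out.printf("Size = %d, elapsed %d ms%n",
                    map.size(), (System.nanoTime() - t0) / 1_000_000);
        }

        static void spawn(int n, CountDownLatch start, CountDownLatch done,
                          Map<Integer, String> map, Op op) {
            for (int i = 0; i < n; i++) {
                new Thread(() -> {
                    Random r = new Random();
                    try {
                        start.await();
                        for (int j = 0; j < OPS; j++) op.apply(r, map); // 1K ops each
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }).start();
            }
        }
    }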
The initial capacity for the CHM and the HashMap was set to 1K; the max capacity for the
eviction- and lock-amortization-enabled CHM was set to 256. BCHM therefore has to do a
lot of evictions, which is evident in the final map sizes listed below.
Container                    Final size   Avg get ops/ms   Avg put ops/ms   Avg remove ops/ms
ConcurrentHashMap                  9999              338               87                 171
BufferedConcurrentHashMap           193              322               32                  74
SynchronizedMap                    8340               67               45                  63
If I remove lock amortization from BufferedConcurrentHashMap, that is, if we attempt
eviction on every get/put/remove, put/remove performance for BCHM basically goes to zero!
As far as I can interpret these results, BCHM get performance does not suffer at all in
comparison with CHM, as it does for the single-lock HashMap. Predictably, for the
single-lock HashMap each type of operation takes almost the same amount of time on
average. We do take a hit on BCHM put/remove operations in comparison with CHM, but the
numbers are promising, I think. If you have comments, or suggestions on how to test these
map variants in a different, possibly more insightful way, speak up!
Cheers,
Vladimir
[1]
http://fisheye.jboss.org/browse/~raw,r=1264/Infinispan/trunk/core/src/tes...