Vladimir,
I recently replaced a synchronized ring buffer behind a ConcurrentHashMap with
weak values with a variant of the ring buffer that buffers updates in
thread-local arrays and then batches them through the lock. For the specific
application (a cache of frequently resolved terms in a dictionary), contention
went to zero. We use a very similar design with a ring buffer to hold onto
recently touched B+Tree nodes and leaves. This was a hot spot for concurrent
readers, as they needed to synchronize when adding a reference to the ring
buffer. I made a similar change there, to a variant of the ring buffer with
thread-local buffers and batching through the lock, and that contention has
also disappeared completely. I have been using a per-thread array of between
64 and 128 elements, with barging into the lock at 50% of the buffer
capacity.
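In case a concrete sketch helps, the idea is roughly the following (my own
made-up names, not our actual class): each thread appends touched references
to its own array and only takes the shared lock once the array is half full,
draining the whole batch in a single critical section.

import java.util.concurrent.locks.ReentrantLock;

final class BatchedRingBuffer<T> {

    private static final int LOCAL_CAPACITY = 128;                 // 64-128 per-thread slots
    private static final int FLUSH_THRESHOLD = LOCAL_CAPACITY / 2; // try the lock at 50% full

    private final Object[] ring;
    private int head;
    private final ReentrantLock lock = new ReentrantLock();

    private final ThreadLocal<LocalBuffer> localBuffer =
            ThreadLocal.withInitial(LocalBuffer::new);

    BatchedRingBuffer(int ringCapacity) {
        this.ring = new Object[ringCapacity];
    }

    /** Record a touch; most calls never contend on the shared lock. */
    void touch(T ref) {
        LocalBuffer buf = localBuffer.get();
        buf.elements[buf.size++] = ref;
        if (buf.size >= FLUSH_THRESHOLD) {
            // Opportunistic flush: skip if the lock is busy, unless the
            // local array is completely full and must be drained.
            if (lock.tryLock()) {
                try { drain(buf); } finally { lock.unlock(); }
            } else if (buf.size == LOCAL_CAPACITY) {
                lock.lock();
                try { drain(buf); } finally { lock.unlock(); }
            }
        }
    }

    // Must be called while holding the lock.
    private void drain(LocalBuffer buf) {
        for (int i = 0; i < buf.size; i++) {
            ring[head] = buf.elements[i];
            head = (head + 1) % ring.length;   // oldest entries are overwritten
            buf.elements[i] = null;            // drop the hard reference promptly
        }
        buf.size = 0;
    }

    private static final class LocalBuffer {
        final Object[] elements = new Object[LOCAL_CAPACITY];
        int size;
    }
}

The tryLock() at the 50% mark is one way to read the "barging" part: the flush
stays opportunistic until the local array is completely full, at which point
the thread must take the lock and drain.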
So, overall, this approach is working
out very nicely for us in combination with WeakReferences.
One drawback of the approach is that hard references remain in the
thread-local arrays until they are batched through the lock. Depending on how
threads are managed, it is possible for hard references to sit in those
thread-local buffers long past their last touch. For this reason we do not use
this technique when the cache contains expensive resources and we need a
stronger guarantee of timely discard once the resource is no longer in use by
the application. It might be possible to address this concern with a protocol
that attaches and detaches the thread-local buffers from threads on their way
into / out of a thread pool.
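Such a protocol could look roughly like the following (hypothetical and
untested; the detach action is assumed to drain and clear the calling thread's
buffer): a thread-pool hook that runs after each task, so hard references do
not linger while the thread sits idle in the pool.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BufferFlushingExecutor extends ThreadPoolExecutor {

    // Hypothetical detach action: drains and clears the caller's thread-local buffer.
    private final Runnable detachBuffer;

    BufferFlushingExecutor(int poolSize, Runnable detachBuffer) {
        super(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
              new LinkedBlockingQueue<Runnable>());
        this.detachBuffer = detachBuffer;
    }

    @Override
    protected void afterExecute(Runnable task, Throwable thrown) {
        super.afterExecute(task, thrown);
        detachBuffer.run();   // release hard references before the thread goes idle
    }
}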
I have not yet tried this approach in our main data record cache (which caches
data records read from disk in memory). We still have a hot spot there, but I
am quite certain that we will be able to address it using similar techniques.
I will try out the Infinispan BufferedConcurrentHashMap in that context
soon.
Thanks,
Bryan
Hi all,
As you probably recall, I am working on a LIRS eviction algorithm. But before
we get there, Manik and I agreed that I should push ahead with a
ConcurrentHashMap (CHM) variant that does lock amortization and eviction per
CHM segment. So far I have implemented LRU eviction, so we can verify the
feasibility of this approach rather than the eviction algorithm itself. We are
hoping that eviction precision will not suffer if we evict per segment, and
you are all familiar with the lock-striping benefits of segments in
CHM.
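To make that concrete, here is a rough sketch of the amortization scheme
collapsed into a single "segment" (my own names, not the actual
BufferedConcurrentHashMap code): accesses are recorded in a lock-free queue,
and the LRU order is only updated, and eviction performed, once enough
accesses have accumulated to justify taking the lock.

import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class AmortizedLruCache<K, V> {

    private static final int DRAIN_THRESHOLD = 64;   // roughly one lock acquisition per 64 accesses

    private final int maxEntries;
    private final ConcurrentHashMap<K, V> backing = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<K> accessQueue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pendingAccesses = new AtomicInteger();
    private final ReentrantLock lock = new ReentrantLock();
    private final LinkedHashSet<K> lruOrder = new LinkedHashSet<>();   // guarded by lock, coldest first

    AmortizedLruCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    V get(K key) {
        V value = backing.get(key);        // reads never take the lock
        if (value != null) {
            recordAccess(key);
        }
        return value;
    }

    V put(K key, V value) {
        V previous = backing.put(key, value);
        recordAccess(key);
        return previous;
    }

    private void recordAccess(K key) {
        accessQueue.add(key);
        // Amortization: only attempt the lock once enough accesses are buffered,
        // and skip entirely if another thread is already draining.
        if (pendingAccesses.incrementAndGet() >= DRAIN_THRESHOLD && lock.tryLock()) {
            try {
                drainAndEvict();
            } finally {
                lock.unlock();
            }
        }
    }

    // Called with the lock held: replay buffered accesses into the LRU order,
    // then evict from the cold end until we are back under the cap.
    private void drainAndEvict() {
        K key;
        while ((key = accessQueue.poll()) != null) {
            pendingAccesses.decrementAndGet();
            if (backing.containsKey(key)) {
                lruOrder.remove(key);   // re-append: most recently used moves to the tail
                lruOrder.add(key);
            }
        }
        Iterator<K> coldest = lruOrder.iterator();
        while (backing.size() > maxEntries && coldest.hasNext()) {
            K victim = coldest.next();
            coldest.remove();
            backing.remove(victim);
        }
    }
}

The real map applies this per CHM segment, so threads working in different
segments never contend on the same eviction lock.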
So, I've cobbled together a first test to compare the eviction- and
lock-amortization-enabled CHM (BCHM) with regular CHM and a synchronized
HashMap. The test is based on an already existing test [1] used to measure the
performance of various DataContainers; I've changed it slightly to measure the
Map directly instead of the DataContainer. The test launches 48 reader, 4
writer and 1 remover threads concurrently. All operations are randomized:
readers execute map.get(R.nextInt(NUM_KEYS)), writers
map.put(R.nextInt(NUM_KEYS), "value"), and removers execute
map.remove(R.nextInt(NUM_KEYS)). NUM_KEYS was set to 10K, and each thread
executes 1K ops in a loop.
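For reference, here is a rough sketch of the shape of that harness (the real
test is [1]; the class and names below are mine, not from that code):

import java.util.Map;
import java.util.Random;
import java.util.concurrent.CountDownLatch;

final class MapStressTest {

    private static final int NUM_KEYS = 10_000;
    private static final int OPS_PER_THREAD = 1_000;

    private interface Op {
        void apply(Random r);
    }

    static void run(Map<Integer, String> map) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(48 + 4 + 1);

        spawn(48, start, done, r -> map.get(r.nextInt(NUM_KEYS)));            // readers
        spawn(4,  start, done, r -> map.put(r.nextInt(NUM_KEYS), "value"));   // writers
        spawn(1,  start, done, r -> map.remove(r.nextInt(NUM_KEYS)));         // remover

        start.countDown();   // release all threads at once
        done.await();
        System.out.println("Size = " + map.size());
    }

    private static void spawn(int count, CountDownLatch start, CountDownLatch done, Op op) {
        for (int i = 0; i < count; i++) {
            new Thread(() -> {
                Random r = new Random();
                try {
                    start.await();
                    long begin = System.nanoTime();
                    for (int j = 0; j < OPS_PER_THREAD; j++) {
                        op.apply(r);
                    }
                    long elapsedMs = Math.max(1, (System.nanoTime() - begin) / 1_000_000);
                    System.out.println(Thread.currentThread().getName()
                            + ": " + (OPS_PER_THREAD / elapsedMs) + " ops/ms");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
    }
}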
The initial capacity for CHM and HashMap was set to 1K; the max capacity for
BCHM was set to 256, so BCHM has to do a lot of evictions, which is evident in
the final map sizes listed below.
Performance for container ConcurrentHashMap (final size = 9999)
  Average get ops/ms:    338
  Average put ops/ms:     87
  Average remove ops/ms: 171

Performance for container BufferedConcurrentHashMap (final size = 193)
  Average get ops/ms:    322
  Average put ops/ms:     32
  Average remove ops/ms:  74

Performance for container SynchronizedMap (final size = 8340)
  Average get ops/ms:     67
  Average put ops/ms:     45
  Average remove ops/ms:  63
If I remove lock amortization from BufferedConcurrentHashMap, that is, if we
attempt eviction on every get/put/remove, put/remove performance for BCHM
basically goes to zero! As far as I can interpret these results, BCHM get
performance does not suffer at all in comparison with CHM, as it does for the
single-lock HashMap. Predictably, for the single-lock HashMap each type of
operation takes almost the same amount of time on average. We pay a hit on
BCHM put/remove operations in comparison with CHM, but the numbers are
promising, I think. If you have comments, or suggestions on how to test these
map variants in a different, possibly more insightful way, speak
up!
Cheers,
Vladimir