Vladimir,
So, you wind up with per-segment LRU orderings then, not global? When you batch through,
it is from the per-segment accessQueue to the per-segment LRU, right?
I have recently been bitten by non-blocking constructions that degrade badly under high
concurrency. For example, I was using AtomicLong for a performance counter -- with
disastrous results. Under high contention, each thread attempts its CAS, fails because
another thread got in first, and retries again and again to get through the race.
This actually became the #1 hot spot in the code with an active thread pool.
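To illustrate the kind of fix that worked for me, here is a minimal sketch of a striped
counter (the class name and stripe count are mine, not from any particular library).
Each thread increments a slot picked from its thread id, so threads rarely CAS against
the same cell, and a read just sums the slots:

import java.util.concurrent.atomic.AtomicLongArray;

// Minimal sketch of a striped performance counter. Increments spread
// across stripes so concurrent threads rarely retry the same CAS.
public class StripedCounter {
    private static final int STRIPES = 16; // ideally >= number of hot threads
    private final AtomicLongArray cells = new AtomicLongArray(STRIPES);

    public void increment() {
        // Pick a stripe from the thread id so threads mostly hit different slots.
        int i = (int) (Thread.currentThread().getId() % STRIPES);
        cells.incrementAndGet(i);
    }

    public long sum() {
        long total = 0;
        for (int i = 0; i < STRIPES; i++) {
            total += cells.get(i); // approximate snapshot under concurrent updates
        }
        return total;
    }
}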
So, I am very strongly in favor of using non-thread-safe data structures where you know
you have single-threaded access, rather than "non-blocking" data structures. For example,
it might be cheaper to make the accessQueue a plain array or a ring buffer that is not
thread-safe and to use the segment lock to guard it.
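Concretely, I am thinking of something like this hypothetical sketch (not Infinispan
code): a buffer with no internal synchronization at all, whose methods are only ever
called while the Segment lock is held:

// Hypothetical sketch: a non-thread-safe ring buffer guarded solely by
// the owning Segment's lock. No CAS, no volatile, no retries.
final class AccessBuffer<E> {
    private final Object[] buf;
    private int head, size;

    AccessBuffer(int capacity) {
        buf = new Object[capacity];
    }

    // Caller must hold the segment lock. Returns true when the buffer is
    // full, signalling that the batch should be drained to the LRU.
    boolean add(E e) {
        buf[(head + size) % buf.length] = e;
        if (size < buf.length) {
            size++;
        } else {
            head = (head + 1) % buf.length; // full: overwrite the oldest entry
        }
        return size == buf.length;
    }

    // Caller must hold the segment lock.
    @SuppressWarnings("unchecked")
    E poll() {
        if (size == 0) return null;
        E e = (E) buf[head];
        buf[head] = null;
        head = (head + 1) % buf.length;
        size--;
        return e;
    }
}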
Maybe the better design is to explicitly manage the threads with access to the
DataContainer and use thread-locals. That would be non-blocking (until the updates are
batched) and would avoid the possibility of high-concurrency CAS spins.
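Roughly like this hypothetical sketch (all names are made up): recording a hit touches
only thread-local state, and the lock is taken only when a thread's private batch fills:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of thread-local access buffering. The hot path is
// lock-free; the segment lock is taken only once per BATCH accesses.
final class ThreadLocalRecorder<K> {
    private static final int BATCH = 64;
    private final Object segmentLock = new Object(); // stands in for the Segment lock

    private final ThreadLocal<List<K>> buffer = new ThreadLocal<List<K>>() {
        @Override
        protected List<K> initialValue() {
            return new ArrayList<K>(BATCH);
        }
    };

    void onHit(K key) {
        List<K> b = buffer.get();
        b.add(key); // no synchronization on the hot path
        if (b.size() >= BATCH) {
            synchronized (segmentLock) {
                for (K k : b) {
                    // apply each buffered access to the shared LRU order here
                }
            }
            b.clear();
        }
    }
}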
I will play with things and see what hot spots turn up.
Bryan
________________________________
From: infinispan-dev-bounces@lists.jboss.org
[mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Vladimir Blagojevic
Sent: Wednesday, February 03, 2010 10:00 AM
To: infinispan-dev List
Cc: Martyn Cutcher; Vladimir Kondratyev
Subject: Re: [infinispan-dev] Lock amortization preliminary performance numbers
Bryan,
On 2010-02-02, at 7:49 PM, Bryan Thompson wrote:
Vladimir,
I now have a workload where I have a hotspot on our cache which is 8% of the total time.
I am going to use this to test BCHM under a realistic workload.
One thing I like about your design is that by placing the array buffering a thread's
operations within the Segment, you are using the lock guarding the Segment to control
updates to that array. While this has less potential throughput than a truly thread-local
design (which is non-blocking until the thread-local array is full), it seems that the
array buffering the updates cannot "escape" under your design. Was this intentional?
Yes. Escaping could also be handled by a shared pool of array buffers. Ideally the shared
pool would be a lock-free structure, since there would be a lot of contention among
threads for these array buffers (which record accesses).
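Something along these lines, as a rough sketch (not actual code; ConcurrentLinkedQueue
is just one plausible lock-free choice here):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Rough sketch of a lock-free pool of access buffers. Threads borrow a
// buffer on entry into the container and return it on exit, so buffers
// cannot escape with an idle thread.
final class BufferPool<K> {
    private final ConcurrentLinkedQueue<List<K>> pool =
            new ConcurrentLinkedQueue<List<K>>();

    List<K> borrow() {
        List<K> b = pool.poll();
        return b != null ? b : new ArrayList<K>(64);
    }

    void release(List<K> b) {
        b.clear(); // caller drains the buffer before releasing it
        pool.offer(b);
    }
}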
The danger with a true thread-local design is that a thread can come in and do some work,
get some updates buffered in its thread-local array, and then never visit again. In this
case those updates would remain buffered on the thread and would not in fact cause the
access order to be updated in a timely manner. Worse, if you are relying on WeakReference
semantics, the buffered updates would remain strongly reachable and the corresponding
objects would be wired into the cache.
Not if you return the buffers to the above-mentioned pool as the thread unwinds from its
call into the cache (container).
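As a hypothetical sketch of that discipline, building on the BufferPool above (lookup
and drain here are placeholders):

import java.util.List;

// Hypothetical container entry point: the borrowed buffer is always
// drained and returned as the thread unwinds, so no recorded accesses
// stay pinned to a thread that never comes back.
final class ContainerSketch<K, V> {
    private final BufferPool<K> pool = new BufferPool<K>();

    V get(K key) {
        List<K> buf = pool.borrow();
        try {
            buf.add(key);       // record the access
            return lookup(key);
        } finally {
            drain(buf);         // apply buffered accesses to the LRU order
            pool.release(buf);  // buffer goes back to the shared pool
        }
    }

    private V lookup(K key) { return null; } // placeholder
    private void drain(List<K> buf) { }      // placeholder
}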
I've worked around this issue in a different context where it made sense to scope the
buffers to an owning object (a B+Tree instance). In that case, when the B+Tree container
was closed, all buffered updates were discarded. This nicely eliminated the problems with
"escaping" threads.
Yes, there has to be some container boundary! In Infinispan this would be a cache
instance, DataContainer to be exact.
However, I like your approach better.
Bryan
PS: I see a comment on LRU#onEntryHit() indicating that it is invoked without holding the
Segment lock. Is that true only when BCHM#get() is invoked?
Yes, I'll change the javadoc to say: "is potentially invoked without holding a
lock".
PPS: Also, can you expand on the role of the LRU#accessQueue vs LRU#lruQueue? Is there
anything which corresponds to the total LRU order (after batching updates through the
Segment)?
LRU#accessQueue records hits on a Segment. This is the batching FIFO queue from the
BP-Wrapper paper, except that it is per Segment rather than per thread. Notice that
LRU#accessQueue is lock-free yet thread-safe, since we expect many threads recording
accesses per Segment. An ideal collection for this use case is ConcurrentLinkedQueue.
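To make the batching concrete, the recording path looks roughly like this (my
illustration, not the actual BCHM source; the threshold and drain policy are
simplifications):

import java.util.concurrent.ConcurrentLinkedQueue;

// Rough sketch of per-Segment hit recording. Hits are appended without
// a lock; once enough accumulate, one thread takes the segment lock
// (modelled here as synchronized) and drains the whole batch.
final class SegmentSketch<K> {
    private static final int BATCH_THRESHOLD = 64;
    private final ConcurrentLinkedQueue<K> accessQueue =
            new ConcurrentLinkedQueue<K>();

    void recordHit(K key) {
        accessQueue.add(key); // lock-free; many threads record concurrently
        // Note: size() is O(n) for this queue; a separate counter is cheaper.
        if (accessQueue.size() >= BATCH_THRESHOLD) {
            drainUnderLock();
        }
    }

    private synchronized void drainUnderLock() {
        K key;
        while ((key = accessQueue.poll()) != null) {
            // promote key in the per-segment LRU ordering
        }
    }
}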
LRU#lruQueue is just a simple LRU stack, with one end of the queue holding the most
recently accessed entries and the other end the least recently accessed entries.
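As a minimal illustration only (a real implementation would avoid the O(n) remove, e.g.
with per-entry linked nodes):

import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the LRU queue shape described above: the head holds
// the most recently accessed entries, the tail the least recently
// accessed, so eviction candidates come off the tail.
final class LruQueueSketch<K> {
    private final Deque<K> queue = new ArrayDeque<K>();

    // Move a hit entry to the MRU end. Caller holds the segment lock.
    void touch(K key) {
        queue.remove(key); // O(n) here; fine for a sketch only
        queue.addFirst(key);
    }

    // The least recently used entry, i.e. the next eviction candidate.
    K evictionCandidate() {
        return queue.peekLast();
    }
}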
Regards,
Vladimir