[infinispan-dev] Lock amortization preliminary performance numbers

Wed Feb 3 14:42:26 EST 2010

Vladimir,

So, you wind up with per-segment LRU orderings then, not global?  When you batch through, it is from the per-segment accessQueue to the per-segment LRU, right?

I have recently been bitten by non-blocking constructions which fail under high concurrency.  For example, I was using AtomicLong for a performance counter -- with disastrous results.  Under high contention, it winds up attempting to CAS without a change in its precondition, and failing and retrying again and again to get through the race.  This actually became the #1 hot spot in the code with an active thread pool.

So, I am very strongly in favor of using non-thread safe data structures under conditions where you know that you have single threaded access rather than "non-blocking" data structured.  For example, it might be cheaper to make the accessQueue a straight array or a ring buffer that was not thread safe and to use the segment lock to guard it.

Maybe the better design is to explicitly manage the threads with access to the DataContainer and use thread-locals.  That would be both non-blocking (until the updates are batches) and would avoid the possibility of high concurrency CAS spins.

I will play with things and see what hot spots turn up.

Bryan

________________________________
From: infinispan-dev-bounces at lists.jboss.org [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf Of Vladimir Blagojevic
Sent: Wednesday, February 03, 2010 10:00 AM
To: infinispan -Dev List
Cc: Martyn Cutcher; Vladimir Kondratyev
Subject: Re: [infinispan-dev] Lock amortization preliminary performance numbers

Bryan,

On 2010-02-02, at 7:49 PM, Bryan Thompson wrote:

Vladimir,

I now have a workload where I have a hotspot on our cache which is 8% of the total time.  I am going to use this to test BCHM under a realistic workload.

One thing which I like about your design is that by placing the array buffering the operations for a thread within the Segment, you are using the lock guarding the Segment to control updates to that array.  While this has less potential throughput than a truely thread-local design (which is non-blocking until the thread-local array is full), it seems that the array buffering the updates can not "escape" under your design.  Was this intentional?

Yes. Escaping could also potentially be handled by shared pool of array buffers. Ideally shared pool would be some lock free structure since  there would be a lot of contention among threads for these array buffers (that record accesses)

The danger with a true thread-local design is that a thread can come in and do some work, get some updates buffered in its thread-local array, and then never visit again.  In this case those updates would remain buffered on the thread and would not in fact cause the access order to be updated in a timely manner.  Worse, if you are relying on WeakReference semantics, the buffered updates would remain strongly reachable and the corresponding objects would be wired into the cache.

Not if you return updates into above mentioned pool as the thread unwinds from call into cache (container).

I've worked around this issue in a different context where it made sense to scope the buffers to an owning object (a B+Tree instance).  In that case, when the B+Tree container was closed, all buffered updates were discarded.  This nicely eliminated the problems with "escaping" threads.

Yes, there has to be some container boundary! In Infinispan this would be a cache instance, DataContainer to be exact.

However, I like your approach better.

Bryan

PS: I see a comment on LRU#onEntryHit() indicating that it is invoked without holding the Segment lock.  Is that true only when BCHM#get() is invoked?

Yes, I'll change javadoc to say : "is potentially invoked without holding a lock".

PPS: Also, can you expand on the role of the LRU#accessQueue vs LRU#lruQueue?  Is there anything which corresponds to the total LRU order (after batching updates through the Segment)?

LRU#accessQueue records hits on a Segment. This is the batching FIFO queue from BP-Wrapper paper except it is not per thread but rather per Segment. Notice that LRU#accessQueue is lock free but thread safe since we expect a lot of threads recording accesses per Segment. Ideal collection for this use case is ConcurrentLinkedQueue.

LRU#lruQueue is simply just a simple LRU stack implementation with one end of the queue holding most recently accessed entries and the other end of the queue with least recently accessed entries.

Regards,
Vladimir

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20100203/de8e7cdc/attachment-0002.html