[infinispan-dev] Initial LIRS implementation

Tue Feb 9 07:24:57 EST 2010

FYI, those results were with 4 concurrent clients issuing query mixes.  Throughput is higher for all conditions of that matrix with 8 concurrent clients. Bryan
________________________________________
From: Bryan Thompson
Sent: Monday, February 08, 2010 7:59 PM
To: Bryan Thompson; Vladimir Blagojevic
Cc: infinispan -Dev List; Martyn Cutcher
Subject: RE: [infinispan-dev] Initial LIRS implementation

Vladimir,

Here are some throughput numbers for our application on a query benchmark (The Berlin Sparql Benchmark using the "reduced query mix") comparing the performance with our existing cache (known hotspot) and the BCHM cache from infinispan.  The metric is Query Mixes per Hour.

Base: Cold Cache=1851; Hot Cache=18,537
LIRS: Cold Cache=2317; Hot Cache=28,709

You can see that there is a significant effect for both the cold and hot cache conditions when compared to the baseline cache implementation, which is great.  I have also validated correctness for the benchmark using the BHCM LIRS cache.  I have not yet resolved the deadlock when using the BHCM LRU cache (I have not tried your most current update of that class).  I have not put this under a profiler yet either.  These are use some early results.

Bryan
________________________________________
From: Bryan Thompson
Sent: Monday, February 08, 2010 6:39 AM
To: Bryan Thompson; Vladimir Blagojevic
Cc: infinispan -Dev List
Subject: RE: [infinispan-dev] Initial LIRS implementation

Vladimir,

Here is an expanded concept for the cache implementation.

I am thinking that the right approach is a class which encapsulates:

(a) an unmodified ConcurrentHashMap (CHM); combined with

(b) non-thread-safe thread-local buffers (TLB) for touches, managed by
    an inner CHM<ThreadId,TLB> instance.  The reason for this inner
    map is to allow the TLB instances to be discarded by clear(); The
    TLBs get batched onto;

(c) a shared non-thread safe access policy (LIRS, LRU) built on
    double-linked nodes (DLN) stored in the inner CHM.  Updates are
    deferred until holding the lock (d).  The DLN reference to the
    cached value is final.  The (prior, next, delete) fields are only
    read or written while holding the lock (d).  Other fields could be
    defined by subclassing a newDLN() method to support LIRS, etc. The
    access policy will need [head, tail] or similar fields, which
    would also be guarded by the lock (d);

(d) a single lock guarding mutation on the access policy.  Since there
    is only one lock, there can be no lock ordering problems.  Both
    batching touches onto (c) and eviction (per the access policy)
    require access to the lock, but that is the only lock.  If the
    access policy batches evictions, then lock requests will be rare
    and the whole cache will be non-blocking, wait free, and not
    spinning on CAS locks 99% of the time; and

(d) explicit management of the threads used to access the cache. e.g.,
    by queuing accepted requests and servicing them out of a thread
    pool, which has the benefit of managing the workload imposed by
    the clients.

This should have the best possible performance and the simplest
implementation.  (b) The TLB could be a DLN[] or other simple data
structures. The access policy (c) is composed from linking DLN
instances together while holding the lock.

A get() on the outer class looks up the DLN on the inner CHM and
places it into the TLB (if found).

A put() or putIfAbsent() on the outer class creates a new DLN and
either unconditionally or conditionally puts it into the inner CHM.
The new DLN is added to the TLB IFF it was added to the inner CHM.
The access order is NOT updated at this time.

A remove() on the outer class acquires the lock (d), looks up the DLN
in the cache, and synchronously unlinks the DLN if found and sets its
[deleted] flag.  I would recommend that the clients do not call
remove() directly, or that an outer remove() method exists which only
removes the DLN from the inner CHM and queues up remove requests to be
processed the next time any thread batches its touches through the
lock.  The inner remove() method would synchronously update the DLNs.

When batching touches through the lock, only the access order is
updated by the appropriate updates of the DLN nodes.  If the [deleted]
flag is set, then the DLN has been removed from the cache and its
access order is NOT updated.  If the cache is over its defined
maximums, then evictions are batched while holding the lock.
Evictions are only processed when batching touches through the lcok.

A clear() clear the ConcurrentHashMap<Key,DLN<Val>> map. It would also
clear the inner ConcurrentHashMap<ThreadId,TLB> map, which would cause
the existing TLB instances to be discarded.  It would have to obtain
the lock in order to clear the [head,tail] or related fields for the
access policy.

Bryan

________________________________________
From: Bryan Thompson
Sent: Sunday, February 07, 2010 8:19 PM
To: Vladimir Blagojevic
Cc: infinispan -Dev List
Subject: RE: [infinispan-dev] Initial LIRS implementation

Vladimir,

I am thinking that the right approach is a class which encapsulates:

(a) an unmodified ConcurrentHashMap;  combined with

(b) per-thread non-thread-safe buffers for touches; which get batched onto

(c) a shared non-thread safe access policy (LIRS, LRU); and

(d) explicit management of the threads used to access the cache. e.g.,
    by queuing accepted requests and servicing them out of a thread
    pool.

This should have the best possible performance and the simplest
implementation.  (b) can be use an Object[].  (c) can be composed from
simple arrays or other non-thread safe data structures.  Both batching
touches onto (c) and eviction (per the policy) require access to a
lock for (c), but that is the only lock.  If the policy batches
evictions, then lock requests will be rare and the whole cache will be
non-blocking, wait free, and not spinning on CAS locks 99% of the
time.

Bryan

________________________________________
From: Vladimir Blagojevic [vblagoje at redhat.com]
Sent: Sunday, February 07, 2010 7:55 AM
To: Bryan Thompson
Cc: infinispan -Dev List
Subject: Re: [infinispan-dev] Initial LIRS implementation

Hey Bryan,

Does it happen with LIRS as well? Pick up the latest version from svn so we know which file to look at. I've never seen this kind of exception in compareAndSwap, anyone?

Vladimir

On 2010-02-06, at 7:53 PM, Bryan Thompson wrote:

> Vladimir,
>
> I believe that there is a lock problem with the BufferedConcurrentHashMap.  This code path does not progress.  What is strange is that this is the only thread in the thread dump that is in BufferedConcurrentHashMap, which suggests a lock ordering problem.  I don't see the problem offhand, but this happens everytime I run our application with the BufferedConcurrentHashMap.
>
> Bryan
>
> "Thread-3" prio=10 tid=0x00002aafec141000 nid=0x41ba runnable [0x0000000040737000]
>   java.lang.Thread.State: RUNNABLE
>        at sun.misc.Unsafe.$$YJP$$compareAndSwapInt(Native Method)
>        at sun.misc.Unsafe.compareAndSwapInt(Unsafe.java)
>        at java.util.concurrent.locks.AbstractQueuedSynchronizer.compareAndSetState(AbstractQueuedSynchronizer.java:526)
>        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:183)
>        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>        at org.infinispan.util.concurrent.BufferedConcurrentHashMap$Segment.remove(BufferedConcurrentHashMap.java:745)
>        at org.infinispan.util.concurrent.BufferedConcurrentHashMap$Segment$LRU.execute(BufferedConcurrentHashMap.java:832)
>        at org.infinispan.util.concurrent.BufferedConcurrentHashMap$Segment.put(BufferedConcurrentHashMap.java:659)
>        at org.infinispan.util.concurrent.BufferedConcurrentHashMap.putIfAbsent(BufferedConcurrentHashMap.java:1421)
>        at com.bigdata.cache.BCHMGlobalLRU$InnerCacheImpl.putIfAbsent(BCHMGlobalLRU.java:633)
>        at com.bigdata.cache.BCHMGlobalLRU$InnerCacheImpl.putIfAbsent(BCHMGlobalLRU.java:557)
>        at com.bigdata.btree.AbstractBTree.readNodeOrLeaf(AbstractBTree.java:3624)
>
> ________________________________________
> From: Bryan Thompson
> Sent: Wednesday, February 03, 2010 5:04 PM
> To: Vladimir Blagojevic
> Cc: infinispan -Dev List
> Subject: RE: [infinispan-dev] Initial LIRS implementation
>
> Vladimir,
>
> Does this match up with what you are seeing?  That is a good win for get()/remove() with a loss on put().
>
> Size = 14698
> Performance for container ConcurrentHashMap
> Average get ops/ms 13931
> Average put ops/ms 306
> Average remove ops/ms 294
>
> Size = 452
> Performance for container BufferedConcurrentHashMap(LRU)
> Average get ops/ms 7735
> Average put ops/ms 140
> Average remove ops/ms 200
>
> Size = 494
> Performance for container BufferedConcurrentHashMap(LIRS)
> Average get ops/ms 13400
> Average put ops/ms 36
> Average remove ops/ms 1136
>
> Bryan
>
>> -----Original Message-----
>> From: Vladimir Blagojevic [mailto:vblagoje at redhat.com]
>> Sent: Wednesday, February 03, 2010 3:36 PM
>> To: Bryan Thompson
>> Subject: Re: [infinispan-dev] Initial LIRS implementation
>>
>> Yes,
>>
>> Have a look at LRU first and leave LIRS for tomorrow :) I'd
>> be happiest if you can find a use case for BCHM+LRU in your
>> application and test it out that way. Real life scenario!
>>
>> Cheers
>>
>> On 2010-02-03, at 3:27 PM, Bryan Thompson wrote:
>>
>>> Ah. It is inside the same outer class? Bryan
>>>
>>>
>>>> -----Original Message-----
>>>> From: Vladimir Blagojevic [mailto:vblagoje at redhat.com]
>>>> Sent: Wednesday, February 03, 2010 2:44 PM
>>>> To: Bryan Thompson
>>>> Subject: Re: [infinispan-dev] Initial LIRS implementation
>>>>
>>>> Here it is:
>>>> http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main
>>>> /java/org/infinispan/util/concurrent/BufferedConcurrentHashMap.java
>>>>
>>>> Are you familiar with LIRS? If not, don't bother unless you are
>>>> willing to dedicate at least a day or two :(
>>>>
>>>> http://www.ece.eng.wayne.edu/~sjiang/Projects/LIRS/sig02.ppt
>>>> http://citeseer.ist.psu.edu/527790.html
>>>>
>>>> Regards,
>>>> Vladimir
>>>>
>>>>
>>>> On 2010-02-03, at 2:31 PM, Bryan Thompson wrote:
>>>>
>>>>> Can you send me the full class name and I will check it out. Bryan
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: infinispan-dev-bounces at lists.jboss.org
>>>>>> [mailto:infinispan-dev-bounces at lists.jboss.org] On Behalf
>>>> Of Vladimir
>>>>>> Blagojevic
>>>>>> Sent: Wednesday, February 03, 2010 2:09 PM
>>>>>> To: infinispan -Dev List
>>>>>> Subject: [infinispan-dev] Initial LIRS implementation
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've just committed preliminary attempt to implement LIRS.
>>>>>> There does not seem to be serious degradation in terms of
>>>> performance
>>>>>> when it comes to get/remove commands in comparison with
>>>> LRU enabled
>>>>>> BufferedConcurrentHashMap.
>>>>>> However, put command is about three times slower than in
>>>> LRU; put in
>>>>>> LIRS is as fast as put command of a single lock
>>>> synchronized HashMap.
>>>>>>
>>>>>> Looking for enthusiasts willing to help out with some code
>>>> review :)
>>>>>>
>>>>>> Regards,
>>>>>> Vladimir
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> infinispan-dev mailing list
>>>>>> infinispan-dev at lists.jboss.org
>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>>
>>>>
>>>>
>>
>>