FYI, those results were with 4 concurrent clients issuing query mixes. Throughput is
higher for all conditions of that matrix with 8 concurrent clients.
Bryan
________________________________________
From: Bryan Thompson
Sent: Monday, February 08, 2010 7:59 PM
To: Bryan Thompson; Vladimir Blagojevic
Cc: infinispan-dev List; Martyn Cutcher
Subject: RE: [infinispan-dev] Initial LIRS implementation
Vladimir,
Here are some throughput numbers for our application on a query benchmark (the Berlin
SPARQL Benchmark using the "reduced query mix"), comparing the performance of
our existing cache (a known hotspot) with the BCHM cache from Infinispan. The metric is
Query Mixes per Hour.
Base: Cold Cache=1,851; Hot Cache=18,537
LIRS: Cold Cache=2,317; Hot Cache=28,709
You can see that there is a significant effect for both the cold and hot cache conditions
when compared to the baseline cache implementation, which is great. I have also validated
correctness for the benchmark using the BCHM LIRS cache. I have not yet resolved the
deadlock when using the BCHM LRU cache (I have not tried your most current update of that
class). I have not put this under a profiler yet either. These are just some early
results.
Bryan
________________________________________
From: Bryan Thompson
Sent: Monday, February 08, 2010 6:39 AM
To: Bryan Thompson; Vladimir Blagojevic
Cc: infinispan-dev List
Subject: RE: [infinispan-dev] Initial LIRS implementation
Vladimir,
Here is an expanded concept for the cache implementation.
I am thinking that the right approach is a class which encapsulates:
(a) an unmodified ConcurrentHashMap (CHM); combined with
(b) non-thread-safe thread-local buffers (TLB) for touches, managed by
an inner CHM<ThreadId,TLB> instance. The reason for this inner
map is to allow the TLB instances to be discarded by clear(). The
TLBs get batched onto;
(c) a shared non-thread-safe access policy (LIRS, LRU) built on
doubly-linked nodes (DLN) stored in the inner CHM (a). Updates are
deferred until holding the lock (d). The DLN reference to the
cached value is final. The (prior, next, deleted) fields are only
read or written while holding the lock (d). Other fields could be
defined in subclasses via a newDLN() factory method to support
LIRS, etc. The
access policy will need [head, tail] or similar fields, which
would also be guarded by the lock (d);
(d) a single lock guarding mutation on the access policy. Since there
is only one lock, there can be no lock ordering problems. Both
batching touches onto (c) and eviction (per the access policy)
require access to the lock, but that is the only lock. If the
access policy batches evictions, then lock requests will be rare
and the whole cache will be non-blocking, wait-free, and not
spinning on CAS locks 99% of the time; and
(e) explicit management of the threads used to access the cache, e.g.,
by queuing accepted requests and servicing them out of a thread
pool, which has the benefit of managing the workload imposed by
the clients.
This should have the best possible performance and the simplest
implementation. The TLB (b) could be a DLN[] or another simple data
structure. The access policy (c) is composed by linking DLN
instances together while holding the lock.
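In rough Java, the shape I have in mind is something like this (a
sketch only, not working code from either codebase; the class name,
the TLB layout, and the 128-slot buffer size are placeholders):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    public class SketchCache<K, V> {

        // (a) the unmodified CHM, mapping keys to doubly-linked nodes.
        private final ConcurrentHashMap<K, DLN<V>> map =
                new ConcurrentHashMap<K, DLN<V>>();

        // (b) the per-thread touch buffers, kept in their own CHM so
        // that clear() can discard the TLB instances.
        private final ConcurrentHashMap<Long, TLB<V>> buffers =
                new ConcurrentHashMap<Long, TLB<V>>();

        // (d) the single lock guarding all access-order mutation.
        private final ReentrantLock lock = new ReentrantLock();

        // (c) the access order [head, tail]; only read or written
        // while holding the lock.
        private DLN<V> head, tail;

        static class DLN<V> {
            final V value;      // final reference to the cached value
            DLN<V> prior, next; // only touched while holding the lock
            boolean deleted;    // only touched while holding the lock
            DLN(final V value) { this.value = value; }
        }

        // A TLB is just a small, unsynchronized array of touches.
        static class TLB<V> {
            @SuppressWarnings("unchecked")
            final DLN<V>[] a = (DLN<V>[]) new DLN[128];
            int n; // insertion point
        }

        // Subclasses override this to hang LIRS state off the node.
        protected DLN<V> newDLN(final V value) {
            return new DLN<V>(value);
        }
    }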
A get() on the outer class looks up the DLN in the inner CHM and
places it into the TLB (if found).
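Continuing the sketch above (tlb() is a placeholder that would lazily
install the calling thread's TLB into the inner CHM<ThreadId,TLB>;
batchTouches() appears further below):

    public V get(final K key) {
        final DLN<V> dln = map.get(key); // lock-free CHM read
        if (dln == null)
            return null;
        final TLB<V> t = tlb(); // this thread's buffer
        t.a[t.n++] = dln;       // record the touch, no locking
        if (t.n == t.a.length)
            batchTouches(t);    // full: batch through the lock
        return dln.value;
    }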
A put() or putIfAbsent() on the outer class creates a new DLN and
either unconditionally or conditionally puts it into the inner CHM.
The new DLN is added to the TLB IFF it was added to the inner CHM.
The access order is NOT updated at this time.
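Again as a sketch:

    public V putIfAbsent(final K key, final V value) {
        final DLN<V> dln = newDLN(value);
        final DLN<V> prior = map.putIfAbsent(key, dln);
        if (prior != null)      // lost the race: our DLN is discarded
            return prior.value; // and nothing goes into the TLB
        final TLB<V> t = tlb(); // won: buffer the touch; the access
        t.a[t.n++] = dln;       // order is NOT updated here
        if (t.n == t.a.length)
            batchTouches(t);
        return null;
    }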
A remove() on the outer class acquires the lock (d), looks up the DLN
in the cache, and synchronously unlinks the DLN if found and sets its
[deleted] flag. I would recommend that the clients do not call
remove() directly, or that an outer remove() method exists which only
removes the DLN from the inner CHM and queues up remove requests to be
processed the next time any thread batches its touches through the
lock. The inner remove() method would synchronously update the DLNs.
When batching touches through the lock, only the access order is
updated by the appropriate updates of the DLN nodes. If the [deleted]
flag is set, then the DLN has been removed from the cache and its
access order is NOT updated. If the cache is over its defined
maximums, then evictions are batched while holding the lock.
Evictions are only processed when batching touches through the lock.
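Putting that together, the batch would be roughly as follows
(moveToHead(), unlink(), evictTail() and capacity stand in for the
policy-specific pieces):

    private void batchTouches(final TLB<V> t) {
        lock.lock();
        try {
            DLN<V> r; // apply any queued removes first
            while ((r = removeQueue.poll()) != null) {
                r.deleted = true;
                unlink(r); // take it out of the access order
            }
            for (int i = 0; i < t.n; i++) {
                final DLN<V> dln = t.a[i];
                if (!dln.deleted)    // skip removed DLNs
                    moveToHead(dln); // access-order update only
            }
            t.n = 0; // the TLB is now empty
            while (map.size() > capacity)
                evictTail(); // evictions batched under the same lock
        } finally {
            lock.unlock();
        }
    }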
A clear() clears the ConcurrentHashMap<Key,DLN<Val>> map. It would also
clear the inner ConcurrentHashMap<ThreadId,TLB> map, which would cause
the existing TLB instances to be discarded. It would have to obtain
the lock in order to clear the [head,tail] or related fields for the
access policy.
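That is, roughly:

    public void clear() {
        lock.lock(); // needed to reset the access-policy fields
        try {
            map.clear();        // drop the cache entries
            buffers.clear();    // existing TLBs are discarded
            head = tail = null; // reset the access order
        } finally {
            lock.unlock();
        }
    }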
Bryan
________________________________________
From: Bryan Thompson
Sent: Sunday, February 07, 2010 8:19 PM
To: Vladimir Blagojevic
Cc: infinispan-dev List
Subject: RE: [infinispan-dev] Initial LIRS implementation
Vladimir,
I am thinking that the right approach is a class which encapsulates:
(a) an unmodified ConcurrentHashMap; combined with
(b) per-thread non-thread-safe buffers for touches; which get batched onto
(c) a shared non-thread-safe access policy (LIRS, LRU); and
(d) explicit management of the threads used to access the cache, e.g.,
by queuing accepted requests and servicing them out of a thread
pool.
This should have the best possible performance and the simplest
implementation. (b) can use an Object[]. (c) can be composed from
simple arrays or other non-thread-safe data structures. Both batching
touches onto (c) and eviction (per the policy) require access to a
lock for (c), but that is the only lock. If the policy batches
evictions, then lock requests will be rare and the whole cache will be
non-blocking, wait-free, and not spinning on CAS locks 99% of the
time.
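To make the "rare lock" point concrete, each touch would do no more
than this (a sketch; buffer, n and flushThroughLock() are
placeholders):

    // Called on every get(). With a 128-slot buffer, the lock for (c)
    // is requested on less than 1% of accesses.
    void touch(final Object entry) {
        buffer[n++] = entry;          // thread-local, no synchronization
        if (n == buffer.length) {     // full: batch through the one
            flushThroughLock(buffer); // and only lock
            n = 0;
        }
    }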
Bryan
________________________________________
From: Vladimir Blagojevic [vblagoje@redhat.com]
Sent: Sunday, February 07, 2010 7:55 AM
To: Bryan Thompson
Cc: infinispan-dev List
Subject: Re: [infinispan-dev] Initial LIRS implementation
Hey Bryan,
Does it happen with LIRS as well? Pick up the latest version from svn so we know which
file to look at. I've never seen this kind of exception in compareAndSwap, anyone?
Vladimir
On 2010-02-06, at 7:53 PM, Bryan Thompson wrote:
Vladimir,
I believe that there is a lock problem with the BufferedConcurrentHashMap. This code
path does not progress. What is strange is that this is the only thread in the thread
dump that is in BufferedConcurrentHashMap, which suggests a lock ordering problem. I
don't see the problem offhand, but this happens every time I run our application with
the BufferedConcurrentHashMap.
Bryan
"Thread-3" prio=10 tid=0x00002aafec141000 nid=0x41ba runnable
[0x0000000040737000]
   java.lang.Thread.State: RUNNABLE
        at sun.misc.Unsafe.$$YJP$$compareAndSwapInt(Native Method)
        at sun.misc.Unsafe.compareAndSwapInt(Unsafe.java)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.compareAndSetState(AbstractQueuedSynchronizer.java:526)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:183)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at org.infinispan.util.concurrent.BufferedConcurrentHashMap$Segment.remove(BufferedConcurrentHashMap.java:745)
        at org.infinispan.util.concurrent.BufferedConcurrentHashMap$Segment$LRU.execute(BufferedConcurrentHashMap.java:832)
        at org.infinispan.util.concurrent.BufferedConcurrentHashMap$Segment.put(BufferedConcurrentHashMap.java:659)
        at org.infinispan.util.concurrent.BufferedConcurrentHashMap.putIfAbsent(BufferedConcurrentHashMap.java:1421)
        at com.bigdata.cache.BCHMGlobalLRU$InnerCacheImpl.putIfAbsent(BCHMGlobalLRU.java:633)
        at com.bigdata.cache.BCHMGlobalLRU$InnerCacheImpl.putIfAbsent(BCHMGlobalLRU.java:557)
        at com.bigdata.btree.AbstractBTree.readNodeOrLeaf(AbstractBTree.java:3624)
________________________________________
From: Bryan Thompson
Sent: Wednesday, February 03, 2010 5:04 PM
To: Vladimir Blagojevic
Cc: infinispan-dev List
Subject: RE: [infinispan-dev] Initial LIRS implementation
Vladimir,
Does this match up with what you are seeing? That is a good win for get()/remove() with
a loss on put().
Container                         Size   Avg get ops/ms   Avg put ops/ms   Avg remove ops/ms
ConcurrentHashMap                14698            13931              306                 294
BufferedConcurrentHashMap(LRU)     452             7735              140                 200
BufferedConcurrentHashMap(LIRS)    494            13400               36                1136
Bryan
> -----Original Message-----
> From: Vladimir Blagojevic [mailto:vblagoje@redhat.com]
> Sent: Wednesday, February 03, 2010 3:36 PM
> To: Bryan Thompson
> Subject: Re: [infinispan-dev] Initial LIRS implementation
>
> Yes,
>
> Have a look at LRU first and leave LIRS for tomorrow :) I'd
> be happiest if you can find a use case for BCHM+LRU in your
> application and test it out that way. Real life scenario!
>
> Cheers
>
> On 2010-02-03, at 3:27 PM, Bryan Thompson wrote:
>
>> Ah. It is inside the same outer class? Bryan
>>
>>
>>> -----Original Message-----
>>> From: Vladimir Blagojevic [mailto:vblagoje@redhat.com]
>>> Sent: Wednesday, February 03, 2010 2:44 PM
>>> To: Bryan Thompson
>>> Subject: Re: [infinispan-dev] Initial LIRS implementation
>>>
>>> Here it is:
>>>
>>> http://fisheye.jboss.org/browse/Infinispan/trunk/core/src/main/java/org/infinispan/util/concurrent/BufferedConcurrentHashMap.java
>>>
>>> Are you familiar with LIRS? If not, don't bother unless you are
>>> willing to dedicate at least a day or two :(
>>>
>>>
>>> http://www.ece.eng.wayne.edu/~sjiang/Projects/LIRS/sig02.ppt
>>>
>>> http://citeseer.ist.psu.edu/527790.html
>>>
>>> Regards,
>>> Vladimir
>>>
>>>
>>> On 2010-02-03, at 2:31 PM, Bryan Thompson wrote:
>>>
>>>> Can you send me the full class name and I will check it out. Bryan
>>>>
>>>>> -----Original Message-----
>>>>> From: infinispan-dev-bounces@lists.jboss.org
>>>>> [mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of
>>>>> Vladimir Blagojevic
>>>>> Sent: Wednesday, February 03, 2010 2:09 PM
>>>>> To: infinispan-dev List
>>>>> Subject: [infinispan-dev] Initial LIRS implementation
>>>>>
>>>>> Hi,
>>>>>
>>>>> I've just committed a preliminary attempt to implement LIRS.
>>>>> There does not seem to be serious performance degradation for
>>>>> get/remove commands in comparison with the LRU-enabled
>>>>> BufferedConcurrentHashMap.
>>>>> However, the put command is about three times slower than in
>>>>> LRU; put in LIRS is as fast as the put command of a single-lock
>>>>> synchronized HashMap.
>>>>>
>>>>> Looking for enthusiasts willing to help out with some code
>>>>> review :)
>>>>>
>>>>> Regards,
>>>>> Vladimir
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev@lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>>>
>>>
>>>
>
>