Valdimir,
I have used a similar approach to "batching" evictions with good success. It
reduced contention for the lock to handle eviction by about twenty percent.
For this cache, which was not realized using infinispan, I did NO eviction until the cache
reached a target percentage of the JVM heap (this kept the cache hot). For our case, this
was easy as the objects entered in the cache were database records coded as byte[]s so
have a good measure for their memory footprint. Once adding another record to the cache
would push the cache size over the target memory limit, that thread would gain a lock and
then evict a batch of records (in an LRU fashion) until the cache size had been driven
down by a threashold (e.g., 5% of the target memory was recovered).
Let me suggest an alternative approach. Neither LIRS nor LRU need to be strong orderings.
It is enough that a cache eviction policy be "approximately" LRU (or LIRS). As
long as the least desirable entries wind up getting evicted, it does not matter what the
precise eviction order is.
If you accept "approximate" eviction orders, then striped locks could be used to
reduce thread contention. Evictions could be driven on each segment of the cache
independently. If we hash the cache entry key to choose the segment, then each cache
entry will only appear in one segment and all segments should be driven more or less
equally by the application workload (assuming a decent hash function). So using 16
segments / locks, you have 1/16th the contention.
Thoughts?
Bryan
________________________________
From: infinispan-dev-bounces(a)lists.jboss.org
[mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Vladimir Blagojevic
Sent: Thursday, January 07, 2010 6:55 AM
To: infinispan -Dev List
Subject: [infinispan-dev] Implementing LIRS eviction policy [ISPN-299]
Hey everyone,
Some people are already familiar with this thread. They can jump towards the end of email
to read a concrete proposal on how to implement LIRS in Infinispan. Others, those of you
interested obscure eviction algorithms, keep reading :)
Some time ago Manik asked me to look into implementation of a new, LIRS algorithm for
cache eviction. It is a well known fact that plain vanilla LRU algorithm, although simple
and easy to understand, under performs in cases of weak access locality (one time access
pages are not replaced timely, pages to be accessed soonest are unfortunately replaced,
and so on). There has been a new algorithm out there that is rather popular called LIRS
that addresses these shortcomings of LRU yet it retains LRU's simplicity.
That is where the easy part ends. Eviction algorithm, if not implemented in scalable and
lock free fashion can seriously degrade performance. Having a lock protected data
container (to use Infinispan lingo) causes high contention offsetting eviction precision
that we get by using algorithm such as LIRS. That set me off on to a search for
LinkedHashMap (most suitable for LIRS and LRU) like structure that is lock free. Ben
Manes, recently employed by Google, has been working on this problem for a while. His
first attempt to implement ConcurrentLinkedHashMap had a flaw that was discovered by
EhCache people and confirmed by Manik in his own test. Ben Manes' second design for
ConcurrentLinkedHashMap uses ideas from a well known seminal paper in the area of lock
free algorithms [1] and the new design looks valid, at least to me. His implementation of
ConcurrentLinkedHashMap is not finished yet.
However, even if we had ConcurrentLinkedHashMap today that puts us only half way from our
lock free LIRS implementation. LIRS does not use only one stack/list such as LRU but two.
LIRS, in some cases, performs a lot of node shifting within that list and transfers nodes
from one list to another. Manik and I talked about how we could potentially change
original LIRS and stick the whole thing into one stack (ConcurrentLinkedHashMap) by using
additional node markings and such. Overall, I think this is possible but full of potential
pitfalls.
Just before holidays while bashing Google scholar day after day I came across a research
paper [2] that I would say has a lot of potential, not only for our LIRS eviction data
container implementation but any other eviction algorithm implementation.
Instead of making a trade-off between the high hit ratio of an eviction algorithm and the
low lock contention there is a third way, and dare I say a excellent idea of lock
amortization. We can wrap any eviction algorithm with a framework that keeps track of
cache access per thread (ThreadLocal) in a simple data container, say a queue. For each
cache hit associated with a thread, the access history information is recorded in the
thread's queue. When thread's queue is full or the number of accesses recorded in
the queue reaches a certain pre-determined threshold, we acquire a lock and then execute
the operations defined by the eviction algorithm - once for all the accesses in the queue.
A thread is allowed to access many cache items without requesting a lock to run the
eviction replacement algorithm, or without paying the lock acquisition cost. We can fully
exploit a non-blocking lock APIs like tryLock. As you recall tryLock makes an attempt to
get the lock and if the lock is currently held by another thread, it fails without
blocking its caller thread. Although tryLock is cheap it is not used for every cache
access for obvious reasons but rather on certain pre-determined thresholds. In case when
thread's queue is totally full lock must be explicitly requested. Intuitively speaking
this makes a lot of sense, we significantly lower the cost of lock contention,
order/streamline access to locked structures, retain the precision of eviction algorithm
such as LIRS, and best of all, if we are to believe to the authors claim, we can increase
throughput by nearly two-fold compared to the implementation of an unmodified eviction
algorithm, such as LRU, and at the same time achieve a scalability as good as the one that
use lock free structures.
So how do we translate these ideas in Infinispan?
In order to implement data container with batching lock amortization updates,
DataContainer is structured so that it contains two DataContainers in a chain. As far as
Infinispan code base is concerned DataContainer interface is still exposed as is but the
implementation of the first DataContainer in the chain contains a references to a delegate
- real DataContainer implementation. The first DataContainer in the chain is considered to
be a lock free buffer data container (BDC) while delegate container is thread safe and
interchangeable (LIRS, LRU) data container (DC). BDC has a ConcurrentHashMap whose cache
entry contents are managed as calls are unwound from DC.
As previously discussed BP-Wrapper [1] shared objects are used to batch updates to DC.
BP-Wrappers are envisioned as per thread objects having its own queue to record cache
entry accesses. As Brian mentioned this might be perfectly fine in other systems but it
will present a problem in Infinispan where we can potentially have hundreds of concurrent
threads accessing single data container. Many short lived threads would never fill up
their queue enough to hit a threshold. Manik, suggested that we share BP-Wrapper objects
in a pool among all InvocationContext(s).
XYZInterceptor {
Pool<BP-Wrapper> pool;
// grab BP-Wrapper off pool
// assign to InvocationContext
// try {pass up chain}
// finally {pull off InvocationContext, return to pool}
}
However, after thinking this through a bit more, a better solutions seems to be recording
all cache entry accesses in a lock-free queue within BDC itself. All threads making
invocations into DC share one lock-free queue to record cache entry accesses instead of
having one queue per BP-Wrapper object. In this case, we do not have to manage shared
BP-Wrapper objects, we do not need an extra interceptor and so on.
In order to batch updates to DC we need to "commit" all accessed cache entries
into DC. As of now we do not have such API. Either we introduce a subclass of
DataContainer that has a following new method or we can extend current DataContainer and
make all implementations of DC that do not handle batch updates perform a no-op:
public touch(List<InternalCacheEntry> updates)
Feedback appreciated.
Regards,
Vladimir
[1]
http://www.cl.cam.ac.uk/research/srg/netos/papers/2001-caslists.pdf
[2]
http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-09-1.pdf