[infinispan-dev] Implementing LIRS eviction policy [ISPN-299]

Thu Jan 7 06:54:55 EST 2010

Hey everyone,

Some people are already familiar with this thread. They can jump towards the end of email to read a concrete proposal on how to implement LIRS in Infinispan. Others, those of you interested obscure eviction algorithms, keep reading :)

Some time ago Manik asked me to look into implementation of a new, LIRS algorithm for cache eviction. It is a well known fact that plain vanilla LRU algorithm, although simple and easy to understand, under performs in cases of weak access locality (one time access pages are not replaced timely, pages to be accessed soonest are unfortunately replaced, and so on). There has been a new algorithm out there that is rather popular called LIRS that addresses these shortcomings of LRU yet it retains LRU's simplicity.

That is where the easy part ends. Eviction algorithm, if not implemented in scalable and lock free fashion can seriously degrade performance. Having a lock protected data container (to use Infinispan lingo) causes high contention offsetting eviction precision that we get by using algorithm such as LIRS. That set me off on to a search for LinkedHashMap (most suitable for LIRS and LRU) like structure that is lock free. Ben Manes, recently employed by Google, has been working on this problem for a while. His first attempt to implement ConcurrentLinkedHashMap had a flaw that was discovered by EhCache people and confirmed by Manik in his own test. Ben Manes' second design for ConcurrentLinkedHashMap uses ideas from a well known seminal paper in the area of lock free algorithms [1] and the new design looks valid, at least to me. His implementation of ConcurrentLinkedHashMap is not finished yet. 

However, even if we had ConcurrentLinkedHashMap today that puts us only half way from our lock free LIRS implementation. LIRS does not use only one stack/list such as LRU but two. LIRS, in some cases, performs a lot of node shifting within that list and transfers nodes from one list to another. Manik and I talked about how we could potentially change original LIRS and stick the whole thing into one stack (ConcurrentLinkedHashMap) by using additional node markings and such. Overall, I think this is possible but full of potential pitfalls.

Just before holidays while bashing Google scholar day after day I came across a research paper [2] that I would say has a lot of potential, not only for our LIRS eviction data container implementation but any other eviction algorithm implementation.

Instead of making a trade-off between the high hit ratio of an eviction algorithm and the low lock contention there is a third way, and dare I say a excellent idea of lock amortization. We can wrap any eviction algorithm with a framework that keeps track of cache access per thread (ThreadLocal) in a simple data container, say a queue. For each cache hit associated with a thread, the access history information is recorded in the thread’s queue. When thread's queue is full or the number of accesses recorded in the queue reaches a certain pre-determined threshold, we acquire a lock and then execute the operations defined by the eviction algorithm - once for all the accesses in the queue. A thread is allowed to access many cache items without requesting a lock to run the eviction replacement algorithm, or without paying the lock acquisition cost. We can fully exploit a non-blocking lock APIs like tryLock. As you recall tryLock makes an attempt to get the lock and if the lock is currently held by another thread, it fails without blocking its caller thread. Although tryLock is cheap it is not used for every cache access for obvious reasons but rather on certain pre-determined thresholds. In case when thread's queue is totally full lock must be explicitly requested. Intuitively speaking this makes a lot of sense, we significantly lower the cost of lock contention, order/streamline access to locked structures, retain the precision of eviction algorithm such as LIRS, and best of all, if we are to believe to the authors claim, we can increase throughput by nearly two-fold compared to the implementation of an unmodified eviction algorithm, such as LRU, and at the same time achieve a scalability as good as the one that use lock free structures.

So how do we translate these ideas in Infinispan?
In order to implement data container with batching lock amortization updates, DataContainer is structured so that it contains two DataContainers in a chain. As far as Infinispan code base is concerned DataContainer interface is still exposed as is but the implementation of the first DataContainer in the chain contains a references to a delegate – real DataContainer implementation. The first DataContainer in the chain is considered to be a lock free buffer data container (BDC) while delegate container is thread safe and interchangeable (LIRS, LRU) data container (DC). BDC has a ConcurrentHashMap whose cache entry contents are managed as calls are unwound from DC.
As previously discussed BP-Wrapper [1] shared objects are used to batch updates to DC. BP-Wrappers are envisioned as per thread objects having its own queue to record cache entry accesses. As Brian mentioned this might be perfectly fine in other systems but it will present a problem in Infinispan where we can potentially have hundreds of concurrent threads accessing single data container. Many short lived threads would never fill up their queue enough to hit a threshold. Manik, suggested that we share BP-Wrapper objects in a pool among all InvocationContext(s).  

XYZInterceptor {
  Pool<BP-Wrapper> pool;

 // grab BP-Wrapper off pool
 // assign to InvocationContext
 // try {pass up chain}
 // finally {pull off InvocationContext, return to pool}
}
However, after thinking this through a bit more, a better solutions seems to be recording all cache entry accesses in a lock-free queue within BDC itself. All threads making invocations into DC share one lock-free queue to record cache entry accesses instead of having one queue per BP-Wrapper object. In this case, we do not have to manage shared BP-Wrapper objects, we do not need an extra interceptor and so on.
In order to batch updates to DC we need to “commit” all accessed cache entries into DC. As of now we do not have such API. Either we introduce a subclass of DataContainer that has a following new method or we can extend current DataContainer and make all implementations of DC that do not handle batch updates perform a no-op:
public touch(List<InternalCacheEntry> updates)

Feedback appreciated. 

Regards,
Vladimir

[1] http://www.cl.cam.ac.uk/research/srg/netos/papers/2001-caslists.pdf
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-09-1.pdf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20100107/496591a7/attachment-0002.html