Hey everyone,
Some of you are already familiar with this thread and can jump to the end of this email
to read a concrete proposal on how to implement LIRS in Infinispan. Others, those of you
interested in obscure eviction algorithms, keep reading :)
Some time ago Manik asked me to look into implementing a newer algorithm for cache
eviction, LIRS. It is well known that the plain vanilla LRU algorithm, although simple
and easy to understand, underperforms in cases of weak access locality (pages accessed
only once are not replaced in a timely manner, pages about to be accessed are
unfortunately replaced, and so on). LIRS is a newer and rather popular algorithm that
addresses these shortcomings of LRU while retaining LRU's simplicity.
That is where the easy part ends. An eviction algorithm, if not implemented in a scalable
and lock-free fashion, can seriously degrade performance. Having a lock-protected data
container (to use Infinispan lingo) causes high contention, offsetting the eviction
precision that we get by using an algorithm such as LIRS. That set me off on a search for
a lock-free, LinkedHashMap-like structure (the most suitable for LIRS and LRU). Ben
Manes, recently employed by Google, has been working on this problem for a while. His
first attempt to implement ConcurrentLinkedHashMap had a flaw that was discovered by the
EhCache people and confirmed by Manik in his own test. Ben Manes' second design for
ConcurrentLinkedHashMap uses ideas from a well-known seminal paper in the area of
lock-free algorithms [1], and the new design looks valid, at least to me. His
implementation of ConcurrentLinkedHashMap is not finished yet.
However, even if we had ConcurrentLinkedHashMap today, that would get us only halfway to
a lock-free LIRS implementation. Unlike LRU, LIRS uses not one stack/list but two. In
some cases LIRS performs a lot of node shifting within a list and transfers nodes from
one list to the other. Manik and I talked about how we could potentially change the
original LIRS and fit the whole thing into one stack (ConcurrentLinkedHashMap) by using
additional node markings and such. Overall, I think this is possible but full of
potential pitfalls.
Just before the holidays, while bashing Google Scholar day after day, I came across a
research paper [2] that I would say has a lot of potential, not only for our LIRS
eviction data container implementation but for any other eviction algorithm
implementation.
Instead of making a trade-off between the high hit ratio of an eviction algorithm and
low lock contention, there is a third way, and dare I say an excellent idea: lock
amortization. We can wrap any eviction algorithm with a framework that keeps track of
cache accesses per thread (ThreadLocal) in a simple data container, say a queue. For each
cache hit associated with a thread, the access history information is recorded in the
thread's queue. When the thread's queue is full, or the number of accesses recorded in
the queue reaches a certain predetermined threshold, we acquire a lock and then execute
the operations defined by the eviction algorithm, once for all the accesses in the queue.
A thread can therefore access many cache items without requesting a lock to run the
eviction algorithm and without paying the lock acquisition cost. We can also fully
exploit non-blocking lock APIs like tryLock. As you recall, tryLock attempts to get the
lock and, if the lock is currently held by another thread, fails without blocking the
calling thread. Although tryLock is cheap, it is not called on every cache access but
only once the predetermined threshold is reached. When the thread's queue is completely
full, the lock must be requested explicitly, blocking if necessary. Intuitively this
makes a lot of sense: we significantly lower the cost of lock contention,
order/streamline access to locked structures, retain the precision of an eviction
algorithm such as LIRS, and best of all, if we are to believe the authors' claim, we can
increase throughput nearly two-fold compared to an unmodified implementation of an
eviction algorithm such as LRU, while achieving scalability as good as that of
implementations that use lock-free structures.
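
To make the idea concrete, here is a minimal Java sketch of lock amortization as described
above: a per-thread buffer, an opportunistic tryLock once a threshold is reached, and a
blocking lock() when the buffer is full. The class name AmortizedAccessRecorder, its
constructor parameters, and the Consumer callback standing in for the eviction algorithm
are all made up for illustration; this is not code from the paper [2] or from Infinispan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical sketch: batches per-thread access records and replays them under one lock. */
class AmortizedAccessRecorder<K> {
    private final int threshold;  // from this size on, try to flush without blocking
    private final int capacity;   // at this size the thread must flush, blocking if needed
    private final ReentrantLock policyLock = new ReentrantLock();
    private final Consumer<List<K>> evictionPolicy; // stand-in for LRU/LIRS bookkeeping
    private final ThreadLocal<List<K>> buffer;

    AmortizedAccessRecorder(int threshold, int capacity, Consumer<List<K>> evictionPolicy) {
        if (threshold < 1 || capacity < threshold) {
            throw new IllegalArgumentException("need 1 <= threshold <= capacity");
        }
        this.threshold = threshold;
        this.capacity = capacity;
        this.evictionPolicy = evictionPolicy;
        this.buffer = ThreadLocal.withInitial(() -> new ArrayList<K>(capacity));
    }

    /** Called on every cache hit; usually touches only thread-local state. */
    void recordAccess(K key) {
        List<K> accesses = buffer.get();
        accesses.add(key);
        if (accesses.size() >= capacity) {
            policyLock.lock();                 // buffer full: wait for the lock
            try {
                flush(accesses);
            } finally {
                policyLock.unlock();
            }
        } else if (accesses.size() >= threshold && policyLock.tryLock()) {
            try {                              // lock was free: flush early
                flush(accesses);
            } finally {
                policyLock.unlock();
            }
        }
        // Otherwise the lock is busy or the threshold is not reached: keep buffering.
    }

    /** Runs the eviction algorithm once for the whole batch; caller holds policyLock. */
    private void flush(List<K> accesses) {
        evictionPolicy.accept(new ArrayList<>(accesses));
        accesses.clear();
    }
}
```

With threshold 2 and capacity 4, an uncontended thread hands the policy a batch every two
accesses; under contention the batches simply grow until either tryLock succeeds or the
buffer fills up and the thread waits for the lock.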
So how do we translate these ideas into Infinispan?
In order to implement a data container with batched, lock-amortized updates,
DataContainer is structured as two DataContainers in a chain. As far as the Infinispan
code base is concerned, the DataContainer interface is still exposed as is, but the
implementation of the first DataContainer in the chain holds a reference to a delegate,
the real DataContainer implementation. The first DataContainer in the chain is a
lock-free buffer data container (BDC), while the delegate is a thread-safe and
interchangeable (LIRS, LRU) data container (DC). BDC has a ConcurrentHashMap whose cache
entry contents are managed as calls are unwound from DC.
As previously discussed, BP-Wrapper [2] shared objects are used to batch updates to DC.
BP-Wrappers are envisioned as per-thread objects, each having its own queue to record
cache entry accesses. As Brian mentioned, this might be perfectly fine in other systems,
but it presents a problem in Infinispan, where we can potentially have hundreds of
concurrent threads accessing a single data container. Many short-lived threads would
never fill their queues enough to hit the threshold. Manik suggested that we share
BP-Wrapper objects in a pool among all InvocationContext(s).
XYZInterceptor {
   Pool<BP-Wrapper> pool;

   // on each invocation:
   //   grab a BP-Wrapper off the pool
   //   assign it to the InvocationContext
   //   try { pass the call up the chain }
   //   finally { pull it off the InvocationContext, return it to the pool }
}
However, after thinking this through a bit more, a better solution seems to be recording
all cache entry accesses in a lock-free queue within BDC itself. All threads making
invocations into DC share one lock-free queue to record cache entry accesses instead of
having one queue per BP-Wrapper object. In this case we do not have to manage shared
BP-Wrapper objects, we do not need an extra interceptor, and so on.
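
A rough Java sketch of that shared-queue variant, under some assumptions of mine: accesses
go into a single ConcurrentLinkedQueue, an approximate counter decides when a drain is
worth attempting, and whichever thread wins tryLock hands the batch to the delegate. The
name SharedAccessBuffer and the Consumer standing in for DC are hypothetical, and unlike
the paper's scheme this sketch never forces a blocking drain, so the queue is unbounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical sketch: one lock-free access log shared by all threads calling into DC. */
class SharedAccessBuffer<K> {
    private final ConcurrentLinkedQueue<K> accessLog = new ConcurrentLinkedQueue<>();
    private final AtomicInteger pending = new AtomicInteger(); // approximate queue size
    private final ReentrantLock drainLock = new ReentrantLock();
    private final int threshold;
    private final Consumer<List<K>> delegate; // stand-in for the batch update on DC

    SharedAccessBuffer(int threshold, Consumer<List<K>> delegate) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
        this.delegate = delegate;
    }

    /** Lock-free on the common path; only a thread that wins tryLock drains. */
    void recordAccess(K key) {
        accessLog.add(key);
        if (pending.incrementAndGet() >= threshold && drainLock.tryLock()) {
            try {
                drain();
            } finally {
                drainLock.unlock();
            }
        }
    }

    /** Moves everything currently queued into one batch; caller holds drainLock. */
    private void drain() {
        List<K> batch = new ArrayList<>();
        K key;
        while ((key = accessLog.poll()) != null) {
            batch.add(key);
            // May briefly undercount if a producer has enqueued but not yet incremented.
            pending.decrementAndGet();
        }
        if (!batch.isEmpty()) {
            delegate.accept(batch);
        }
    }
}
```

Because the queue is shared, short-lived threads no longer strand a few records in a
private buffer: whatever they enqueue is picked up by the next thread that drains.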
In order to batch updates to DC we need to “commit” all accessed cache entries into DC. As
of now we do not have such an API. Either we introduce a subclass of DataContainer that
has the following new method, or we extend the current DataContainer and make all
implementations of DC that do not handle batch updates perform a no-op:
public void touch(List<InternalCacheEntry> updates)
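
For illustration only, the second option could look roughly like the Java sketch below. It
uses a default interface method for the no-op, which is just one way to do it.
DataContainerSketch, EntrySketch, NoOpContainerSketch and LruContainerSketch are
simplified stand-ins I made up; they are not the real Infinispan DataContainer or
InternalCacheEntry types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

/** Stand-in for InternalCacheEntry; only the key matters for this sketch. */
final class EntrySketch {
    final Object key;

    EntrySketch(Object key) {
        this.key = key;
    }
}

/** Stand-in for DataContainer with the proposed batch method defaulting to a no-op. */
interface DataContainerSketch {
    default void touch(List<EntrySketch> updates) {
        // Containers that do not batch eviction updates ignore the call.
    }
}

/** A container without eviction ordering simply inherits the no-op. */
class NoOpContainerSketch implements DataContainerSketch {
}

/** An LRU-style container applies the whole batch under a single lock acquisition. */
class LruContainerSketch implements DataContainerSketch {
    private final LinkedHashMap<Object, EntrySketch> recency = new LinkedHashMap<>();
    private final int maxEntries;

    LruContainerSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    @Override
    public synchronized void touch(List<EntrySketch> updates) {
        for (EntrySketch e : updates) {
            recency.remove(e.key);   // re-inserting moves the key to the most recent end
            recency.put(e.key, e);
        }
        Iterator<Object> eldest = recency.keySet().iterator();
        while (recency.size() > maxEntries && eldest.hasNext()) {
            eldest.next();
            eldest.remove();         // evict least recently used entries first
        }
    }

    /** Keys from least to most recently used, for inspection. */
    synchronized List<Object> keysInEvictionOrder() {
        return new ArrayList<>(recency.keySet());
    }
}
```

The first option (a separate subtype carrying touch) would keep existing implementations
untouched at the price of a type check in BDC before each batch commit.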
Feedback appreciated.
Regards,
Vladimir
[1] http://www.cl.cam.ac.uk/research/srg/netos/papers/2001-caslists.pdf
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-09-1.pdf