Vladimir,
Sure. I will keep you updated. We have several of our own implementations and we have
been evaluating an implementation based on the LRUDataContainer, which has a wicked hot
spot, as I am sure you know. This batching approach sounds quite promising. We will try
it on our existing implementations, and we can evaluate (or help on) the Infinispan
version.
Bryan
________________________________
From: infinispan-dev-bounces@lists.jboss.org
[mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Vladimir Blagojevic
Sent: Friday, January 08, 2010 10:08 AM
To: infinispan-dev List
Cc: Martyn Cutcher
Subject: Re: [infinispan-dev] Implementing LIRS eviction policy [ISPN-299]
Hey Bryan,
On 2010-01-08, at 1:35 PM, Bryan Thompson wrote:
Vladimir,
Reading the paper, it appears that my suggestion of segments is called "distributed
locks" in this paper, and the authors suggest:
For example, the algorithms that need to detect sequence of accesses
cannot retain their performance advantages when pages in the same sequence have been
distributed into multiple partitions and cannot be identified as a sequence.
Indeed, but this sequence access seems to be important only for a subgroup of algorithms
like SEQ. To be honest, when I first read the paper I was surprised by the authors' claim
that batching does not affect the precision of LIRS and LRU. LIRS and LRU need the order
of accesses too, but it seems that the approximated access order does not affect precision
significantly.
Along the same lines, we have to figure out whether segmenting and batching per segment is
going to affect algorithm precision overall. If someone can prove that this is a dead end,
please argue your point :) Otherwise, intuitively speaking, I cannot see how batching per
segment would suddenly make LIRS imprecise if batching did not do so in the first place.
Overall, I like the approach outlined in the paper. It takes the concept of batching
further than what we did, but in much the same spirit. In our implementation, we only
batched evictions. They also batch the updates to the replacement policy ordering,
which is still a bottleneck for us.
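To illustrate what I mean by batching the ordering updates, here is a rough sketch.
The names (BatchedAccessRecorder, touchPolicy) and the batch size are invented on my
part; this is not the paper's BP-Wrapper code and not Infinispan API:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.locks.ReentrantLock;

    // Hypothetical sketch: cache hits are buffered per thread and only
    // applied to the lock-protected replacement policy in bulk.
    final class BatchedAccessRecorder<K> {
        private static final int BATCH_SIZE = 64; // assumed threshold

        private final ReentrantLock policyLock = new ReentrantLock();
        private final ThreadLocal<List<K>> buffer =
                ThreadLocal.withInitial(() -> new ArrayList<>(BATCH_SIZE));

        /** Called on every cache hit instead of touching the policy directly. */
        void recordAccess(K key) {
            List<K> local = buffer.get();
            local.add(key);
            // Only attempt the lock once enough accesses have piled up; if
            // another thread holds it, keep buffering (a real version would
            // cap the buffer size).
            if (local.size() >= BATCH_SIZE && policyLock.tryLock()) {
                try {
                    for (K k : local) {
                        touchPolicy(k); // reorder the LRU/LIRS structures
                    }
                    local.clear();
                } finally {
                    policyLock.unlock();
                }
            }
        }

        private void touchPolicy(K key) {
            // placeholder for the actual LRU list / LIRS stack update
        }
    }

The point is that the common path (a cache hit) never blocks on the policy lock; the
lock is taken rarely, and only via tryLock.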
Concerning the FIFO queue, the paper (page 372) considers both per-thread and global
queues. It dismisses global queues for batching updates primarily because certain cache
replacement policies need to detect sequences of operations. Can such sequences of
operations be meaningfully detected within Infinispan? Is there any thread affinity for a
transaction? If not, then segmented queues might be an alternative for Infinispan, along
the lines of the sketch below.
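For example, a segmented arrangement might look something like this (purely
illustrative; the names and the sizing rule are assumptions on my part):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Illustrative only: the number of queues is fixed independently of the
    // number of threads, and all accesses to a given key land in one segment.
    final class SegmentedQueues<K> {
        private final Queue<K>[] segments;

        @SuppressWarnings("unchecked")
        SegmentedQueues(int nSegments) { // e.g. sized by cores, not threads
            segments = new Queue[nSegments];
            for (int i = 0; i < nSegments; i++) {
                segments[i] = new ConcurrentLinkedQueue<>();
            }
        }

        void recordAccess(K key) {
            int idx = (key.hashCode() & 0x7fffffff) % segments.length;
            segments[idx].offer(key);
            // a full segment would then be drained against the policy in a batch
        }
    }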
Yes, they dismiss a global queue because of algorithms we do not plan to use and because
of the locking cost of a global queue. I am betting that a global lock-free queue per
Infinispan DataContainer is less costly than shared BP-Wrapper objects that we would have
to maintain in a pool and share among threads (they are not shared in the paper, but we
would have to share them). As I mentioned, the problem for us is that we can potentially
have spikes of up to hundreds of threads in a cache instance. The Infinispan use case
differs from the PostgreSQL use case in the paper, but I agree that the potential impact
of a global lock-free queue should not be dismissed, and I would switch to the pooling
approach if it is proven that the lock-free queue causes a bottleneck.
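Just to make the idea concrete, the global lock-free queue I have in mind would be
roughly the following (a sketch only; GlobalAccessQueue, DRAIN_THRESHOLD and touchPolicy
are made-up names, not Infinispan code):

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch only: every thread enqueues lock-free; whichever thread notices
    // the backlog and wins the tryLock drains the queue against the policy.
    final class GlobalAccessQueue<K> {
        private static final int DRAIN_THRESHOLD = 128; // assumed value

        private final ConcurrentLinkedQueue<K> accesses = new ConcurrentLinkedQueue<>();
        private final AtomicInteger pending = new AtomicInteger();
        private final ReentrantLock policyLock = new ReentrantLock();

        void recordAccess(K key) {
            accesses.offer(key); // lock-free enqueue shared by all threads
            if (pending.incrementAndGet() >= DRAIN_THRESHOLD && policyLock.tryLock()) {
                try {
                    K k;
                    while ((k = accesses.poll()) != null) {
                        pending.decrementAndGet();
                        touchPolicy(k); // apply to the LIRS/LRU ordering
                    }
                } finally {
                    policyLock.unlock();
                }
            }
        }

        private void touchPolicy(K key) { /* replacement-ordering update */ }
    }

The enqueue path never blocks; only the draining thread takes the lock, and only via
tryLock, so a spike of threads cannot pile up on the policy lock.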
Also, for Java, if the hash map uses weak values, then any entries on the FIFO queues
will remain pinned in the hash map even if they would otherwise have been evicted by the
cache policy, which is a nice touch. That way you can continue to access an object that
is still strongly reachable elsewhere in the JVM, even if the cache replacement policy
would have discarded the cache entry.
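Something along these lines, say (a hypothetical sketch; WeakValueCache and its members
are invented for illustration, not a real library class):

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Sketch of a weak-value map: values can be collected once nothing else
    // (e.g. a FIFO batch queue) holds them strongly.
    final class WeakValueCache<K, V> {
        // A weak reference that remembers its key so the stale map entry
        // can be removed after the value is collected.
        private static final class Ref<K, V> extends WeakReference<V> {
            final K key;
            Ref(K key, V value, ReferenceQueue<V> q) {
                super(value, q);
                this.key = key;
            }
        }

        private final ConcurrentMap<K, Ref<K, V>> map = new ConcurrentHashMap<>();
        private final ReferenceQueue<V> cleared = new ReferenceQueue<>();

        V get(K key) {
            Ref<K, V> ref = map.get(key);
            return ref == null ? null : ref.get(); // null once GC has cleared it
        }

        void put(K key, V value) {
            expungeStale();
            map.put(key, new Ref<>(key, value, cleared));
        }

        // Entries whose values were collected are removed lazily here.
        @SuppressWarnings("unchecked")
        private void expungeStale() {
            Ref<K, V> ref;
            while ((ref = (Ref<K, V>) cleared.poll()) != null) {
                map.remove(ref.key, ref);
            }
        }
    }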
The experiment in the paper compared several cache replacement policies, but the data were
all hot in cache and the cache was large enough to avoid all cache misses, so I am not
clear why they bothered to compare different cache replacement policies -- none of them
did any evictions for their tests. While "prefetching" did reduce the lock
contention slightly, it had no meaningful effect on either throughput or response time.
The authors posit that it might have more effect on machines with many more cores,
depending on the CPU architecture. It seems easy enough to ignore the prefetch
dimension for now and just concentrate on batching.
Exactly the same sentiment here :)
So, I think that the batching approach per this paper makes sense and could be combined
with LRU or LIRS easily, since the cache replacement policy is encapsulated (which might
not be true with prefetch), and overall it seems like a good idea and an improvement on
what we were already doing. For our application, I think that we would use a hash map
with weak values (per above) to gain the benefit of the actual retention time of the
object by the JVM. I would also like to explore the use of segmented queues as well as
per-thread queues, to control the #of queues in the system separately from the #of
threads. By batching updates to the access order, you are also effectively batching
evictions, since they are driven by updates. When combined with a weak-value cache, the
thread running the updates should also clear entries found on the weak reference queue,
thereby batching the clearing of stale cache entries (those with cleared references) at
the same time it is batching updates to the access order and inserts to the cache.
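Roughly, the combined maintenance pass I have in mind would look like this (again a
sketch with invented names, reusing the key-carrying weak reference idea from above; a
real implementation would live inside the data container):

    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.WeakReference;
    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch: the thread that wins the policy lock applies the batched
    // ordering updates AND clears GC'd weak entries in one critical section.
    final class CacheMaintenance<K, V> {

        // Weak value reference that remembers its key.
        static final class Ref<K, V> extends WeakReference<V> {
            final K key;
            Ref(K key, V v, ReferenceQueue<V> q) { super(v, q); this.key = key; }
        }

        private final ReentrantLock policyLock = new ReentrantLock();

        @SuppressWarnings("unchecked")
        void maybeMaintain(Queue<K> bufferedAccesses,
                           ReferenceQueue<V> clearedValues,
                           Map<K, Ref<K, V>> map) {
            if (!policyLock.tryLock()) {
                return; // another thread is already applying the batch
            }
            try {
                K k;
                while ((k = bufferedAccesses.poll()) != null) {
                    touchPolicy(k); // batched access-order update
                }
                Ref<K, V> ref;
                while ((ref = (Ref<K, V>) clearedValues.poll()) != null) {
                    map.remove(ref.key, ref); // batched clearing of stale entries
                }
            } finally {
                policyLock.unlock();
            }
        }

        private void touchPolicy(K key) { /* LRU/LIRS reordering goes here */ }
    }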
Bryan
Excellent, let's keep cooperating on this one. If you are already in the implementation
stage, and it seems that you are, let us know about the progress you make and the
performance increases you observe.
Regards,
Vladimir