Vladimir,
Reading the paper, it appears that my suggestion of
segments is called "distributed locks" in this paper and the authors
suggests:
> For
example, the algorithms that need to
detect sequence of accesses cannot retain their
performance advantages when pages in the same sequence have been distributed into multiple partitions
and cannot be identified as a
sequence.
Overall, I like the approach outlined in the paper.
It takes the concept of batching further than what we did, but in much the same
spirit. In our implementation, we only batched evictions. They are
also batching the updates to the replacement policy ordering which is still a
bottleneck for us.
Concerning the FIFO queue, the paper
(page 372) considers both per-thread queues or global queues. They dismiss
global queues for batching updates primarily because certain cache replacement
policies need to detect sequences of operations. Can such sequences of
operations be meaningfully detected within infinispan? Is there any
thread-affinity for a transaction? If not, then segmented queues might be
an alternative for infinispan.
Also, for java, if the
hash map uses weak values then cache entries then any entries on the FIFO queues
will remain pinned in the hash map even if they would have otherwise been
evicted by the cache policy which is a nice touch. That way you can
continue to access an object which remains accessible to the JVM held even if
the cache replacement policy would have discarded the cache
entry.
The experiment in the paper compared several cache
replacement policies, but the data were all hot in cache and the cache was large
enough to avoid all cache misses, so I am not clear why they bothered to compare
different cache replacement policies -- none of them did any evictions for their
tests. While "prefetching" did reduce the lock
contention slightly, it had no meaningful effect on either throughput or
response time. The authors posit that it might have more effect on
machines with many more cores, depending on the CPU architecture. It seems
easily enough to ignore the prefetch dimension for now and just concentrate on
batching.
So, I think that the batching approach per this paper makes
sense and could be combined with LRU or LIRS easily since the cache replacement
policy is encapsulated (which might not be true with prefetch) and seems overall
like a good idea and an improvement on what we were already doing. For our
application, I think that we would use a hash map with weak values (per above)
to gain the benefit of the actual retention time of the object by the JVM.
I would also like to explore the use of segmented queues as well as per-thread
queues to control the #of queues in the system separately from the #of
threads. By batching updates to the access order, you are also effectively
batching evictions since they are driven by updates. When combined with a
weak value caches, the thread running the updates should also clear entries from
the cache on the weak reference queue, thereby batching the clearing of stale
cache entries (those with cleared references) at the same time it is batching
updates to the access order and inserts to the cache.
Bryan
Hey Bryan,
On 2010-01-08, at 2:09 AM, Bryan Thompson wrote:
Valdimir,
I have used a similar approach to
"batching" evictions with good success. It reduced contention for the
lock to handle eviction by about twenty
percent.
How different, if at all, was your batching implementation from the one I
referred to in my previous mail?
For this cache, which was not
realized using infinispan, I did NO eviction until the cache reached a
target percentage of the JVM heap (this kept the cache hot). For our
case, this was easy as the objects entered in the cache were database
records coded as byte[]s so have a good measure for their memory
footprint. Once adding another record to the cache would push the
cache size over the target memory limit, that thread would gain a lock and
then evict a batch of records (in an LRU fashion) until the cache size had
been driven down by a threashold (e.g., 5% of the target memory was
recovered).
Let me suggest an alternative
approach. Neither LIRS nor LRU need to be strong orderings. It
is enough that a cache eviction policy be "approximately" LRU (or
LIRS). As long as the least desirable entries wind up getting evicted,
it does not matter what the precise eviction order
is.
Can you clarify "need to be strong orderings"? Aha, you are referring to
eviction order?
If you accept "approximate" eviction
orders, then striped locks could be used to reduce thread contention.
Evictions could be driven on each segment of the cache independently.
If we hash the cache entry key to choose the segment, then each cache entry
will only appear in one segment and all segments should be driven more or
less equally by the application workload (assuming a decent hash
function). So using 16 segments / locks, you have 1/16th the
contention.
Thoughts?
So what you are basically saying is that we implement LIRS and LRU per
Segment of ConcurrentHashMap, is that right? Interesting
idea!
Does your proposal still contain batching? If so, how would you fit it in
this new design? You would have to keep batched record of cache entry accesses
in a queue per segment?
Regards,
Vladimir