Vladimir,

I now have a workload with a hotspot on our cache that accounts for 8% of the total time. I am going to use it to test BCHM under a realistic workload.

One thing I like about your design is that, by placing the array that buffers a thread's operations inside the Segment, you are using the lock guarding the Segment to control updates to that array. While this has less potential throughput than a truly thread-local design (which is non-blocking until the thread-local array fills), it seems that the buffer of pending updates cannot "escape" under your design. Was this intentional?

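To make sure I am reading it the way you intended, here is a minimal sketch of my understanding; the names (SegmentSketch, recordAccess, drainAccessBuffer, BUFFER_SIZE) are my own placeholders, not the actual BCHM code:

import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch, not the actual BCHM code: the buffer of pending access
// notifications lives inside the segment and is only touched while the
// segment lock is held, so the buffer can never "escape" the segment.
final class SegmentSketch extends ReentrantLock {

    private static final int BUFFER_SIZE = 64;     // assumed batch size
    private final Object[] accessBuffer = new Object[BUFFER_SIZE];
    private int accessCount;

    /** Called only with this segment's lock held (e.g. from put/remove). */
    void recordAccess(Object entry) {
        accessBuffer[accessCount++] = entry;
        if (accessCount == BUFFER_SIZE) {
            drainAccessBuffer();                   // amortized: one drain per batch
        }
    }

    /** Replays the buffered accesses against the eviction order, then clears
     *  the slots so the entries are not kept strongly reachable here. */
    private void drainAccessBuffer() {
        for (int i = 0; i < accessCount; i++) {
            // ... update the LRU/LIRS ordering for accessBuffer[i] ...
            accessBuffer[i] = null;
        }
        accessCount = 0;
    }
}
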
The danger with a true thread-local design is that a thread can come in, do some work, get some updates buffered in its thread-local array, and then never visit the cache again. In that case those updates remain buffered on the thread and do not cause the access order to be updated in a timely manner. Worse, if you are relying on WeakReference semantics, the buffered updates remain strongly reachable and the corresponding objects stay wired into the cache.

I've worked around this issue in a different context where it made sense to scope the buffers to an owning object (a B+Tree instance). In that case, when the B+Tree container was closed, all buffered updates were discarded. This nicely eliminated the problems with "escaping" threads.

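Roughly, and with purely illustrative names rather than our actual API, that workaround had this shape:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of scoping the per-thread buffers to an owning object
// (not our actual B+Tree API): the owner hands out the buffers, remembers
// them, and close() discards whatever is still buffered so updates stranded
// on "escaped" threads cannot pin cache entries after the container is gone.
final class OwnedTouchBuffers<E> {

    private final Set<List<E>> allBuffers =
            Collections.newSetFromMap(new ConcurrentHashMap<List<E>, Boolean>());

    private final ThreadLocal<List<E>> localBuffer = new ThreadLocal<List<E>>() {
        @Override protected List<E> initialValue() {
            List<E> buf = new ArrayList<E>();
            allBuffers.add(buf);                   // remembered by the owner
            return buf;
        }
    };

    void touch(E entry) {
        localBuffer.get().add(entry);
        // ... batch through the shared lock once the buffer is large enough ...
    }

    /** Invoked when the owning container (the B+Tree) is closed: any updates
     *  still buffered on threads that never came back are simply dropped.
     *  (Synchronization with concurrent touch() calls is omitted here.) */
    void close() {
        for (List<E> buf : allBuffers) {
            buf.clear();
        }
        allBuffers.clear();
    }
}
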
However, I like your approach better.

Bryan

PS: I see a comment on LRU#onEntryHit() indicating that it is invoked without holding the Segment lock. Is that true only when BCHM#get() is invoked?

PPS: Also, can you expand on the role of LRU#accessQueue vs. LRU#lruQueue? Is there anything which corresponds to the total LRU order (after batching updates through the Segment)?

________________________________

From: infinispan-dev-bounces@lists.jboss.org [mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Bryan Thompson
Sent: Wednesday, January 27, 2010 1:00 PM
To: infinispan -Dev List
Subject: Re: [infinispan-dev] Lock amortization preliminary performance numbers

Vladimir,

I recently replaced a synchronized ring buffer behind a ConcurrentHashMap with weak values with a variation of the ring buffer which buffers updates in thread-local arrays and then batches them through the lock. For the specific application (a cache of frequently resolved terms in a dictionary), contention went to zero. We use a very similar design with a ring buffer to hold onto recently touched B+Tree nodes and leaves. This was a hot spot for concurrent readers, as they needed to synchronize when adding a reference to the ring buffer. I made a similar change, using a variant of the ring buffer with thread-local buffers and batching through the lock, and that contention has also disappeared completely. I have been using per-thread arrays of between 64 and 128 elements, with barging into the lock at 50% of the buffer capacity.

So, overall, this approach is working out very nicely for us in combination with WeakReferences. One drawback of the approach is that hard references remain in the thread-local arrays until they are batched through the lock. Depending on how threads are managed, it is possible for hard references to remain in the thread-local buffers long past their last touch. For this reason we do not use this technique when the cache contains expensive resources for which we need a better guarantee of timely discard once the resource is no longer in use by the application. It might be possible to address this concern with a protocol to attach and detach the thread-local buffers from threads on their way into / out of a thread pool.

I have not yet tried to use this approach in our main data record cache (which caches data records read from the disk in memory). We still have a hot spot there, but I am quite certain that we will be able to address it using similar techniques. I will try out the Infinispan BufferedConcurrentHashMap in that context soon.

Thanks,

Bryan

________________________________

From: infinispan-dev-bounces@lists.jboss.org [mailto:infinispan-dev-bounces@lists.jboss.org] On Behalf Of Vladimir Blagojevic
Sent: Wednesday, January 27, 2010 10:30 AM
To: infinispan -Dev List
Subject: [infinispan-dev] Lock amortization preliminary performance numbers

Hi all,

As you probably recall, I am working on a LIRS eviction algorithm. But before we get there, Manik and I agreed that I should push in the direction of implementing a ConcurrentHashMap (CHM) variant that does lock amortization and eviction per segment of the CHM. So far I have implemented LRU eviction, so that we can verify the feasibility of this approach rather than the eviction algorithm itself. We are hoping that eviction precision will not suffer if we do eviction per segment, and you are all familiar with the lock-striping benefits of segments in CHM.

So, I've cobbled together a first test to compare the eviction- and lock-amortization-enabled CHM (BCHM) with a regular CHM and a synchronized HashMap. The test is based on an already existing test [1] used to measure the performance of various DataContainers; I've changed it slightly to measure the Map directly instead of the DataContainer. The test launches, as a cohort, 48 reader, 4 writer and 1 remover threads. All operations are randomized: readers execute map.get(R.nextInt(NUM_KEYS)), writers map.put(R.nextInt(NUM_KEYS), "value"), and removers map.remove(R.nextInt(NUM_KEYS)). NUM_KEYS was set to 10K, and each thread does 1K ops in a loop.

Initial capacity for the CHM and HashMap was set to 1K, while the max capacity for the eviction- and lock-amortization-enabled CHM was set to 256; therefore BCHM has to do a lot of evictions, which is evident in the final map sizes listed below.

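For reference, the shape of the loop is roughly the following condensed sketch (not the actual test code from [1]; it only reports the final size and overall elapsed time, whereas the real test tracks per-operation averages):

import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Condensed sketch of the stress loop described above: 48 readers, 4 writers
// and 1 remover hammer the same Map with randomized keys.
public final class MapStressSketch {

    static final int NUM_KEYS = 10000;
    static final int OPS_PER_THREAD = 1000;

    public static void main(String[] args) throws InterruptedException {
        final Map<Integer, String> map = new ConcurrentHashMap<Integer, String>(1000);
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[48 + 4 + 1];

        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    Random r = new Random();
                    try { start.await(); } catch (InterruptedException e) { return; }
                    for (int op = 0; op < OPS_PER_THREAD; op++) {
                        if (id < 48) {
                            map.get(r.nextInt(NUM_KEYS));              // reader
                        } else if (id < 52) {
                            map.put(r.nextInt(NUM_KEYS), "value");     // writer
                        } else {
                            map.remove(r.nextInt(NUM_KEYS));           // remover
                        }
                    }
                }
            });
            threads[i].start();
        }

        long begin = System.nanoTime();
        start.countDown();                            // release all threads at once
        for (Thread t : threads) t.join();
        System.out.println("size=" + map.size()
                + " elapsedMs=" + (System.nanoTime() - begin) / 1000000);
    }
}
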
Size = 9999
Performance for container ConcurrentHashMap
Average get ops/ms    338
Average put ops/ms     87
Average remove ops/ms 171

Size = 193
Performance for container BufferedConcurrentHashMap
Average get ops/ms    322
Average put ops/ms     32
Average remove ops/ms  74

Size = 8340
Performance for container SynchronizedMap
Average get ops/ms     67
Average put ops/ms     45
Average remove ops/ms  63

<DIV style="MARGIN: 0px; FONT: 11px 'Lucida Grande'"><FONT
class=Apple-style-span face=Arial size=3><SPAN class=Apple-style-span
style="FONT-SIZE: 12px">If I remove lock amortization for <SPAN
class=Apple-style-span
style="FONT-SIZE: 11px; FONT-FAMILY: 'Lucida Grande'">BufferedConcurrentHashMap,
that is if we attempt eviction on every get/put/remove, performance of
put/remove for BCHM goes to zero basically!</SPAN> As far as I can
interpret these results, BCHM get operation performance does not suffer at
all in comparison with CHM as it does for single lock HashMap. Predictably,
for single lock HashMap each type of operation takes on avg almost
the same amount of time. We pay a hit in BCHM put/remove operations in
comparison with CHM but the numbers are promising, I think. If you
have comments, or even suggestions on how to test these map variants in a
different, possibly more insightful approach, speak
up! </SPAN></FONT></DIV>
<DIV style="MARGIN: 0px; FONT: 11px 'Lucida Grande'"><FONT
class=Apple-style-span face=Arial size=3><SPAN class=Apple-style-span
style="FONT-SIZE: 12px"><BR></SPAN></FONT></DIV>
<DIV style="MARGIN: 0px; FONT: 11px 'Lucida Grande'"><FONT
class=Apple-style-span face=Arial size=3><SPAN class=Apple-style-span
style="FONT-SIZE: 12px">Cheers,</SPAN></FONT></DIV>
<DIV style="MARGIN: 0px; FONT: 11px 'Lucida Grande'"><FONT
class=Apple-style-span face=Arial size=3><SPAN class=Apple-style-span
style="FONT-SIZE: 12px">Vladimir</SPAN></FONT></DIV></SPAN></FONT></DIV>
[1] http://fisheye.jboss.org/browse/~raw,r=1264/Infinispan/trunk/core/src/test/java/org/infinispan/stress/DataContainerStressTest.java