[infinispan-dev] [infinispan-internal] Performance based on contention
Sanne Grinovero
sanne at infinispan.org
Fri Jan 4 12:07:09 EST 2013
On 4 January 2013 17:10, Radim Vansa <rvansa at redhat.com> wrote:
> | The effect on near-CPU caches is an interesting hypothesis; it would
> | be nice to measure it. If you happen to run them again, could you use
> | perf?
> | Last time I ran it - almost a year ago - the counters showed very
> | large opportunities for improvement (a nice way of saying it was a
> | disaster), so I guess any change in the locking strategy could have
> | improved things.
>
> Can you specify what kind of opportunities you have seen in the performance counters? Too many context switches, or many cache misses? I don't think we have the "luxury" of being able to reorganize memory structures in Java, do we?
It pointed out that we have a huge number of L1 and L2 misses. I've
forgotten the exact figures, but comparing them against the Intel tech
documentation for i7 processors, my values seemed to be orders of
magnitude higher than what is suggested as the threshold for "you have
a problem".
Sadly the tool doesn't point you to which area of the code is
problematic; it just outputs average figures for the whole run. It
yells "you have a problem" but doesn't give you a clue where.
You're right that in Java we don't have the luxury of controlling this
directly, but there are some tricks we could try to apply, such as
cache-oblivious algorithms, and there are data structures which are
known to be cache friendly. For example, even in Java it matters in
which order you process the cells of a multidimensional array: some
traversal orders are dramatically faster simply because they are more
"cache friendly", as the sketch below shows.
>
> |
> | It would also be interesting to run both scenarios under OProfile,
> | so we can see which area is slower.
>
> I don't have any experience with it, but recently I tried to profile XS replication with JProfiler. Instrumentation slowed everything down so much that the results made no sense, and sampling can't prove anything because the methods execute too quickly. Have you used OProfile on ISPN yet, with any useful results?
Yes, I used OProfile on Infinispan, exactly because, as you say, other
profilers introduce too much overhead: OProfile doesn't. It was very
useful, but that was a year ago, so repeating the investigation could
certainly be worthwhile. I'm not an expert myself; that was my first
and last run. All the know-how I have is what I found on the jboss.org
wiki, which has an explanation of how to use it for Java applications,
along the lines of the sketch below.
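From memory - paths vary by distribution, so treat this as a sketch
rather than the exact steps from the wiki - the gist is to load
OProfile's JVMTI agent so that JIT-compiled methods get proper symbols,
and then profile the process:

  java -agentpath:/usr/lib64/oprofile/libjvmti_oprofile.so -jar app.jar &
  operf --pid <jvm-pid>     # newer OProfile releases; older ones use opcontrol
  opreport --symbols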
>
> |
> | Let's not forget that there are many internal structures, so even
> | when using different keys there is still plenty of contention... I
> | just wouldn't guess where without using the above-mentioned tools ;-)
> |
> | thanks a lot,
> | Sanne
> |
> | On 4 January 2013 15:26, Radim Vansa <rvansa at redhat.com> wrote:
> | > |
> | > | As for the low contention cases, I think they're all about the
> | > | same, as the probability of contention is very low given the
> | > | number of threads used. Perhaps if these were cranked up to, say,
> | > | 50 or 100 threads, you'd see a bigger difference?
> | >
> | > Right, I think that there should be no difference between 24k and
> | > 80k, but as you can see there was some strange behaviour in the
> | > no-tx case, so I added one more case to see the trend in the rest
> | > of the results. Perflab was bored during Xmas anyway ;-)
> | > Nevertheless, the overall result is pretty surprising to me:
> | > Infinispan handles contention so smoothly that, except for really
> | > extreme cases (such as number of keys == number of threads),
> | > contention rather improves the performance. (Is there any
> | > explanation for this? Some memory-caching effect keeping the often
> | > accessed entries in the near-CPU caches? Can this really affect
> | > such a high-level thing as ISPN?)
> | >
> | > Radim
> | >
> | > | On 4 Jan 2013, at 09:23, Radim Vansa <rvansa at redhat.com> wrote:
> | > |
> | > | > Hi,
> | > | >
> | > | > I have created a comparison of how JDG (library mode) behaves
> | > | > depending on the contention on keys. The test runs a standard
> | > | > (20% puts, 80% gets) stress test on different numbers of
> | > | > accessed keys (while there are always 80k keys loaded into the
> | > | > cache) with 10 concurrent threads on each of 8 nodes, for 10
> | > | > minutes. Regarding JVM warm-up, there was a 10-minute warmup on
> | > | > 80k shared keys, then the tests were executed in the order from
> | > | > the table below. TCP was used as the JGroups stack base.
> | > | >
> | > | > The variants below use pessimistic transactions (one request
> | > | > per transaction), or no transactions in the 6.1.0 case (running
> | > | > without transactions on JDG 6.0.1 with high contention wouldn't
> | > | > make any sense). The last 'disjunct' case has a slightly
> | > | > different key format that avoids any contention. Before the
> | > | > slash is the number of reads per node (summed over all 10
> | > | > threads) per second; after the slash is the number of writes.
> | > | >
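> | > | > (For anyone wanting to reproduce this: pessimistic library-mode
> | > | > transactions are configured roughly as below - a sketch using
> | > | > the standard ConfigurationBuilder API, not the actual benchmark
> | > | > configuration.)
> | > | >
> | > | > import org.infinispan.configuration.cache.ConfigurationBuilder;
> | > | > import org.infinispan.transaction.LockingMode;
> | > | > import org.infinispan.transaction.TransactionMode;
> | > | >
> | > | > // Sketch: transactional cache with pessimistic (eager) locking.
> | > | > ConfigurationBuilder builder = new ConfigurationBuilder();
> | > | > builder.transaction()
> | > | >        .transactionMode(TransactionMode.TRANSACTIONAL)
> | > | >        .lockingMode(LockingMode.PESSIMISTIC);
> | > | >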
> | > | > Accessed keys | JDG 6.0.1 TX | JDG 6.1.0 TX | JDG 6.1.0 NO TX
> | > | > 80            | 18824/2866   | 21671/3542   | 22264/5978
> | > | > 800           | 18740/3625   | 23655/4971   | 20772/6018
> | > | > 8000          | 18694/3583   | 21086/4478   | 19882/5748
> | > | > 24000         | 18674/3493   | 19342/4078   | 19757/5630
> | > | > 80000         | 18680/3459   | 18567/3799   | 22617/6416
> | > | > 80k disjunct  | 19023/3670   | 20941/4527   | 20877/6154
> | > | >
> | > | > I can't really explain why the disjunct key sets perform so
> | > | > much better than the low-contention cases; the key format is
> | > | > really just key_(number) for shared keys and
> | > | > key_(node)_(thread)_(number) for disjunct ones, and the rest of
> | > | > the code path is the same - see the sketch below. The
> | > | > exceptionally good performance for 80k keys in the non-tx case
> | > | > is also very strange, but I keep getting these results
> | > | > consistently.
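> | > | > A minimal sketch of the two key schemes and the read/write mix
> | > | > (illustrative only - the parameter names are made up, this is
> | > | > not the benchmark code):
> | > | >
> | > | > import java.util.concurrent.ThreadLocalRandom;
> | > | > import org.infinispan.Cache;
> | > | >
> | > | > final class StressMix {
> | > | >     // One iteration of the 20% put / 80% get mix over a bounded key space.
> | > | >     static void iteration(Cache<String, byte[]> cache, int node, int thread,
> | > | >                           int keySpace, boolean disjunct) {
> | > | >         ThreadLocalRandom rnd = ThreadLocalRandom.current();
> | > | >         int n = rnd.nextInt(keySpace);
> | > | >         String key = disjunct
> | > | >                 ? "key_" + node + "_" + thread + "_" + n // per-thread: no contention
> | > | >                 : "key_" + n;                            // shared by all nodes/threads
> | > | >         if (rnd.nextInt(100) < 20) {
> | > | >             cache.put(key, new byte[1024]); // 20% writes (value size is arbitrary)
> | > | >         } else {
> | > | >             cache.get(key);                 // 80% reads
> | > | >         }
> | > | >     }
> | > | > }
> | > | >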
> | > | > Nevertheless, I am really happy about the big performance
> | > | > increase between 6.0.1 and 6.1.0, and that the no-tx case works
> | > | > intuitively (no deadlocks) and really fast :)
> | > | >
> | > | > Radim
> | > | >
> | > | > -----------------------------------------------------------
> | > | > Radim Vansa
> | > | > Quality Assurance Engineer
> | > | > JBoss Datagrid
> | > | > tel. +420532294559 ext. 62559
> | > | >
> | > | > Red Hat Czech, s.r.o.
> | > | > Brno, Purkyňova 99/71, PSČ 612 45
> | > | > Czech Republic
> | > | >
> | > | >
> | > |
> | >
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev