On 06/07/2012 17:12, Vladimir Blagojevic wrote:
Mircea,
> Privately to you as I am not sure this makes sense for wider
> distribution - yet.
Adding infinispan-dev to this as I think it's interesting for a wider
audience.
> Say a user has a bunch of keys/values to insert into the cache. He could
> do it one key/value at a time, all in one tx, or in tx batches. If he
> wants to do it in batches of transactions then it would make sense to
> group the keys by the primary Address assigned on the hash wheel.
Very interesting point. Besides the locking, grouping keys has another
significant advantage: during the prepare phase each node receives the
complete list of modifications in that transaction, not only the
modifications pertaining to it.
E.g. say we have the following key->node mapping:
k1 -> A
k2 -> B
k3 -> C
Where k1, k2 and k3 are keys; A, B and C are nodes.
If Tx1 writes (k1, k2, k3) then during the prepare A, B and C will each
receive the same package containing all the modifications - namely
(k1, k2, k3). There are several reasons for this (apparently)
unoptimized approach: the prepare is serialized only once, and recovery
information is easier to handle.
Now if you group transactions/batches based on key distribution, as you
suggested, the amount of redundant traffic is significantly reduced -
and that translates into better performance, especially when the dataset
you're inserting is quite large.
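To put an illustrative number on that redundancy (the figures below are made-up inputs, not measurements): if a prepare of B modifications touches d distinct primary owners and every owner receives the full list, d copies of the list go on the wire; grouped by primary owner, the list is shipped once.

```java
public class PrepareTraffic {
    // Total modification entries shipped at prepare time when one tx of
    // batchSize writes touches ownersTouched distinct primary owners and
    // every owner receives the complete modification list, as described above.
    static int ungroupedEntries(int batchSize, int ownersTouched) {
        return batchSize * ownersTouched;
    }

    // Grouped by primary owner: the same writes split into per-owner txs,
    // so each modification list is shipped to exactly one node.
    static int groupedEntries(int batchSize) {
        return batchSize;
    }

    public static void main(String[] args) {
        int redundancy = ungroupedEntries(100, 10) / groupedEntries(100);
        System.out.println("redundancy factor: " + redundancy); // prints 10
    }
}
```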
> Therefore each tx batch would lock keys only on the primary node and
> nowhere else - call it tx node pinning if you want! Now imagine a
> cluster with a bunch of concurrent txs initiated from all nodes. If I am
> not mistaken, this tx pinning algorithm would not only increase
> throughput but also minimize deadlocks.
Yes. With optimistic tx caches, the only possibility for deadlock is
between transactions touching multiple nodes [1]. As long as your
transactions only write to the same node, even if they do so on the same
key set, the possibility of deadlock is (almost [2]) zero.
> Does this make sense? If so, why not support it somehow on the API level
> or do we already? ;-)
We don't have a service like this for now. I think your best option is
to fetch the CH from the advanced cache
(cache.getAdvancedCache().getDistributionManager().getConsistentHash())
and use it to group the inserts.
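A minimal, self-contained sketch of that grouping step. The `PrimaryOwnerLookup` interface and the hashCode-modulo "hash wheel" below are stand-ins for illustration only; in a real application the primary owner would come from the ConsistentHash obtained via the DistributionManager as above.

```java
import java.util.*;

public class GroupByPrimaryOwner {
    // Stand-in for a primary-owner lookup backed by Infinispan's
    // ConsistentHash (obtained from the advanced cache's DistributionManager).
    interface PrimaryOwnerLookup {
        String primaryOwner(Object key); // node address, modelled as a String here
    }

    /** Groups keys by primary owner so each tx batch touches a single node. */
    static Map<String, List<Object>> groupByPrimary(Collection<?> keys,
                                                    PrimaryOwnerLookup ch) {
        Map<String, List<Object>> batches = new HashMap<>();
        for (Object k : keys)
            batches.computeIfAbsent(ch.primaryOwner(k), a -> new ArrayList<>()).add(k);
        return batches;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("A", "B", "C");
        // Toy hash wheel: route each key by hashCode modulo the cluster size.
        PrimaryOwnerLookup ch = k -> nodes.get(Math.floorMod(k.hashCode(), nodes.size()));

        Map<String, List<Object>> batches =
            groupByPrimary(List.of("k1", "k2", "k3", "k4", "k5", "k6"), ch);

        // Every key in a batch maps back to that batch's node.
        for (Map.Entry<String, List<Object>> e : batches.entrySet())
            for (Object k : e.getValue())
                if (!ch.primaryOwner(k).equals(e.getKey())) throw new AssertionError();
        System.out.println("batches: " + batches.size());
    }
}
```

Each resulting batch can then be inserted in its own transaction, so the prepare for that transaction only ever travels to one primary owner.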
Thinking about it, it might be worth having a blog entry describing this,
as it can really boost performance when you need to load a large initial
data set into Infinispan.
Regards,
Vladimir
[1] this DLD situation will be fixed once we have incremental locking in
place:
https://issues.jboss.org/browse/ISPN-1219
[2] we use the key's CH value to induce an order over the keys written in
a transaction - that is in order to avoid deadlocks. If there are
collisions between these values then there's still a chance of deadlock.
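The ordering trick in [2] can be sketched as follows. The `chValue` function is a toy stand-in for the real per-key consistent-hash value; the point is only that all transactions sort their keys by the same value before acquiring locks.

```java
import java.util.*;

public class LockOrdering {
    // Stand-in for the consistent-hash value Infinispan derives per key;
    // the real implementation uses the cache's hash function, not hashCode.
    static int chValue(Object key) {
        return Math.floorMod(key.hashCode(), 1 << 10); // toy 10-bit hash wheel
    }

    /** Orders keys by CH value so concurrent txs acquire locks in one global order. */
    static List<Object> lockOrder(Collection<?> keys) {
        List<Object> ordered = new ArrayList<>(keys);
        ordered.sort(Comparator.comparingInt(LockOrdering::chValue));
        return ordered;
    }

    public static void main(String[] args) {
        // Two txs writing the same keys in different orders...
        List<Object> tx1 = lockOrder(List.of("k1", "k2", "k3"));
        List<Object> tx2 = lockOrder(List.of("k3", "k1", "k2"));
        // ...acquire locks in the same order, so neither can wait on the other
        // in a cycle. Caveat from [2]: if chValue collides for two keys, their
        // relative order is unspecified and a small deadlock window remains.
        if (!tx1.equals(tx2)) throw new AssertionError();
    }
}
```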