[infinispan-dev] Optimizing tx for DIST from user perspective

Galder Zamarreño galder at redhat.com
Tue Jul 10 07:02:36 EDT 2012


On Jul 9, 2012, at 9:31 AM, Mircea Markus wrote:

> On 06/07/2012 17:12, Vladimir Blagojevic wrote:
>> Mircea,
>> 
>> Privately to you as I am not sure this makes sense for wider
>> distribution - yet.
> Adding infinispan-dev to this as I think it's interesting for a wider 
> audience.
>> 
>> Say a user has a bunch of key/value pairs to insert into the cache. He
>> could do it one key/value at a time, all in one tx, or in tx batches.
>> If he wants to do it in batches of transactions, then it would make
>> sense to group keys by the primary Address assigned on the hash wheel.
> Very interesting point. Besides the locking, grouping keys has another 
> significant advantage: during the prepare phase each node receives the 
> complete list of modifications in that transaction, not only the 
> modifications pertaining to it.
> E.g. say we have the following key->node mapping:
> k1 -> A
> k2 -> B
> k3 -> C
> Where k1, k2 and k3 are keys; A, B and C are nodes.
> If Tx1 writes (k1, k2, k3) then during the prepare A, B and C will each 
> receive the same package containing all the modifications - namely (k1, 
> k2, k3). There are several reasons for taking this (apparently) 
> unoptimized approach: the prepare is serialized only once, and recovery 
> information is handled better.
> 
> Now if you group transactions/batches based on key distribution, as you 
> suggested, the amount of redundant traffic is significantly reduced - 
> and that translates into better performance, especially when the dataset 
> you're inserting is quite large.
>> Therefore each tx batch would lock keys only on the primary node and
>> nowhere else - call it tx node pinning if you want! Now imagine a
>> cluster with a bunch of concurrent txs initiated from all nodes. If I
>> am not mistaken, this tx pinning algorithm would not only increase
>> throughput but also minimize deadlocks.
> Yes. With optimistic tx caches, the only possibility for deadlock is 
> between transactions touching multiple nodes [1]. As long as your 
> transactions only write to the same node, even if they do it on the same 
> key set, the possibility of deadlock is (almost [2]) zero.
> 
> 
>> Does this make sense? If so, why not support it somehow at the API
>> level, or do we already? ;-)
> We don't have a service like this for now. I think your best option is 
> to fetch the CH from the advanced cache 
> (cache.getAdvancedCache().getDistributionManager().getConsistentHash()) 
> and use it to group the inserts.
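> 
> A minimal sketch of that lookup, assuming the 5.1-era ConsistentHash 
> API (that locate(key, 1) returns the owner list with the primary owner 
> first is an assumption):
> 
>     ConsistentHash ch =
>           cache.getAdvancedCache().getDistributionManager().getConsistentHash();
>     // Assumption: the first Address in the list is the primary owner.
>     Address primary = ch.locate(key, 1).get(0);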
> 
> Thinking about it, it might be worth having a blog entry describing 
> this, as it can really boost performance when you need to load a large 
> initial data set into Infinispan.

Before blogging, wouldn't it be good to do some testing to measure the performance boost with some sampled data?

The advantage of doing this, apart from proving that the boost is indeed present, is that you can show the code pattern users need to follow when storing a bunch of key/value pairs - one that is proven to work.
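
For illustration, the pattern might look something like this - a rough, untested sketch; the GroupedLoader helper is made up here, and it assumes the 5.1-era ConsistentHash.locate() call with the primary owner first in the returned list:

    import java.util.HashMap;
    import java.util.Map;

    import javax.transaction.TransactionManager;

    import org.infinispan.AdvancedCache;
    import org.infinispan.distribution.ch.ConsistentHash;
    import org.infinispan.remoting.transport.Address;

    // Hypothetical helper, not an Infinispan API: buckets entries by the
    // primary owner of each key, then commits one tx per owner so each
    // prepare only locks keys on a single node ("tx node pinning").
    public class GroupedLoader {

       public static <K, V> void loadGrouped(AdvancedCache<K, V> cache,
             Map<K, V> data) throws Exception {
          ConsistentHash ch = cache.getDistributionManager().getConsistentHash();

          // Group the entries by primary owner; locate(key, 1) is assumed
          // to return the owner list with the primary owner first.
          Map<Address, Map<K, V>> byPrimary = new HashMap<Address, Map<K, V>>();
          for (Map.Entry<K, V> e : data.entrySet()) {
             Address primary = ch.locate(e.getKey(), 1).get(0);
             Map<K, V> bucket = byPrimary.get(primary);
             if (bucket == null) {
                bucket = new HashMap<K, V>();
                byPrimary.put(primary, bucket);
             }
             bucket.put(e.getKey(), e.getValue());
          }

          // One tx per primary owner: all keys in a batch are pinned to the
          // same node, so the prepare carries no redundant modifications.
          TransactionManager tm = cache.getTransactionManager();
          for (Map<K, V> bucket : byPrimary.values()) {
             tm.begin();
             try {
                cache.putAll(bucket);
                tm.commit();
             } catch (Exception ex) {
                tm.rollback();
                throw ex;
             }
          }
       }
    }

Measuring this against plain unbatched puts on a sampled data set would give us the numbers for the blog.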

Cheers,

> 
>> Regards,
>> Vladimir
> 
> 
> [1] this DLD situation will be fixed once we have incremental locking in 
> place: https://issues.jboss.org/browse/ISPN-1219
> [2] we use the key's CH value to induce an order over the keys written 
> in a transaction - that's in order to avoid deadlocks. If there are 
> collisions between these values then there's still a chance of deadlock.
> 
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache



