[infinispan-dev] Let me understand DIST

Pedro Ruivo pruivo at gsd.inesc-id.pt
Thu Mar 15 10:45:47 EDT 2012


Hi,

Comments inline.

Cheers,
Pedro

On 3/15/12 8:34 AM, Bela Ban wrote:
>
> On 3/12/12 7:13 PM, Pedro Ruivo wrote:
>> On 3/10/12 5:07 PM, Bela Ban wrote:
>>> If so, then I can assume that a transactional modification touching a
>>> number of keys will almost always touch *all* nodes? Example:
>>> - We have 10 nodes
>>> - numOwners = 2
>>> - If we have a good consistent hash, I can assume that I have to modify
>>> 5 different keys (10 / 2) on average in a TX to touch *all* nodes in the
>>> cluster with the PREPARE/COMMIT phase, correct?
>>>
>>> If my last statement is correct, is it safe to assume that with DIST and
>>> transactional modifications, I will have a lot of TX contention /
>>> collisions?
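(A side note on the arithmetic: 10 / 2 = 5 keys is really the lower
bound, not the average; with random key placement a TX typically needs a
few more keys before every node owns at least one of them,
coupon-collector style. The toy simulation below illustrates this. It is
only a sketch, not Infinispan's real consistent hash: I assume a
simplified wheel where the backup owner is simply the next node after
the primary.

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class OwnersTouched {
    public static void main(String[] args) {
        final int nodes = 10;      // cluster size from the example above
        final int numOwners = 2;   // copies per key
        final int runs = 100000;
        Random rnd = new Random();
        long totalKeys = 0;
        for (int r = 0; r < runs; r++) {
            Set<Integer> touched = new HashSet<Integer>();
            int keys = 0;
            while (touched.size() < nodes) {
                // random primary owner; backup = next node on the wheel
                int primary = rnd.nextInt(nodes);
                for (int i = 0; i < numOwners; i++)
                    touched.add((primary + i) % nodes);
                keys++;
            }
            totalKeys += keys;
        }
        System.out.printf("avg keys per TX to touch all %d nodes: %.1f%n",
                nodes, (double) totalKeys / runs);
    }
}

Either way, a handful of keys per TX is enough to involve most of the
cluster in PREPARE/COMMIT, so the contention question above stands.)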
>> We have run experiments with ISPN 5.2 and TPC-C (1 warehouse, which
>> gives a high probability of contention among transactions), and compared
>> it with ISPN 5.0 (where locks were acquired on all replicas of a key,
>> not only on the primary).
>>
>> The results, running without the write skew check on 10 nodes of our
>> cluster (numOwners = 2), follow:
>>
>>            Tx/sec    Abort rate
>> 5.2            12            15
>> 5.0             3            30
>> 5.0-TOM        60             0
>
> Excellent! It shows that 2PC has really improved between 5.0 and 5.2...
>
> Have you run TOM on Infinispan 5.2 / JGroups 3.1 yet? It should
> theoretically still be 60 TXs/sec. But even compared to 12, this is
> still much better!
>
>
I haven't run it yet. I want to make the code stable (TOM with and 
without the write skew check) before starting to benchmark the 
protocol. I'm still having some problems with the write skew check :P
>>> Also, if we touch almost all nodes, would it make sense to use
>>> SEQUENCER for *all* updates? Would this obviate the need for TOM
>>> (total order for partial replication)?
>> You are right, this could be done; it is what is sometimes called
>> "non-genuine" partial replication. Our take is that this will work
>> well on small-scale clusters but not on large ones, since every node
>> then has to order and deliver every update, whether it replicates the
>> keys involved or not.
>
> I agree
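For context on the SEQUENCER option discussed above: SEQUENCER forwards
every multicast to the coordinator, which re-broadcasts it, so all
members deliver all updates in one global order -- which is also exactly
why every node pays for every update. From the application side it would
look roughly like the sketch below; "sequencer.xml" is a hypothetical
stack file (a regular UDP stack with SEQUENCER near the top), not
something that ships with JGroups.

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;

public class TotalOrderSketch {
    public static void main(String[] args) throws Exception {
        // hypothetical config: a default stack plus SEQUENCER above GMS
        JChannel ch = new JChannel("sequencer.xml");
        ch.setReceiver(new ReceiverAdapter() {
            @Override
            public void receive(Message msg) {
                // every member sees the same delivery order
                System.out.println("delivered: " + msg.getObject());
            }
        });
        ch.connect("updates");
        // null destination = multicast to the whole cluster;
        // SEQUENCER routes it through the coordinator for ordering
        ch.send(new Message(null, null, "update-1"));
        Thread.sleep(1000); // crude wait for delivery in this sketch
        ch.close();
    }
}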

