On 3/10/12 5:07 PM, Bela Ban wrote:
If so, then I can assume that a transactional modification touching a
number of keys will almost always touch *all* nodes ? Example:
- We have 10 nodes
- numOwners = 2
- If we have a good consistent hash, I can assume that I have to modify
5 different keys (10 / 2) on average in a TX to touch *all* nodes in the
cluster with the PREPARE/COMMIT phase, correct ?
If my last statement is correct, is it safe to assume that with DIST and
transactional modifications, I will have a lot of TX contention /
collisions ?
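A back-of-the-envelope sketch of the arithmetic above, in Python. It assumes an idealized consistent hash in which every node is equally likely to be one of the owners of any given key, and that the keys of a TX are chosen independently; the function name is made up for illustration and is not part of Infinispan.

```python
def expected_nodes_touched(num_nodes, num_owners, num_keys):
    """Expected number of distinct nodes owning at least one of the keys.

    Assumption (not from the thread): each node is an owner of a given
    key with probability num_owners / num_nodes, and keys are independent.
    """
    # Probability that a particular node owns none of the keys.
    p_node_missed = (1 - num_owners / num_nodes) ** num_keys
    # Linearity of expectation over all nodes.
    return num_nodes * (1 - p_node_missed)


if __name__ == "__main__":
    for keys in (1, 5, 10, 20):
        touched = expected_nodes_touched(10, 2, keys)
        print(f"{keys:2d} keys -> {touched:.2f} nodes touched on average")
```

Under these assumptions, 5 keys is the minimum needed to cover all 10 nodes, and on average a 5-key TX touches about 6.7 of them, so "most nodes" rather than "all nodes" is the safer reading for small transactions.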
We have run experiments with ISPN 5.2 and TPC-C (1 warehouse, which
gives a high probability of contention among transactions), and compared
it with ISPN 5.0 (where locks were acquired on all replicas of a key,
not only on the primary).
The results, running without the write skew check on 10 nodes of our
cluster (number of owners = 2), follow:
           Tx/sec   Abort Rate
5.2          12        15
5.0           3        30
5.0-TOM      60         0
Our understanding is that acquiring locks on a single node did reduce
the contention probability/abort rate, but that if transactions update
even a small number of keys on average (TPC-C should update around 5
keys on average with the configuration parameters we used), contention
may still have a big impact on performance.
If this is correct, this would IMO lay even more importance onto the
work done by the Cloud-TM team, replacing 2PC with total order.
Thanks :)
Also, if we touch almost all nodes, would it make sense to use SEQUENCER
for *all* updates ? Would this obviate the need for TOM (total order for
partial replication) ?
This could be done, you are right, it's what is sometimes called
"non-genuine" partial replication. Our take on this is that this will
work well on small scale clusters, not on large ones. But on small scale
clusters, unless memory is a concern, full replication normally works
better (as all reads can be served locally)... so, we are not the
biggest fans of this approach :-)
Well, probably not, because we only want to send keys to nodes that
actually need to store them...
Yes, and this cost will likely be prohibitive with large scale clusters
(>10 nodes)
Thoughts ?
Cheers,
Pedro