[infinispan-dev] transactions :: incremental locking

Sanne Grinovero sanne at infinispan.org
Mon Jul 11 09:03:19 EDT 2011


This is getting a bit confusing; sorry, I've tried to be brief but it
seems that didn't make the message any clearer.
I'll add some more notes below as they belong to the context, but
let's talk about this on IRC.

2011/7/11 Mircea Markus <mircea.markus at jboss.com>:
> On 11 Jul 2011, at 12:27, Sanne Grinovero wrote:
>> 2011/7/11 Mircea Markus <mircea.markus at jboss.com>:
>>> On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:
>>>
>>>> Very nice design, congratulations.
>>>> Second picture is missing a final "4" step returning the ACK to N1 right?
>>> yes, this happens when 1a, 1b and 1c returned to N1.
>>>>
>>>> About the third proposal I was thinking about a possible improvement,
>>>> not sure if it works:
>>>> consider the list of keys being locked, already ordered. As far as I
>>>> understood the ordering on the hash wheel, you can now split this list
>>>> into the groups which need to be sent to each node:
>>>> k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
>>>> i.e. the sequence does not need to be reordered and we can make use of that.
>>>>
>>>> Now instead of sending the full list in multicast to each involved
>>>> node, we send the list only to the first node, and wait for an ACK
>>>> from the last node involved in the sequence.
>>>> This does not reduce the number of async RPCs being executed, but it
>>>> sends fewer packets on the wire, and it has the advantage of never
>>>> needing to go back to recover from a failed node or from a potential
>>>> deadlock, as all keys are always taken in the proper order, not
>>>> "optimistically".
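
To make the grouping step concrete, here is a rough sketch (my own
illustration, not Infinispan code): a hypothetical ownerOf() stands in
for the consistent-hash lookup on the hash wheel, and since the input
list is already ordered, each node's group comes out contiguous with no
re-sorting needed.

```java
import java.util.*;

public class KeyGrouping {

    // Hypothetical stand-in for the consistent-hash lookup: maps a key
    // to the node owning it on the hash wheel. Illustrative only.
    static String ownerOf(String key) {
        switch (key) {
            case "k1": case "k2": return "N2";
            case "k3": case "k4": return "N3";
            default:              return "N4";
        }
    }

    // Split an already-ordered key list into per-node groups, preserving
    // the hash-wheel order (LinkedHashMap keeps insertion order).
    static LinkedHashMap<String, List<String>> group(List<String> orderedKeys) {
        LinkedHashMap<String, List<String>> groups = new LinkedHashMap<>();
        for (String key : orderedKeys) {
            groups.computeIfAbsent(ownerOf(key), n -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Reproduces the example from the thread:
        // k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
        System.out.println(group(List.of("k1", "k2", "k3", "k4", "k5")));
    }
}
```

The ordered list can then be handed to the first node (N2 here), which
forwards the remainder down the chain.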
>>> Your suggestion is a variation of the 2nd approach ("Async incremental locking"). If we approach it this way, the prepare message, which is potentially big, is sent in sequence from one node to the next.
>>> With the current approach the prepare is sent in parallel, and the lock request sent from N2->N3->N4 has a smaller payload.
>>
>> Yes it's a variation, and I wouldn't have thought of it without
>> reading your nice proposal. Still I don't understand why the
>> original suggestion does anything in parallel: since you say the
>> payload must be kept low, you're sending multiple different keys
>> over the wire from N1, so you can't use multicast, and assuming you
>> have a single wire these are sent in sequence too.
> ATM the prepare is sent as a multicast, each node receiving the entire key set touched by the transaction.
>> About the total number
>> of parallel communications, the original approach has the same number
>> of serialized "node hops" in the best scenario,

> I'm a bit lost on what is the "original approach" :-) Is the "Async incremental locking" or the "hybrid one"?
> For the "hybrid", the number of serialized "node hops" in the best scenario is 1.
I'm sorry, I'm referring to the "Hybrid incremental approach".
How can you have a value of 1 if the nodes have to confirm in
sequence? As the picture shows, there are messages labelled in
sequence 1, 2, 3, 4, and it should need a final ACK (5) too.
The small change I'm suggesting doesn't improve this much; it only
avoids the need to roll back lock acquisitions: since you need
messages 1-5 in series anyway, I'm suggesting we take advantage of
that, admittedly using a larger payload.
Thinking about it now, it also keeps locks held for a shorter time:
with the hybrid approach all locks are acquired during the initial
multicast in phase 1, while with my additional proposal some locks are
acquired a bit later in terms of network messages; no doubt the
difference is small, but it might allow some transactions to make
progress sooner.

>> but worse in case
>> something needs to be rolled back; imho it gets complicated if you
>> have to roll back a longer path when more nodes are involved!
> yes, but the rollback should be seen as an exception in this situation, and the optimization is for the most common scenario, in which lock acquisition succeeds.
>> so my intention was to a) make it simpler and b) while still
>> optimising for the good case, have less chattiness in the worst case.
> The good case, at least compared with the hybrid approach, is less optimal: hybrid has 1 serial hop, this one has numNodesInvolvedInTx hops.

I'm failing to understand how you can have a single serial hop: you
must ACK the lock acquisition in order to achieve your goal of
deadlock prevention. There must be a misunderstanding, as I'm
experiencing the opposite interpretation; actually I think your
explanation and solution on the wiki were brilliant, but we're getting
entangled on the mailing list.
Did you try expanding your schema to more nodes? What should happen if
the first node fails in a series of a dozen? It doesn't scale with the
number of involved servers, and you're keeping some locks for the time
of multiple network hops, while those locks should actually be free
and are blocking other transactions. You might well end up in a
spinning situation, as the delays are in terms of multiple network
hops: you blocked another TX, which needs to restart from scratch in
the locking fight, and you abort yourself to retry as well. That's a
lot of wasted locking, which seems unnecessary as we can acquire the
locks in order together with the ACK messages.

I really think the "token ring" approach is an improvement on "Hybrid
incremental" because you can easily prove that there is no way to use
fewer hops given your sequential+ACK requirement, and since we already
have this nice ordering of the keys, we can take advantage of it by
sending the token around in the correct sequence.
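
A minimal simulation of what I mean by "token ring", purely to pin
down the sequence of events (the Node record, the log strings and the
in-memory chain are all illustrative, not Infinispan API): the ordered
lock request travels N2 -> N3 -> N4, each node acquires only the keys
it owns, in key order, and only the last node in the chain ACKs the
originator N1.

```java
import java.util.*;

public class TokenRingLocking {

    // Illustrative model of a node and the key set it owns on the hash wheel.
    record Node(String name, Set<String> ownedKeys) {}

    // Forward the ordered lock request along the chain: each node locks its
    // own keys in sequence, then passes the token on; the last node ACKs N1.
    static List<String> forward(List<Node> chain, List<String> orderedKeys) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < chain.size(); i++) {
            Node node = chain.get(i);
            for (String key : orderedKeys) {
                if (node.ownedKeys().contains(key)) {
                    log.add(node.name() + " locks " + key);
                }
            }
            if (i + 1 < chain.size()) {
                log.add(node.name() + " forwards token to " + chain.get(i + 1).name());
            } else {
                log.add(node.name() + " ACKs originator N1");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Same example as in the thread: N2{k1, k2} N3{k3, k4} N4{k5}.
        List<Node> chain = List.of(
            new Node("N2", Set.of("k1", "k2")),
            new Node("N3", Set.of("k3", "k4")),
            new Node("N4", Set.of("k5")));
        forward(chain, List.of("k1", "k2", "k3", "k4", "k5"))
            .forEach(System.out::println);
    }
}
```

Because every node locks strictly in the global key order before the
token moves on, no lock acquisition ever has to be rolled back for
deadlock avoidance; the cost is that the full ordered key list rides
along with the token.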

Cheers,
Sanne
