[infinispan-dev] transactions :: incremental locking

Sanne Grinovero sanne at infinispan.org
Mon Jul 11 07:27:20 EDT 2011


2011/7/11 Mircea Markus <mircea.markus at jboss.com>:
>
> On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:
>
>> Very nice design, congratulations.
>> Second picture is missing a final "4" step returning the ACK to N1, right?
> yes, this happens once 1a, 1b and 1c have returned to N1.
>>
>> About the third proposal, I was thinking of a possible improvement;
>> not sure if it works:
>> consider the list of keys being locked, already ordered. As far as I
>> understood the ordering on the hash wheel, you can now split this
>> list into the groups which need to be sent to each node:
>> k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
>> i.e. the sequence does not need to be reordered, and we can make use of that.
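
To make the splitting concrete, a minimal Java sketch (all names here
are hypothetical stand-ins, not Infinispan API): walking the
already-ordered list and bucketing keys by owner preserves the global
lock order in the resulting groups.

    import java.util.*;

    class LockGrouper {
        /** Stand-in for the consistent-hash owner lookup on the wheel. */
        interface KeyOwnership { String ownerOf(String key); }

        /** Splits an already-ordered key list into per-owner groups,
         *  without re-sorting, so the global lock order is preserved. */
        static LinkedHashMap<String, List<String>> groupByOwner(
                List<String> orderedKeys, KeyOwnership wheel) {
            LinkedHashMap<String, List<String>> groups =
                    new LinkedHashMap<String, List<String>>();
            for (String key : orderedKeys) {
                String owner = wheel.ownerOf(key);
                if (!groups.containsKey(owner))
                    groups.put(owner, new ArrayList<String>());
                groups.get(owner).add(key);
            }
            return groups; // e.g. {N2=[k1, k2], N3=[k3, k4], N4=[k5]}
        }
    }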
>>
>> Now instead of sending the full list in multicast to each involved
>> node, we send the list only to the first node, and wait for an ACK
>> from the last node involved in the sequence.
>> This does not reduce the number of async RPCs being executed, but it
>> sends fewer packets on the wire, and it has the advantage of never
>> needing to go back to recover from a failed node or from a potential
>> deadlock, as all keys are always taken in proper order, not
>> "optimistically".
> Your suggestion is a variation of 2nd approach("Async incremental locking"). If we approach it this way, the prepare message, which is potentially big, is sent in sequence from one node to the other.
> With current approach the prepare is send in parallel, and the lock request sent from N2->N3->N4 has a smaller payload.

Yes, it's a variation, and I wouldn't have thought of it without
reading your nice proposal. Still, I don't understand why the
original suggestion does anything in parallel: since you suggest
the payload must be kept low, you're sending multiple different keys
over the wire from N1, so you can't use multicast, and assuming you
have a single wire these are sent in sequence too. About the total
number of parallel communications, the original approach has the same
number of serialized "node hops" in the best scenario, but is worse
when something needs to be rolled back; imho it gets complicated if
you have to roll back a longer path when more nodes are involved!
So my intention was to a) make it simpler, b) while still optimising
for the good case, have less chattiness in the worst case.

I agree my last proposal sends keys over to nodes which have no
interest in them; still, since it's keys and not values, I wouldn't
expect this to change the packet size much, and I'm wondering whether
this "token ring" approach wouldn't be simpler: there are fewer
packets going around at the same time and order is guaranteed, so I
wouldn't conclude it is less efficient on the network without some
real-world testing.

Cheers,
Sanne

>>
>> [reusing the same picture of "Hybrid incremental approach"]
>> assuming this list of keys, owned by these nodes: k1, k2, k3, k4, k5
>> -> N2{k1, k2} N3{k3, k4} N4{k5}
>> Now if N2 receives the list, it acquires k1, k2, in order, and sends
>> the full list over to N3 with a message like
>>
>> N2{k1 *taken, k2 *taken} N3{k3, k4} N4{k5}
>>
>> and it will also wait for an async ACK. If it receives an ACK, N2's
>> work is done.
>> If it doesn't receive one in a reasonable time, it will assume
> This failure detection functionality is in JGroups; I wouldn't want to have it duplicated here as well. But indeed N2 can be notified by JGroups about this.
>> N3 is
>> dead: now the list of nodes changes, but the order in which these
>> elements need to be acquired does not. In fact, some of them might be
>> moved to N2 itself, some to N4, since the rehashing will move them to
>> neighbours: N3's keys can be taken right away (again not violating
>> the order), and the rest of the list is again forwarded to the next
>> alive node.
>> The last node will return an ACK to the originating node to inform
>> it that the prepare phase was successful; once it has done so, it
>> will send an ACK back to its previous node to free its resources,
>> that node will propagate the ACK back, etc.
>> No deadlocks, no timeouts, no undo operations. What am I missing?
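
For clarity, a compilable sketch of that per-node step (every type and
method below is hypothetical, not Infinispan API): each node takes its
own keys in list order, forwards the list to the next alive owner, and
either back-propagates the ACK or, if the next owner seems dead,
re-runs the step after rehash has reassigned its keys.

    import java.util.List;

    abstract class ChainedLockStep {
        abstract String self();
        abstract List<String> keysOwnedBy(String node, List<String> orderedKeys);
        abstract String nextAliveOwnerAfter(String node, List<String> orderedKeys);
        abstract void lockLocally(String key); // assumed no-op if already held
        abstract void forward(String node, List<String> orderedKeys, String originator);
        abstract boolean awaitAckFrom(String node, long timeoutMillis);
        abstract void sendAck(String node);

        void onPrepare(List<String> orderedKeys, String previousNode, String originator) {
            for (String key : keysOwnedBy(self(), orderedKeys))
                lockLocally(key);                          // own keys, in global order
            String next = nextAliveOwnerAfter(self(), orderedKeys);
            if (next == null) {                            // last node in the chain
                sendAck(originator);                       // prepare succeeded
                sendAck(previousNode);                     // unwind the chain
            } else {
                forward(next, orderedKeys, originator);    // pass the list along
                if (awaitAckFrom(next, 5000L)) {
                    sendAck(previousNode);                 // back-propagate the ACK
                } else {
                    // 'next' presumed dead: rehash moves its keys to the
                    // neighbours, so re-run this step; keys now owned here
                    // are taken (order intact), and the rest is forwarded
                    // to the next alive owner.
                    onPrepare(orderedKeys, previousNode, originator);
                }
            }
        }
    }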
>
>>
>> Cheers,
>> Sanne
>>
>> 2011/7/6 Mircea Markus <mircea.markus at jboss.com>:
>>>
>>> On 6 Jul 2011, at 09:00, Paolo Romano wrote:
>>>
>>>> On 7/5/11 3:47 PM, Mircea Markus wrote:
>>>>> On 5 Jul 2011, at 15:39, Manik Surtani wrote:
>>>>>
>>>>>> Good stuff, shame about the RPC count. ;)
>>>>> yeah. Still very valid when there are deadlocks; I guess figures will tell us more precisely what the gain is
>>>>>
>>>> I agree with Manik's observation.
>>>>
>>>> When benchmarking it, it would be interesting to compare it also with
>>>> the total order multicast approach that Pedro has been developing.
>>> +1
>>>>
>>>> When do you plan to have these locking optimizations available?
>>> We hope to have it in for 5.1, but it has a lower priority compared with the other lock-related changes.
>>>