[infinispan-dev] Eventual consistency
Sanne Grinovero
sanne.grinovero at gmail.com
Thu Mar 3 10:26:41 EST 2011
2011/3/3 Manik Surtani <msurtani at redhat.com>:
> The way it works right now is that if a rehash is in progress, C returns an UnsureResponse, which prompts X to wait for further responses from either A or B and use one of those instead. X sends the GET out to all of A, B and C in parallel anyway.
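For readers of the archive, a minimal sketch of the filtering described
above. UnsureResponse is the name Manik uses; the rest of the hierarchy
(Response, ValueResponse, RehashAwareGet) is an illustrative stand-in,
not Infinispan's actual code:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutionException;

    // Toy response types standing in for Infinispan's real hierarchy.
    interface Response {}
    final class UnsureResponse implements Response {}
    final class ValueResponse implements Response {
        final Object value;
        ValueResponse(Object value) { this.value = value; }
    }

    final class RehashAwareGet {
        // X multicasts the GET to all owners (A, B, C) in parallel and
        // completes with the first reply that is not an UnsureResponse.
        static Object get(List<CompletableFuture<Response>> replies)
                throws InterruptedException, ExecutionException {
            CompletableFuture<Object> sure = new CompletableFuture<>();
            for (CompletableFuture<Response> reply : replies) {
                reply.thenAccept(r -> {
                    if (r instanceof ValueResponse vr) {
                        sure.complete(vr.value); // first definitive answer wins
                    }
                    // An UnsureResponse from a rehashing node is simply
                    // ignored: keep waiting for A or B.
                });
            }
            return sure.get();
        }

        public static void main(String[] args) throws Exception {
            CompletableFuture<Response> fromC =
                    CompletableFuture.<Response>completedFuture(new UnsureResponse());
            CompletableFuture<Response> fromA =
                    CompletableFuture.<Response>completedFuture(new ValueResponse("v1"));
            System.out.println(get(List.of(fromC, fromA))); // prints v1
        }
    }

If every owner answered unsure, sure.get() would block forever; a real
implementation would need a timeout and retry at that point.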
Aha, thanks for all the explanations.
Sanne
>
> On 3 Mar 2011, at 14:23, Sanne Grinovero wrote:
>
>> 2011/3/3 Manik Surtani <msurtani at redhat.com>:
>>> A GET may produce incorrect results if a node not involved in a rehash, Node X, asks for a key. E.g., it may ask Node C for the entry, since Node C is a new owner. However, Node C may not have finished applying the state it received, and so would return a null. Under normal circumstances this would be considered a valid response, but if X is aware that a rehash is going on, it waits for more responses (from A and B).
>>
>> I'd say that because C is aware that it's performing a rehash, and
>> aware that it doesn't yet know the correct value for the key being
>> requested, it should not return null but wait until it knows the
>> proper answer and return that - ideally it could even hint the
>> ongoing state transfer to prioritize this specific key, since there
>> is immediate demand for it.
>> In any case, C is going to receive this value soon anyway, as it's
>> now an owner of it, so this approach doesn't look to me like it
>> would increase the network traffic.
>>
>> Otherwise, how could a client requesting a key tell whether the
>> returned null is a real value or just a not-yet-transferred entry?
>> Should I have to deal with that in the application?
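A rough sketch of what Sanne suggests above - the new owner parking the
GET on a per-key future until state transfer delivers that key, plus a
hook to prioritize it. All names here are hypothetical, not Infinispan
APIs:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.TimeUnit;

    // Hypothetical: the new owner parks GETs for keys still in transit
    // instead of answering null during a rehash.
    final class BlockingStateReceiver {
        private final ConcurrentMap<Object, CompletableFuture<Object>> inFlight =
                new ConcurrentHashMap<>();

        // Called by the GET handler for a key whose state hasn't arrived yet.
        Object awaitKey(Object key, long timeout, TimeUnit unit) throws Exception {
            CompletableFuture<Object> pending =
                    inFlight.computeIfAbsent(key, k -> new CompletableFuture<>());
            prioritize(key);                   // hint: pull this key ahead of the batch
            return pending.get(timeout, unit); // park the caller, don't return null
        }

        // Called by the state-transfer task as each entry is applied.
        void entryReceived(Object key, Object value) {
            inFlight.computeIfAbsent(key, k -> new CompletableFuture<>())
                    .complete(value);
        }

        private void prioritize(Object key) {
            // A real implementation would reorder the pending state-transfer
            // queue; left as a placeholder here. (Cleanup of completed
            // futures is also elided.)
        }

        public static void main(String[] args) throws Exception {
            BlockingStateReceiver c = new BlockingStateReceiver();
            new Thread(() -> c.entryReceived("k1", "v1")).start(); // transfer arrives
            System.out.println(c.awaitKey("k1", 1, TimeUnit.SECONDS)); // prints v1
        }
    }

The timeout matters: if the rehash aborts and the key never arrives, the
parked GET must fail rather than hang.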
>>
>>>
>>> As for node failure before an async RPC completes, this would result in data loss.
>>>
>>> Sent from my mobile phone
>>>
>>> On 2 Mar 2011, at 18:38, Sanne Grinovero <sanne.grinovero at gmail.com> wrote:
>>>
>>>> Hi Manik,
>>>> can you explain the first cause? Why is it that during a rehash
>>>> you're unable to get an answer to a GET?
>>>> If a node A that has installed the view T'' receives a GET request
>>>> from B, which still has the outdated view T', and A is no longer
>>>> the owner because the view changed to T'' and it has just
>>>> transferred the requested value to a node C,
>>>> then A definitely knows how to handle the request by forwarding it
>>>> to C: A was the owner before - otherwise it wouldn't have received
>>>> the request - and since it isn't the owner any more, it must be
>>>> aware of the new hash configuration.
>>>> A then just stays in the middle of the communication and sends the
>>>> requested value back to B, along with enough information about the
>>>> new view to avoid further erroneous requests.
>>>>
>>>> About your proposal: what would happen if the owner crashes before
>>>> it has asynchronously written the changes to a secondary node?
>>>>
>>>> Cheers,
>>>> Sanne
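The forwarding Sanne describes could look roughly like this; ViewId,
GetResponse, Transport and the node naming are made up for
illustration, not Infinispan's actual types:

    // The previous owner must know the newer view (it gave the key up),
    // so it can proxy the GET to the new owner and piggyback the new
    // view on the reply, letting the requester fix its routing.
    record ViewId(int id) {}
    record GetResponse(Object value, ViewId currentView) {}

    interface Transport {
        Object remoteGet(String node, Object key); // blocking RPC, illustrative
    }

    final class ForwardingGetHandler {
        private final Transport transport;
        private final ViewId installedView;

        ForwardingGetHandler(Transport transport, ViewId installedView) {
            this.transport = transport;
            this.installedView = installedView;
        }

        // B asked us with an outdated view and we are no longer the owner:
        // forward to the new owner C and return the value plus our view.
        GetResponse handleStaleGet(Object key, String newOwner) {
            Object value = transport.remoteGet(newOwner, key);
            return new GetResponse(value, installedView);
        }

        public static void main(String[] args) {
            Transport t = (node, key) -> "value-from-" + node; // fake RPC for the demo
            ForwardingGetHandler a = new ForwardingGetHandler(t, new ViewId(2));
            System.out.println(a.handleStaleGet("k1", "C"));
        }
    }

The trade-off versus the UnsureResponse approach is one extra hop
through the old owner instead of a parallel multicast and quorum at the
requester.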
>>>>
>>>> 2011/3/2 Manik Surtani <manik at jboss.org>:
>>>>> As consistency models go, Infinispan is primarily strongly
>>>>> consistent (with 2-phase commit between data owners), with the
>>>>> exception of rehashes, where the inability to get a valid response
>>>>> to a remote GET forces us to wait for more responses - a quorum,
>>>>> if you like. Not dissimilar to Paxos [1] in some ways.
>>>>> I'm wondering whether, for the sake of performance, we should also
>>>>> offer a fully eventually consistent model. What I am thinking is
>>>>> that changes *always* occur only on the primary data owner. Single
>>>>> phase, no additional round trips, etc. The primary owner then
>>>>> asynchronously propagates changes to the other data owners. This
>>>>> would mean things run much faster in a stable cluster, and
>>>>> durability is maintained. However, during rehashes, when keys are
>>>>> moved, the notion of the primary owner may change. To deal with
>>>>> this, we could use vector clocks [2] to version each entry. Vector
>>>>> clocks allow us to "merge" state nicely in most cases, and for
>>>>> reads we'd flip back to a Paxos-style quorum during a rehash to
>>>>> get the most "correct" version.
>>>>> In terms of implementation, almost all of this would only affect the
>>>>> DistributionInterceptor and the DistributionManager, so we could easily have
>>>>> eventually consistent flavours of these two components.
>>>>> Thoughts?
>>>>> Cheers
>>>>> Manik
>>>>> [1] http://en.wikipedia.org/wiki/Paxos_algorithm
>>>>> [2] http://en.wikipedia.org/wiki/Vector_clock
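For reference, a minimal vector clock along the lines of [2], with the
dominance test that decides whether two versions can be ordered and the
pointwise-max merge used after reconciling a conflict:

    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch: one counter per node; an entry's version would carry
    // one of these so diverged replicas can detect conflicting async writes.
    final class VectorClock {
        private final Map<String, Long> counters = new HashMap<>();

        void increment(String node) {
            counters.merge(node, 1L, Long::sum);
        }

        // true if every counter here is <= the other's: "this" is an
        // ancestor of (or equal to) "other" and can safely be discarded.
        boolean dominatedBy(VectorClock other) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                if (e.getValue() > other.counters.getOrDefault(e.getKey(), 0L)) {
                    return false;
                }
            }
            return true;
        }

        // Neither dominates the other: concurrent writes, a real conflict
        // needing application-level (or last-write-wins) resolution.
        boolean concurrentWith(VectorClock other) {
            return !this.dominatedBy(other) && !other.dominatedBy(this);
        }

        // Pointwise maximum of the two clocks, taken once a conflict is resolved.
        static VectorClock merge(VectorClock a, VectorClock b) {
            VectorClock merged = new VectorClock();
            merged.counters.putAll(a.counters);
            b.counters.forEach((node, c) -> merged.counters.merge(node, c, Math::max));
            return merged;
        }

        public static void main(String[] args) {
            VectorClock vA = new VectorClock(); vA.increment("A");
            VectorClock vB = new VectorClock(); vB.increment("B");
            System.out.println(vA.concurrentWith(vB));         // true: conflict
            System.out.println(vA.dominatedBy(merge(vA, vB))); // true after merge
        }
    }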
>>>>> --
>>>>> Manik Surtani
>>>>> manik at jboss.org
>>>>> twitter.com/maniksurtani
>>>>> Lead, Infinispan
>>>>> http://www.infinispan.org
>>>>>
>>>>>
>>>>>
>>>
>
> --
> Manik Surtani
> manik at jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>