[infinispan-dev] Eventual consistency

Manik Surtani msurtani at redhat.com
Thu Mar 3 10:25:02 EST 2011


The way it works right now: if a rehash is in progress, C returns an UnsureResponse, which prompts X to wait for a further response from either A or B and use that one instead.  X sends the GET out to all of A, B and C in parallel anyway.
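
To make the selection rule concrete, here is a minimal sketch of how the caller could pick a usable reply. It is an illustration only: the Reply class, its fields and the firstSureReply helper are hypothetical names invented for this example and are not Infinispan's actual classes. Replies are assumed to be handed over in arrival order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class RehashAwareGet {

    /** Hypothetical reply model for this sketch; not Infinispan's real response types. */
    static final class Reply {
        final String node;    // which node answered
        final String value;   // may legitimately be null if the key is absent
        final boolean unsure; // true if the node was still applying rehash state

        Reply(String node, String value, boolean unsure) {
            this.node = node;
            this.value = value;
            this.unsure = unsure;
        }
    }

    /** Returns the first reply, in arrival order, that is not marked unsure. */
    static Optional<Reply> firstSureReply(List<Reply> repliesInArrivalOrder) {
        for (Reply r : repliesInArrivalOrder) {
            if (!r.unsure) {
                return Optional.of(r); // a sure null is still a valid answer
            }
        }
        return Optional.empty(); // every responder was unsure
    }

    public static void main(String[] args) {
        // C answers first but is mid-rehash, so X falls through to A's reply.
        List<Reply> replies = Arrays.asList(
                new Reply("C", null, true),
                new Reply("A", "v1", false),
                new Reply("B", "v1", false));
        Optional<Reply> chosen = firstSureReply(replies);
        System.out.println(chosen.map(r -> r.node + " -> " + r.value).orElse("no sure reply"));
        // prints "A -> v1"
    }
}
```

The point of the separate unsure flag is that a plain null from C cannot be told apart from "key does not exist", whereas a flagged reply can simply be skipped.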

On 3 Mar 2011, at 14:23, Sanne Grinovero wrote:

> 2011/3/3 Manik Surtani <msurtani at redhat.com>:
>> A GET may produce incorrect results if a node not involved in a rehash, Node X, asks for a key. E.g., it may ask Node C for the entry since Node C is a new owner. However, Node C may not have finished applying the state it received, so it would return null. In normal circumstances this would be considered a valid response, but if X is aware that a rehash is going on it would wait for more responses (from A and B).
> 
> I'd say that because C is aware that it's performing a rehash, and
> is aware that it doesn't yet know the correct value for the requested
> key, it should not return null but wait until it knows the proper
> answer and return that. Ideally it could hint to the ongoing state
> transfer to prioritize this specific key, as there is immediate
> demand for it.
> In any case, C is going to download this value soon, as it's now an
> owner of it, so it doesn't look to me like this approach would
> increase network traffic.
> 
> Otherwise, how could a client requesting a key know whether a
> returned null is the real value or not? Should I deal with it in the
> application?
> 
>> 
>> As for node failure before an async rpc completes, this would result in data loss.
>> 
>> Sent from my mobile phone
>> 
>> On 2 Mar 2011, at 18:38, Sanne Grinovero <sanne.grinovero at gmail.com> wrote:
>> 
>>> Hi Manik,
>>> can you explain the first cause: why is it that during a rehash you're
>>> unable to get an answer to a GET?
>>> Suppose node A, having installed the view T'', receives a GET request
>>> from B, which still has the outdated view T', and A is no longer the
>>> owner because the view changed to T'', so it has just transferred the
>>> requested value to a node C.
>>> A definitely knows how to handle the request by forwarding it to C: A
>>> was the owner before, otherwise it wouldn't have received the request,
>>> and because it isn't any more it must be aware of the new hash
>>> configuration.
>>> A stays in the middle of the communication, and then sends the
>>> requested value to B along with enough information about the new view
>>> to avoid more erroneous requests.
>>> 
>>> About your proposal, what would happen if the owner crashes before he
>>> has async-written the changes to a secondary node?
>>> 
>>> Cheers,
>>> Sanne
>>> 
>>> 2011/3/2 Manik Surtani <manik at jboss.org>:
>>>> As consistency models go, Infinispan is primarily strongly consistent (with
>>>> 2-phase commit between data owners), except during a rehash, when eventual
>>>> consistency (the inability to get a valid response to a remote GET) forces
>>>> us to wait for more responses, a quorum if you like.
>>>>  Not dissimilar to Paxos [1] in some ways.
>>>> I'm wondering whether, for the sake of performance, we should also offer a
>>>> fully eventually consistent model?  What I am thinking is that changes
>>>> *always* occur only on the primary data owner.  Single phase, no additional
>>>> round trips, etc.  The primary owner then asynchronously propagates changes
>>>> to the other data owners.  This would mean things run much faster in a
>>>> stable cluster, and durability is maintained.  However, during rehashes when
>>>> keys are moved, the notion of the primary owner may change.  So to deal with
>>>> this, we could use vector clocks [2] to version each entry.  Vector clocks
>>>> allow us to "merge" state nicely in most cases, and in the case of reads,
>>>> we'd flip back to a Paxos-style quorum during a rehash to get the most
>>>> "correct" version.
>>>> In terms of implementation, almost all of this would only affect the
>>>> DistributionInterceptor and the DistributionManager, so we could easily have
>>>> eventually consistent flavours of these two components.
>>>> Thoughts?
>>>> Cheers
>>>> Manik
>>>> [1] http://en.wikipedia.org/wiki/Paxos_algorithm
>>>> [2] http://en.wikipedia.org/wiki/Vector_clock
>>>> --
>>>> Manik Surtani
>>>> manik at jboss.org
>>>> twitter.com/maniksurtani
>>>> Lead, Infinispan
>>>> http://www.infinispan.org
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>> 
>> 
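
For reference, the vector-clock versioning mentioned in the original proposal quoted above could look roughly like the sketch below. This is only an illustration under my own assumptions: the VectorClock class and its method names are hypothetical and do not exist in Infinispan. It shows the two operations that matter: deciding whether one version supersedes another or conflicts with it, and merging two clocks by taking the per-node maximum.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // One logical counter per node name; a missing entry counts as 0.
    private final Map<String, Long> counters = new HashMap<>();

    /** Records one local update made on the given node. */
    VectorClock increment(String node) {
        counters.merge(node, 1L, Long::sum);
        return this;
    }

    long get(String node) {
        return counters.getOrDefault(node, 0L);
    }

    VectorClock copy() {
        VectorClock c = new VectorClock();
        c.counters.putAll(counters);
        return c;
    }

    /** Compares this clock with another, entry by entry. */
    Order compare(VectorClock other) {
        boolean less = false;
        boolean greater = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String n : nodes) {
            long a = get(n);
            long b = other.get(n);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT; // neither version supersedes the other
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    /** Merges two clocks by taking the element-wise maximum. */
    static VectorClock merge(VectorClock x, VectorClock y) {
        VectorClock m = x.copy();
        y.counters.forEach((n, c) -> m.counters.merge(n, c, Long::max));
        return m;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock().increment("A"); // written on A
        VectorClock onA = base.copy().increment("A");        // A updates again
        VectorClock onC = base.copy().increment("C");        // C updates after a rehash
        System.out.println(base.compare(onA));               // BEFORE
        System.out.println(onA.compare(onC));                // CONCURRENT
        System.out.println(merge(onA, onC).compare(onA));    // AFTER
    }
}
```

The CONCURRENT case is the one where the entry values themselves still need an application-level or policy-level resolution; the clock only tells you that a conflict exists.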

--
Manik Surtani
manik at jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org





