[infinispan-issues] [JBoss JIRA] (ISPN-5309) Model data consistency for get()/put() operations in Infinispan

Richard Achmatowicz (JIRA) issues at jboss.org
Wed Mar 18 11:05:19 EDT 2015


    [ https://issues.jboss.org/browse/ISPN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051416#comment-13051416 ] 

Richard Achmatowicz edited comment on ISPN-5309 at 3/18/15 11:05 AM:
---------------------------------------------------------------------

For the record:

{quote}
> Hi Dan
>
> After reviewing your document Consistency Guarantees in Infinispan, I agree the starting point would be to validate get() and put() data consistency semantics in the absence of membership changes on the simplest of caches: no data store, no transactions, no passivation or eviction. These are all refinements of the more basic model.
>
> I have a few comments/clarifications on the following issues:
> - the expected data consistency for interleavings of gets and puts
> - use of cache topology and CH in replicated mode and distributed mode
> - some more detail on the processing of get and put operations in the context of a set of infinispan cache instances in distributed mode
> - message order assumptions
>
>
> data consistency
> ---------------
> In order to describe input sequences of get/put operations and the expected output values they return after processing by Infinispan,
> I assume the following:
>
> - operations to be considered are ordered (or not) by the happens-before relation based on causality (as opposed to wall clock time)
> - initial data consistency model for get and put operations:
>    - get(): an operation that happens-after a get operation get(k) -> v sees the same value v (or the value of a later write)
>    - put(): an operation that happens-after a write operation put(k=v) sees the updated value v (or the value of a later write)
>
> NB: this rules out any required data consistency between operations initiated at different nodes, even though one may occur after the other based on wall clock time
This sounds good.

Because a single client can connect to multiple nodes, in the future we should also model consistency for operations that happen on different nodes but do not overlap (the first operation completely finishes before the other operation starts).
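
To make the expected semantics concrete, here is a minimal sketch, using a plain java.util.Map as a stand-in for the cache (illustrative only, not the Infinispan API):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal illustration of the happens-after expectation above.
// A plain map stands in for the cache; names here are illustrative, not Infinispan classes.
public class HappensAfterExample {
    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();

        // put(k=v1) completes here ...
        cache.put("k", "v1");

        // ... so any operation that happens-after it must observe "v1" or a later write.
        String seen = cache.get("k");
        assert "v1".equals(seen) : "a happens-after get(k) must not return a stale value";

        // After a later write, happens-after reads see "v2"; operations on other nodes that
        // are not ordered by happens-before carry no such guarantee in this model.
        cache.put("k", "v2");
        System.out.println(cache.get("k"));   // prints v2
    }
}
{code}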

> replicated mode vs distributed mode
> -----------------------------------
> I'm a little unclear on the way in which replicated mode and distributed mode
> make use of cache topology information (which we can assume
> has been correctly distributed to all nodes):
>
> - do both of these modes make use of the current cache topology (containing the CH) to look up the nodes corresponding to keys they need to access?
Yes

> - is the only difference in the way that the CH is set up (so that for cluster size N, there is one primary owner and N-1 backup owners)?
There is one more difference: replicated mode always uses broadcasts instead of unicasts. It should only be relevant when the JGroups view changes, though.

> - I assume only one CH is used in regular mode (as opposed to rehashing mode)
Yes, we only have 2 CHs during state transfer.
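
For reference, a toy sketch of the key-to-owners lookup both modes perform against the current CH might look like the following (class and method names are made up for illustration and do not match the real Infinispan ConsistentHash implementation; only the single stable CH of regular mode is modelled):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy consistent-hash lookup, only to illustrate the key -> owners mapping discussed above.
// Names and hashing scheme are illustrative, not the real Infinispan implementation.
public class ToyConsistentHash {
    private final List<String> members;   // members of the current cache topology
    private final int numOwners;          // replicated mode: numOwners == members.size()

    public ToyConsistentHash(List<String> members, int numOwners) {
        this.members = List.copyOf(members);
        this.numOwners = Math.min(numOwners, members.size());
    }

    /** Owners for a key: the first entry is the primary owner, the rest are backups. */
    public List<String> locateOwners(Object key) {
        int start = Math.floorMod(key.hashCode(), members.size());
        List<String> owners = new ArrayList<>(numOwners);
        for (int i = 0; i < numOwners; i++) {
            owners.add(members.get((start + i) % members.size()));
        }
        return owners;
    }

    public String locatePrimaryOwner(Object key) {
        return locateOwners(key).get(0);
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("A", "B", "C", "D");
        // Distributed mode with numOwners=2 vs. replicated mode, where every node owns every key
        System.out.println(new ToyConsistentHash(cluster, 2).locateOwners("k1"));
        System.out.println(new ToyConsistentHash(cluster, cluster.size()).locateOwners("k1"));
    }
}
{code}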

> distributed mode get()/put() design
> -----------------------------------
> Here is a summary of the way in which gets and puts are processed:
>
> (taken almost verbatim from section 4 of Consistency Guarantees in Infinispan, reorganized a little)
>
> get():
>    - when an originator performs a get(K):
>      - if originator == owner, return the local value
>      - if originator != owner, the originator sends a remote GET_REQ command to
>      all of the key's owners in the read-CH and uses the first successful non-null
>      response (GET_RSP)
Correct. We do a retry when the topology id changes, but we can ignore that for now.
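
As a reading aid, a simplified sketch of that read path (illustrative names only; the real code sends the remote requests in parallel and handles topology changes) could look like:

{code:java}
import java.util.List;
import java.util.Map;

// Simplified sketch of the distributed-mode get() path described above.
// Node and sendGetRequest are illustrative stand-ins, not Infinispan classes.
class GetPathSketch {
    record Node(String name, Map<Object, Object> localData) {}

    static Object get(Node originator, Object key, List<Node> ownersInReadCh) {
        // 1. If the originator is itself an owner of the key, return the local value.
        if (ownersInReadCh.contains(originator)) {
            return originator.localData().get(key);
        }
        // 2. Otherwise ask the owners and use the first successful non-null response.
        //    (Done sequentially here for simplicity; Infinispan sends the remote GETs in parallel.)
        for (Node owner : ownersInReadCh) {
            Object value = sendGetRequest(owner, key);   // remote GET_REQ / GET_RSP
            if (value != null) {
                return value;
            }
        }
        return null;   // no owner had a value for the key
    }

    // Stand-in for the actual RPC; here we simply read the owner's local store.
    private static Object sendGetRequest(Node owner, Object key) {
        return owner.localData().get(key);
    }
}
{code}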

> put():
>    - when an originator performs a put(K,V):
>      - if originator == primary owner
>        - primary owner acquires the key lock, and replicates (PUT_REPL_REQ)
>        the write to the backup owner(s)
>        - backup owners confirm the write to the primary owner (PUT_REPL_RSP)
>        - primary owner updates the value locally, unlocks the key lock, and
>        replies to the originator
>      - if originator != primary owner
>        - originator will send a remote PUT_REQ to the primary owner of the
>        modified key (looked up from which CH?)
>        - primary owner acquires the key lock, and replicates (PUT_REPL_REQ)
>        the write to the backup owner(s)
>        - backup owners confirm the write to the primary owner (PUT_REPL_RSP)
>        - primary owner updates the value locally, unlocks the key lock, and
>        replies to the originator (PUT_RSP)
Correct.
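
A single-JVM sketch of that write path (illustrative only: a single class-level lock stands in for the per-key lock, and direct method calls stand in for the PUT_REQ/PUT_REPL_REQ messages) might look like:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the distributed-mode put() flow described above.
// RPC, topology changes and failures are ignored; names are illustrative, not Infinispan classes.
class PutPathSketch {
    static class Node {
        final String name;
        final Map<Object, Object> data = new ConcurrentHashMap<>();
        Node(String name) { this.name = name; }
    }

    /** Executed on the originator; whether or not it is the primary owner, the primary drives the update. */
    static Object put(Node originator, Node primary, List<Node> backups, Object key, Object value) {
        // originator == primary: local invocation; otherwise this call models the PUT_REQ / PUT_RSP hop.
        return primaryPut(primary, backups, key, value);
    }

    /** Executed on the primary owner while holding the key lock (a class-level lock here for simplicity). */
    static synchronized Object primaryPut(Node primary, List<Node> backups, Object key, Object value) {
        // 1. Replicate the write to the backup owners and wait for their confirmations
        //    (PUT_REPL_REQ / PUT_REPL_RSP); a direct map update stands in for that exchange.
        for (Node backup : backups) {
            backup.data.put(key, value);
        }
        // 2. Apply the write locally, release the lock on return, and reply to the originator
        //    with the previous value (PUT_RSP).
        return primary.data.put(key, value);
    }
}
{code}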

> Are there any corrections to these descriptions that you would need to make?
> Do you have the actual message types for these messages passed back and
> forth?
Remote get uses ClusteredGetCommand for replication, and the response is a SuccessfulResponse with the entry value (in some cases with an InternalCacheEntry, but we can ignore that for now).
Remote put uses PutKeyValueCommand, both for the originator to talk to the primary and for the primary to talk to the backup owners. The responses are again SuccessfulResponses, containing the previous value.

> messaging assumptions
> ---------------------
> The processing of get()/put() makes use of unicast and RPC to communicate
> between nodes.
We automatically use broadcasts instead of unicasts when a message is being sent to all the nodes in the JGroups view. So we always use broadcasts in replicated mode, and mostly unicasts in distributed mode.

> What assumptions are there on message order for:
>    - the unicasts from originator to primary
>    - the RPCs from primary owner to backup owners
In a DIST_SYNC/REPL_SYNC cache, all messages are marked as OOB, so we don't assume any ordering.

We do assume that messages are ordered in DIST_ASYNC/REPL_ASYNC mode. Let's stick with synchronous caches, though.
{quote}


> Model data consistency for get()/put() operations in Infinispan
> ---------------------------------------------------------------
>
>                 Key: ISPN-5309
>                 URL: https://issues.jboss.org/browse/ISPN-5309
>             Project: Infinispan
>          Issue Type: Task
>            Reporter: Richard Achmatowicz
>            Assignee: Richard Achmatowicz
>
> This will be the first in a series of modelling/validation exercises for some of the more critical Infinispan protocols. 
> We shall use TLA+ / PlusCal / TLA+ Tools to do the following:
> - model the design of processing of get()/put() operations in an Infinispan cluster
> - model client interactions with that cluster
> - describe the data consistency requirements of get()/put() operations
> - verify that the data consistency semantics of Infinispan are preserved in the face of concurrent client interactions
> TLA+ / PlusCal can be thought of as a pseudo-code language which has a well-defined semantics and is testable using the tools in the TLA+ Toolkit.
> The benefits of such an exercise are that we end up with:
> - a specification of data consistency guarantees that Infinispan provides
> - a semantically precise pseudo-code description of the design of get()/put() processing
> - a verification that the protocol design is correct 
> We start here with the simple case of modelling data consistency in the absence of failures. In later exercises, we aim to tackle rebalancing and non-blocking state transfer in the face of membership changes and partitions.



--
This message was sent by Atlassian JIRA
(v6.3.11#6341)

