[
https://issues.jboss.org/browse/ISPN-5309?page=com.atlassian.jira.plugin....
]
Richard Achmatowicz commented on ISPN-5309:
-------------------------------------------
For the record:
Hi Dan
After reviewing your document Consistency Guarantees in Infinispan, I agree the starting
point would be to validate get() and put() data consistency semantics in the absence of
membership changes on the simplest of caches: no data store, no transactions, no
passivation or eviction. These are all refinements of the more basic model.
I have a few comments/clarifications on the following issues:
- the expected data consistency for interleavings of gets and puts
- use of cache topology and CH in replicated mode and distributed mode
- some more detail on the processing of get and put operations in the context
of a set of infinispan cache instances in distributed mode
- message order assumptions
data consistency
---------------
In order to describe input sequences of get/put operations and the expected output values
they return after processing by Infinispan,
I assume the following:
- operations to be considered are ordered (or not) by the happens-before
relation based on causality (as opposed to wall clock time)
- initial data consistency model for get and put operations:
- get(): an operation that happens-after a get operation get(k) -> v sees the same
value v (or the value of a later write)
- put(): an operation that happens-after a write operation put(k=v) sees the updated
value v (or the value of a later write)
NB: this rules out any required data consistency between operations initiated at
different nodes, even though one may occur after the other based on wall clock time
This sounds good.
Because a single client can connect to multiple nodes, in the future we should also model
consistency for operations that happen on different nodes but do not overlap in time (the
first operation completely finishes before the second one starts).
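The two rules above can be phrased as a small executable check over a causally ordered
history of operations on a single key. This is a hand-written sketch (the history format
and function name are assumptions, not Infinispan code):

```python
# Hypothetical checker for the rules above: in a happens-before-ordered
# history of operations on one key, every get() must return the value of
# the most recent write that precedes it.
def check_history(history):
    """history: list of ('put', v) or ('get', v) tuples in causal order."""
    last = None
    for op, v in history:
        if op == 'put':
            last = v
        elif v != last:          # a get() saw a stale or phantom value
            return False
    return True

assert check_history([('put', 1), ('get', 1), ('put', 2), ('get', 2)])
assert not check_history([('put', 1), ('put', 2), ('get', 1)])  # stale read
```

Note that operations initiated at different nodes are simply not placed in the same
history unless a happens-before edge relates them, which is what rules out wall-clock
requirements.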
replicated mode vs distributed mode
-----------------------------------
I'm a little unclear on the way in which replicated mode and distributed mode
make use of cache topology information (which we can assume
has been correctly distributed to all nodes):
- do both of these modes make use of the current cache topology (containing the
CH) to look up the nodes that own the keys they need to access?
Yes
- is the only difference in the way that the CH is set up (so that for
cluster size N, there is one primary owner and N-1 backup owners)?
There is one
more difference: replicated mode always uses broadcasts instead of unicasts. It should
only be relevant when the JGroups view changes, though.
- I assume only one CH is used in regular mode (as opposed to
rehashing mode)
Yes, we only have 2 CHs during state transfer.
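The owner-lookup question above can be illustrated with a toy consistent hash. The
segment arithmetic here is a stand-in for illustration only, not Infinispan's actual
CH factory logic; `owners`, `members`, and `num_owners` are assumed names:

```python
import hashlib

# Illustrative owner lookup from a cache topology's consistent hash (CH).
def owners(key, members, num_owners):
    """Return [primary, backup, ...] owners of a key."""
    segment = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(members)
    return [members[(segment + i) % len(members)] for i in range(num_owners)]

members = ["A", "B", "C"]
# Distributed mode: num_owners < cluster size, so each key lives on a subset.
dist = owners("k1", members, num_owners=2)
# Replicated mode: the CH is set up with num_owners == N, so every node
# owns every key (one primary, N-1 backups).
repl = owners("k1", members, num_owners=len(members))
assert len(dist) == 2 and len(repl) == 3
assert set(repl) == set(members)
```

This matches the point that both modes share the same lookup mechanism and differ mainly
in how many owners the CH assigns per key.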
distributed mode get()/put() design
-----------------------------------
Here is a summary of the way in which gets and puts are processed:
(taken almost verbatim from section 4 of Consistency Guarantees in Infinispan,
reorganized a little)
get():
- when an originator performs a get(K):
- if originator == owner, return the local value
- if originator != owner, the originator sends a remote GET_REQ command to
all of the key's owners in the read-CH and uses the first successful
non-null response (GET_RSP)
Correct. We do a retry when the topology id changes, but we
can ignore that for now.
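The get() flow above can be sketched as an in-process simulation. The helpers
(`cache_get`, `stores`, `read_ch`) are illustrative stand-ins, not Infinispan
internals; the real system sends a ClusteredGetCommand over the network:

```python
# Sketch of the get() flow: local read when the originator is an owner,
# otherwise a simulated GET_REQ to the key's owners from the read-CH.
def cache_get(originator, key, read_ch, stores):
    owners = read_ch[key]
    if originator in owners:
        return stores[originator].get(key)        # local read, no RPC
    for owner in owners:                          # GET_REQ to the owners
        value = stores[owner].get(key)            # simulated remote call
        if value is not None:
            return value                          # first non-null GET_RSP
    return None

stores = {"A": {"k": "v"}, "B": {"k": "v"}, "C": {}}
read_ch = {"k": ["A", "B"]}
assert cache_get("A", "k", read_ch, stores) == "v"   # originator == owner
assert cache_get("C", "k", read_ch, stores) == "v"   # remote get
```

The real implementation contacts the owners concurrently rather than in sequence, but
the "first successful non-null response wins" rule is the same.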
put():
- when an originator performs a put(K,V):
- if originator == primary owner
- primary owner acquires the key lock, and replicates (PUT_REPL_REQ)
the write to the backup owner(s)
- backup owners confirm the write to the primary owner (PUT_REPL_RSP)
- primary owner updates the value locally, unlocks the key lock, and
replies to the originator
- if originator != primary owner
- originator will send a remote PUT_REQ to the primary owner of the
modified key (looked up from which CH?)
- primary owner acquires the key lock, and replicates (PUT_REPL_REQ)
the write to the backup owner(s)
- backup owners confirm the write to the primary owner (PUT_REPL_RSP)
- primary owner updates the value locally, unlocks the key lock, and
replies to the originator (PUT_RSP)
Correct.
Are there any corrections to these descriptions that you would need
to make?
Do you have the actual message types for these messages passed back and
forth?
Remote get uses ClusteredGetCommand for replication, and the response is a
SuccessfulResponse with the entry value (in some cases with an InternalCacheEntry, but we
can ignore that for now).
Remote put uses PutKeyValueCommand, both for the originator to talk to the primary and for
the primary to talk to the backup owners. The responses are again SuccessfulResponses,
containing the previous value.
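The put() flow above can likewise be sketched as a simulation. The class and helper
names here are assumptions for illustration; in Infinispan the messages are
PutKeyValueCommands and the reply is a SuccessfulResponse with the previous value:

```python
import threading

# Sketch of the put() flow: the originator forwards to the primary owner,
# which locks the key, replicates to the backups, applies locally, and
# replies with the previous value.
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.locks = {}  # key -> threading.Lock

    def _lock(self, key):
        return self.locks.setdefault(key, threading.Lock())

def cache_put(originator, key, value, write_ch, nodes):
    owners = write_ch[key]
    primary, backups = owners[0], owners[1:]
    # If originator != primary, this step models the PUT_REQ to the primary
    # (owners looked up from the write-CH of the current topology).
    with nodes[primary]._lock(key):          # primary acquires the key lock
        for b in backups:                    # PUT_REPL_REQ to each backup
            nodes[b].store[key] = value      # backup applies, PUT_REPL_RSP
        prev = nodes[primary].store.get(key)
        nodes[primary].store[key] = value    # primary applies locally
    return prev                              # PUT_RSP with previous value

nodes = {n: Node(n) for n in "ABC"}
write_ch = {"k": ["A", "B"]}
assert cache_put("C", "k", "v1", write_ch, nodes) is None
assert cache_put("A", "k", "v2", write_ch, nodes) == "v1"
assert nodes["B"].store["k"] == "v2"  # backup is in sync
```

The key lock at the primary is what serializes concurrent writers to the same key, which
is the property the later consistency model needs to capture.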
messaging assumptions
---------------------
The processing of get()/put() makes use of unicast and RPC to communicate
between nodes.
We automatically use broadcasts instead of unicasts when a message
is being sent to all the nodes in the JGroups view. So we always use broadcasts in
replicated mode, and mostly unicasts in distributed mode.
What assumptions are there on message order for:
- the unicasts from originator to primary
- the RPCs from primary owner to backup owners
In a DIST_SYNC/REPL_SYNC cache,
all messages are marked as OOB, so we don't assume any ordering.
We do assume that messages are ordered in DIST_ASYNC/REPL_ASYNC mode. Let's stick with
synchronous caches, though.
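The no-ordering assumption can be made concrete with a toy model: two concurrent
PUT_REQs may reach the primary in either order, and it is the primary's key lock, not
the transport, that decides the final value. This is illustrative, not Infinispan code:

```python
from itertools import permutations

# OOB messages carry no ordering guarantee, so consider every delivery
# order of two concurrent writes at the primary.
final_values = set()
for order in permutations([("put", "v1"), ("put", "v2")]):
    store = {}
    for _, v in order:        # primary applies the writes one at a time
        store["k"] = v
    final_values.add(store["k"])

# Both outcomes are legal: last-writer-wins as serialized by the primary.
assert final_values == {"v1", "v2"}
```

This is why the model can treat message order as nondeterministic in the SYNC case
without losing any correctness guarantees.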
Model data consistency for get()/put() operations in Infinispan
---------------------------------------------------------------
Key: ISPN-5309
URL: https://issues.jboss.org/browse/ISPN-5309
Project: Infinispan
Issue Type: Task
Reporter: Richard Achmatowicz
Assignee: Richard Achmatowicz
This will be the first in a series of modelling/validation exercises for some of the more
critical Infinispan protocols.
We shall use TLA+ / PlusCal / TLA+ Tools to do the following:
- model the design of processing of get()/put() operations in an Infinispan cluster
- model client interactions with that cluster
- describe the data consistency requirements of get()/put() operations
- verify that the data consistency semantics of Infinispan are preserved in the face of
concurrent client interactions
TLA+ / PlusCal can be thought of as a pseudo-code language which has well-defined
semantics and is testable using the tools in the TLA+ Toolkit.
The benefits of such an exercise are that we end up with:
- a specification of data consistency guarantees that Infinispan provides
- a semantically precise pseudo-code description of the design of get()/put() processing
- a verification that the protocol design is correct
We start here with the simple case of modelling data consistency in the absence of
failures. In later exercises, we aim to tackle rebalancing and non-blocking state transfer
in the face of membership changes and partitions.
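The verification goal can be previewed with a toy explorer: enumerate every interleaving
of two clients' steps (as TLC would for a PlusCal spec), run each against a shared store,
and check a simplified invariant. This is a hand-rolled sketch, not an actual TLA+ model;
all names are assumptions:

```python
# Enumerate interleavings and check that a get() which happens-after the
# same client's put() never observes a missing value.
def interleavings(seqs):
    """Yield every interleaving preserving each client's program order."""
    if all(not s for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = [q[1:] if j == i else q for j, q in enumerate(seqs)]
            for tail in interleavings(rest):
                yield [(i, s[0])] + tail

def violates(history):
    store, wrote = {}, set()
    for client, (op, val) in history:
        if op == "put":
            store["k"] = val
            wrote.add(client)
        elif client in wrote and store.get("k") is None:
            return True                  # the client's own write was lost
    return False

clients = [[("put", 1), ("get", None)], [("put", 2)]]
runs = list(interleavings(clients))
assert len(runs) == 3                    # interleavings of 2 + 1 steps
assert not any(violates(run) for run in runs)
```

A real TLA+ spec replaces the hand-rolled enumeration with TLC's state-space search and
states the invariant declaratively, which is what makes the exercise scale to membership
changes and partitions later.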
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)