Hi Mircea,
sorry for the long email, which I'm also forwarding to the Cloud-TM
mailing list.
I believe it is first necessary to come to an agreement on the semantics
of "write-skew". :)
In his mail, Sergio refers to the "write skew" term as it is used in the
literature, to denote an anomaly that can arise under snapshot
isolation (SI), and which became "famous" because it opened a dispute on
the definition of the SQL SERIALIZABLE isolation level: SI is
SERIALIZABLE according to SQL's criteria, but, due to the write skew
anomaly, SI is not serializable in the sense of equivalence to a
sequential execution (as a more rigorous formal definition would demand,
and which is ultimately what users may expect when using SQL's
SERIALIZABLE isolation level).
In addition to the papers pointed out by Sergio, a briefer and more
focused discussion on SI and write skew can also be found here:
http://en.wikipedia.org/wiki/Snapshot_isolation
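For readers who do not want to go through those references, the classic
anomaly can be summarized with a textbook example (not
Infinispan-specific): assume an invariant X + Y >= 1, with X=1, Y=1
initially, and the schedule:
T1 r(X=1), T1 r(Y=1), T2 r(X=1), T2 r(Y=1), T1 w(X=0), T2 w(Y=0), T1 commit, T2 commit
Since the write-sets of T1 and T2 are disjoint, SI's
first-committer-wins rule lets both transactions commit; the final state
X=0, Y=0 violates the invariant, even though no sequential execution of
T1 and T2 could ever produce it.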
In the context of Infinispan, I believe it should first be clarified
that it does not implement SI, but rather a weaker (and consequently
more efficiently implementable) consistency level. The key difference
with respect to SI is that the data container stores a single version of
each data item, and that it does not guarantee that the reads of a
transaction come from the same "snapshot". For instance, assume X=0, Y=0
initially, and the schedule:
T1 r(X=0), T2 w(X=1), T2 w(Y=1), T2 commit, T1 r(Y=1), T1 commit
in which T1 reads X=0 and Y=1, a combination that is impossible in any
sequential execution of T1 and T2. This is an anomaly (equivalent to
ANSI's phantom reads, if we gloss over the fact that the latter is
defined in terms of SQL queries) that an SI implementation would avoid.
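To make this concrete, here is a minimal sketch of how the above
schedule could play out against an embedded transactional cache
(assuming a cache configured with a JTA transaction manager; the class
name and configuration file are illustrative):

  import javax.transaction.Transaction;
  import javax.transaction.TransactionManager;
  import org.infinispan.Cache;
  import org.infinispan.manager.DefaultCacheManager;

  public class NonSnapshotReadExample {
     public static void main(String[] args) throws Exception {
        // "infinispan.xml" is assumed to define a transactional cache.
        Cache<String, Integer> cache =
           new DefaultCacheManager("infinispan.xml").getCache();
        TransactionManager tm =
           cache.getAdvancedCache().getTransactionManager();

        cache.put("X", 0);
        cache.put("Y", 0);

        tm.begin();                     // T1 begins
        int x = cache.get("X");         // T1 r(X=0)

        Transaction t1 = tm.suspend();  // interleave: run T2 to completion
        tm.begin();                     // T2 begins
        cache.put("X", 1);              // T2 w(X=1)
        cache.put("Y", 1);              // T2 w(Y=1)
        tm.commit();                    // T2 commits
        tm.resume(t1);

        int y = cache.get("Y");         // T1 r(Y): sees the committed Y=1
        tm.commit();                    // T1 commits

        // With a single-version data container, T1 observes X=0 and Y=1,
        // a combination no sequential execution of T1 and T2 can produce.
        // Under SI, T1 would instead read Y=0 from its own snapshot.
        System.out.printf("T1 saw X=%d, Y=%d%n", x, y);
     }
  }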
The point is that, since Infinispan does not ensure SI, it is confusing,
IMO, to speak about write-skew (at least with reference to the classic
SI-related definition of write-skew). In fact, I have always thought of
Infinispan's "write-skew" check as an extra consistency predicate that
is implementable in a simple and efficient way, and whose name happens
to clash with the "conventional" SI-related write-skew anomaly (despite
having totally different semantics).
Now, addressing the clash in terminology with "classic" write-skew would
probably be a good idea, as this may confuse some users: indeed, when
speaking with colleagues in academia this issue has already come up more
than once, but I preferred not to bother you with it, so as not to sound
too pedantic. :-)
However, aiming to address the "classic" SI-related write-skew anomaly
without first having SI would lead to somewhat confusing consistency
semantics. IMO, a more sensible approach would be to first implement SI
in a scalable and efficient fashion, and then to add on top of it the
"cherry" of the "classic" write-skew check, in order to ultimately offer
serializability.
This brings us to the problem that implementing SI/MVCC in a distributed
setting in a scalable/efficient fashion is far from trivial. Existing
solutions (or straightforward extensions of non-distributed SI/MVCC
algorithms), in fact, establish "timestamps" on newly committed data
item versions in non-scalable ways, i.e. either by:
1- relying on a centralized component, such as a single "sequencing"
node (which may in turn be replicated for high availability; see the
naive sketch after this list), or by
2- involving all the nodes in the system in the commit phase (this is
what is sometimes called a "non genuine" partial replication solution).
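To illustrate why approach 1 does not scale, here is a deliberately
naive, hypothetical sketch (not code from any existing system): every
commit in the cluster has to be serialized through one logical clock
hosted on a single node.

  import java.util.concurrent.atomic.AtomicLong;

  // Hypothetical centralized sequencer: a single node hands out
  // monotonically increasing commit timestamps to the whole cluster.
  // Every committing transaction must round-trip to this node (RPC
  // plumbing omitted), which makes it a throughput bottleneck and a
  // single point of contention.
  final class CentralSequencer {
     private final AtomicLong clock = new AtomicLong();

     long nextCommitTimestamp() {
        return clock.incrementAndGet();
     }
  }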
As you know, in Cloud-TM we have recently proposed what is, to the best
of our knowledge, the first MVCC algorithm that overcomes the above
issues by relying on a novel, vector-clock-based (genuine) distributed
timestamping mechanism: this is the GMU algorithm, which we discussed at
the London & Lisbon meetings, and which was presented a couple of days
ago at ICDCS (also attached).
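For intuition only (this is a toy sketch, not GMU's actual code), the
key building block is a vector clock in which each node advances only
its own entry; a transaction's commit timestamp can then be agreed upon
only among the nodes that replicate the data it touched, with no global
sequencer involved.

  import java.util.Arrays;

  // Toy vector clock: one entry per node. Node i advances only
  // entries[i]; merging takes the component-wise maximum, e.g. when the
  // (few) nodes involved in a commit agree on the tx's final timestamp.
  final class VectorClock {
     final long[] entries;

     VectorClock(int numNodes) {
        this.entries = new long[numNodes];
     }

     void tick(int nodeId) {
        entries[nodeId]++;
     }

     VectorClock merge(VectorClock other) {
        VectorClock result = new VectorClock(entries.length);
        for (int i = 0; i < entries.length; i++)
           result.entries[i] = Math.max(entries[i], other.entries[i]);
        return result;
     }

     @Override
     public String toString() {
        return Arrays.toString(entries);
     }
  }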
In our paper we describe GMU as a protocol aimed at ensuring
serializability (more formally, extended update serializability, which
is technically incomparable with serializability; I can go into detail
on this in a separate mail), but it would be trivial to adapt it to
ensure SI: basically, simply avoid locking & validating the
transaction's read-set. Then, it would be straightforward to add a
configuration option, called "write-skew check", which would actually
filter out the "classic" write-skew anomaly and ensure serializability
(by enabling the locking & validation of the transaction's read-set).
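For reference, this is roughly how today's check is enabled via the 5.x
programmatic configuration (to the best of my recollection of the API; a
GMU-based consistency level could plausibly be exposed through a similar
toggle):

  import org.infinispan.configuration.cache.Configuration;
  import org.infinispan.configuration.cache.ConfigurationBuilder;
  import org.infinispan.configuration.cache.VersioningScheme;
  import org.infinispan.transaction.LockingMode;
  import org.infinispan.util.concurrent.IsolationLevel;

  ConfigurationBuilder builder = new ConfigurationBuilder();
  builder.transaction().lockingMode(LockingMode.OPTIMISTIC);   // optimistic txs
  builder.locking().isolationLevel(IsolationLevel.REPEATABLE_READ)
         .writeSkewCheck(true);                  // current "write-skew" check
  builder.versioning().enable()
         .scheme(VersioningScheme.SIMPLE);       // versioned entries, needed by the check
  Configuration cfg = builder.build();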
Summing up, my proposal is:
- keep offering the current write-skew check semantics on top of the
current "repeatable-read" data container (but I would rename this check
to avoid clashing with SI's write-skew anomaly). IMO, in fact, the
current consistency semantics represent a good trade-off between
performance and consistency: they guarantee repeatable read in a
lightweight fashion and, as an extra, allow filtering out additional
anomalies that are easy to catch efficiently (e.g. without requiring the
exchange of transactions' read-sets over the network).
- for stronger consistency criteria, I propose to use a scheme like GMU
in order to offer Snapshot Isolation by default and, as an optional
configuration setting, the possibility to filter out "classic"
write-skew anomalies (thus achieving serializability). Note that GMU has
already been implemented on top of 5.0 and is currently being re-based
on 5.2.0, so this would be a good moment to discuss the possibility of
merging it into master.
This strategy would minimize changes to the existing code base, as we
would simply maintain two parallel implementations of the various
components that need to be altered in order to guarantee stronger
consistency semantics (at least the lock manager, the data container,
and the distribution interceptor). Also, it would allow users who are
happy with the consistency semantics currently offered by Infinispan to
stick with them without incurring any additional cost.
Cheers,
Paolo
On 6/22/12 3:47 PM, Mircea Markus wrote:
Hi guys,
I've come across a JIRA [1] about optimistic locking and the write skew check, and
I am wondering what the expected behavior is in the following scenario:
tx1.begin()
read(a);
// tx2 modifies 'a' and commits changes
write(b);
tx1.commit() - shall I get a write skew (version changed) exception here, as 'a' was
modified between read and commit?
In other words, shall I also perform the write skew check for entries that were read
during a tx? We currently don't, but I can totally see situations in which this would
be needed: e.g. if the decision to write to 'b' is made based on the value of 'a' and
there's an invariant involving both 'a' and 'b'.
[1] https://issues.jboss.org/browse/ISPN-1841
Cheers
Mircea
--
Mircea Markus
twitter.com/mirceamarkus
Sr. Software Engineer, Infinispan
http://www.infinispan.org
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev