Interesting stuff Manik, thanks for the updates. Actually, on our side
we've also been working on adding versioning to ISPN during the summer.
However, in our case we are aiming to achieve serializability while avoiding
global synchronization points (so we actually keep a chain of versions
for each key, not just the most recent one).
Maria and Pedro will give an overview of the solution we're working on
at the Lisbon meeting next week. If you already have some version of the
code, maybe we could start looking at it now, so that we avoid taking
decisions that would make it excessively painful to merge the code in
the future.
As for the two options you mention, I'd definitely go for 2).
Concerning costs: for option 2), the prepare message has to piggyback
the version identifiers of *each* data item that needs to be write-skew
checked, which may lead to big messages if a lot of data items need
checking. But the write-skew check is done only on the data items that
are both read and written within the same transaction. So I'd expect
that normally just a few keys would need to be write-skew checked (at
least this is the case for the vast majority of DBMS/STM benchmarks I've
used so far). Therefore I would not be too concerned about this issue.
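To make the size argument concrete, here is a minimal sketch of how the piggybacked set could be computed. It is only an illustration under assumptions: WriteSkewPayload and versionsToShip are hypothetical names, not existing ISPN code, and versions are modelled as plain longs keyed by String.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Infinispan. */
final class WriteSkewPayload {
    /**
     * Picks the versions to piggyback on the prepare message: only the keys
     * that were both read and written in the transaction.
     * readVersions maps each key read to the version observed at read time.
     */
    static Map<String, Long> versionsToShip(Map<String, Long> readVersions,
                                            Set<String> writtenKeys) {
        Map<String, Long> toShip = new HashMap<>();
        for (Map.Entry<String, Long> read : readVersions.entrySet()) {
            // Keys that were only read, or only written, need no check.
            if (writtenKeys.contains(read.getKey())) {
                toShip.put(read.getKey(), read.getValue());
            }
        }
        return toShip;
    }
}
```

With a read set of {a, b, c} and a write set of {b, d}, only the version of b would travel with the prepare.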
Cheers
Paolo
On 9/14/11 3:03 PM, Manik Surtani wrote:
So I've been hacking on versioned entries for a bit now, and want
to run the designs by everyone. Adding an EntryVersion to each entry is easy, and so is
making it optional and null by default. There would be a SimpleVersion, a wrapper around
a long, and a PartitionTolerantVersion, a vector clock implementation. Changing the entry
hierarchy and the marshalling to ensure versions, if available, are shipped is also easy
stuff.
Comparing versions would happen in Mircea's optimistic locking code, on prepare, when
a write skew check is done. If running in a non-clustered environment, the simple
object-identity check we currently have is enough; otherwise an EntryVersion.compare()
will need to happen, with one of 4 possible results: equal, newer than, older than, or
concurrently modified. The last one can only happen if you have a
PartitionTolerantVersion, and will indicate a split brain and simultaneous update.
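A rough sketch of what the four-way comparison could look like for a vector clock follows. The names VersionOrdering and VectorClockVersion are made up for illustration and are not the actual EntryVersion API; a counter missing from one clock is assumed to be 0.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** The four outcomes described above. Hypothetical enum, not an Infinispan type. */
enum VersionOrdering { EQUAL, NEWER_THAN, OLDER_THAN, CONCURRENT }

/** Hypothetical vector-clock version: one counter per node name. */
final class VectorClockVersion {
    private final Map<String, Long> counters;

    VectorClockVersion(Map<String, Long> counters) {
        this.counters = new HashMap<>(counters);
    }

    /** Compares this version with other, treating a missing counter as 0. */
    VersionOrdering compare(VectorClockVersion other) {
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        boolean ahead = false;
        boolean behind = false;
        for (String node : nodes) {
            long mine = counters.getOrDefault(node, 0L);
            long theirs = other.counters.getOrDefault(node, 0L);
            if (mine > theirs) ahead = true;
            if (mine < theirs) behind = true;
        }
        // Ahead on some nodes and behind on others: neither version dominates.
        if (ahead && behind) return VersionOrdering.CONCURRENT;
        if (ahead) return VersionOrdering.NEWER_THAN;
        if (behind) return VersionOrdering.OLDER_THAN;
        return VersionOrdering.EQUAL;
    }
}
```

A version backed by a single long can only ever produce the first three outcomes, which is why the concurrent case needs the vector clock.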
Now the hard part. Who increments the version? We have a few options, and all of them
seem expensive.
1) The modifying node. If the modifying node is a data owner, then easy. Otherwise the
modifying node *has* to do a remote GET first (or at least a GET_VERSION) before doing a
PUT. Extra RPC per entry. Sucks.
2) The data owner. This would have to happen on the primary data owner only, and the
primary data owner would need to perform the write skew check. NOT the modifying node.
The modifying node would also need to increment and ship its own NodeClock along with the
modification. Extra info to ship per commit.
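To illustrate option 2, here is a minimal sketch of the check-and-increment step on the primary owner. It is an assumption-laden toy: PrimaryOwnerSketch and checkAndIncrement are hypothetical names, versions are plain longs (the SimpleVersion case), an absent key counts as version 0, and the NodeClock part is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the primary data owner of some keys. */
final class PrimaryOwnerSketch {
    private final Map<String, Long> currentVersions = new HashMap<>();

    /**
     * Performs the write skew check on prepare and, if it passes, increments
     * the version on the owner. versionSeenByWriter is the version the
     * modifying node read and shipped along with the modification.
     */
    synchronized long checkAndIncrement(String key, long versionSeenByWriter) {
        long current = currentVersions.getOrDefault(key, 0L);
        if (current != versionSeenByWriter) {
            // Someone else committed in between: the writer's read is stale.
            throw new IllegalStateException("write skew detected on key " + key);
        }
        long next = current + 1;
        currentVersions.put(key, next);
        return next;
    }
}
```

The modifying node never needs a remote GET here; it only ships the version it already saw.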
I'm guessing we go with #2, but would like to hear your thoughts.
Cheers
Manik
--
Manik Surtani
manik(a)jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org