[infinispan-dev] Versioned entries - overview of design, looking for comments

Paolo Romano romano at inesc-id.pt
Thu Sep 15 09:44:17 EDT 2011


Interesting stuff Manik, thanks for the updates. Actually, on our side 
we've also been working on adding versioning to ISPN during the summer.  
However, in our case we are aiming at achieving serializability while 
avoiding global synchronization points (so we're actually keeping a 
chain of versions for each key, not just the most recent one).

Maria and Pedro will give an overview of the solution we're working on 
at the Lisbon meeting next week. If you already have a version of the 
code, maybe we could start looking at it now, to avoid taking decisions 
that would make it excessively painful to merge the code in the future.

About the two options you mention: I'd definitely go for 2).

Concerning costs: for option 2), the prepare message has to piggyback 
the version identifiers of *each* data item that needs to be write-skew 
checked, which may lead to big messages if a lot of data items need to 
be tested. But the write-skew check is done only on the data items that 
are both read and written within the same transaction. So I'd expect 
that normally just a few keys would need to be write-skew checked (at 
least this is the case for the vast majority of the DBMS/STM benchmarks 
I've used so far). Therefore I would not be too concerned about this 
issue.
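
To illustrate why the piggybacked set stays small, here is a minimal
sketch of the idea, not actual Infinispan code. The class and method
names (WriteSkewSketch, versionsToCheck, passesWriteSkewCheck) are
hypothetical, versions are modelled as plain longs, and a key missing on
the owner is simply treated as a conflict. Only keys that appear in both
the read set and the write set are shipped with the prepare.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; names and types are assumptions, not the Infinispan API. */
final class WriteSkewSketch {

    /**
     * Versions to piggyback on the prepare message: only keys that the
     * transaction both read and wrote. Read-only and write-only keys are skipped.
     */
    static Map<String, Long> versionsToCheck(Map<String, Long> readVersions,
                                             Set<String> writtenKeys) {
        Map<String, Long> toCheck = new HashMap<>();
        for (Map.Entry<String, Long> read : readVersions.entrySet()) {
            if (writtenKeys.contains(read.getKey())) {
                toCheck.put(read.getKey(), read.getValue());
            }
        }
        return toCheck;
    }

    /**
     * Owner-side check: passes only if every shipped version still matches the
     * version currently stored. A missing key counts as a conflict (assumption).
     */
    static boolean passesWriteSkewCheck(Map<String, Long> shipped,
                                        Map<String, Long> current) {
        for (Map.Entry<String, Long> entry : shipped.entrySet()) {
            Long stored = current.get(entry.getKey());
            if (stored == null || !stored.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

A transaction that reads k1, k2 and k3 but writes only k2 and k4 would
ship a single version (the one it read for k2), regardless of how many
keys it touched overall.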

Cheers

     Paolo

On 9/14/11 3:03 PM, Manik Surtani wrote:
> So I've been hacking on versioned entries for a bit now, and want to run the designs by everyone. Adding an EntryVersion to each entry is easy, and so is making it optional and null by default, with SimpleVersion being a wrapper around a long and PartitionTolerantVersion being a vector clock implementation.  Also easy stuff: changing the entry hierarchy and the marshalling to ensure versions, if available, are shipped, etc.
>
> Comparing versions would happen in Mircea's optimistic locking code, on prepare, when a write skew check is done.  If running in a non-clustered environment, the simple object-identity check we currently have is enough; otherwise an EntryVersion.compare() will need to happen, with one of 4 possible results: equal, newer than, older than, or concurrently modified.  The last one can only happen if you have a PartitionTolerantVersion, and will indicate a split brain and simultaneous update.
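
The four comparison outcomes described in the quoted paragraph above can
be sketched roughly as follows. This is only an illustration under
stated assumptions: VersionOrdering, SimpleVersionSketch and
VectorClockSketch are hypothetical names standing in for the result of
EntryVersion.compare(), SimpleVersion and PartitionTolerantVersion, and
the vector clock is modelled as a map from node name to counter, with a
missing node counted as 0.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical result type for the four outcomes named in the text. */
enum VersionOrdering { EQUAL, NEWER_THAN, OLDER_THAN, CONCURRENT }

/** Hypothetical stand-in for SimpleVersion: a wrapper around a long. */
final class SimpleVersionSketch {
    final long value;

    SimpleVersionSketch(long value) { this.value = value; }

    /** A single counter is totally ordered, so CONCURRENT can never be returned. */
    VersionOrdering compareTo(SimpleVersionSketch other) {
        if (value == other.value) return VersionOrdering.EQUAL;
        return value > other.value ? VersionOrdering.NEWER_THAN : VersionOrdering.OLDER_THAN;
    }
}

/** Hypothetical stand-in for PartitionTolerantVersion: one counter per node. */
final class VectorClockSketch {
    final Map<String, Long> counters;

    VectorClockSketch(Map<String, Long> counters) {
        this.counters = new HashMap<>(counters); // defensive copy
    }

    VersionOrdering compareTo(VectorClockSketch other) {
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        boolean ahead = false;   // this clock is larger for at least one node
        boolean behind = false;  // this clock is smaller for at least one node
        for (String node : nodes) {
            long mine = counters.getOrDefault(node, 0L);
            long theirs = other.counters.getOrDefault(node, 0L);
            if (mine > theirs) ahead = true;
            if (mine < theirs) behind = true;
        }
        // Ahead on some nodes and behind on others: neither update saw the other.
        if (ahead && behind) return VersionOrdering.CONCURRENT;
        if (ahead) return VersionOrdering.NEWER_THAN;
        if (behind) return VersionOrdering.OLDER_THAN;
        return VersionOrdering.EQUAL;
    }
}
```

With clocks {A=2, B=1} and {A=1, B=2}, each side is ahead on one node,
so the comparison yields CONCURRENT, which is the split-brain case.
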
>
> Now the hard part.  Who increments the version?  We have a few options, all seem expensive.
>
> 1) The modifying node.  If the modifying node is a data owner, then easy.  Otherwise the modifying node *has* to do a remote GET first (or at least a GET_VERSION) before doing a PUT.  Extra RPC per entry.  Sucks.
>
> 2) The data owner.  This would have to happen on the primary data owner only, and the primary data owner would need to perform the write skew check.  NOT the modifying node.  The modifying node would also need to increment and ship its own NodeClock along with the modification. Extra info to ship per commit.
>
> I'm guessing we go with #2, but would like to hear your thoughts.
>
> Cheers
> Manik
>
> --
> Manik Surtani
> manik at jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev


