[jboss-dev-forums] [Design of JBossCache] - Re: Custom data versions

bstansberry@jboss.com do-not-reply at jboss.com
Wed Jul 16 17:14:21 EDT 2008


Jason and I had a good IM discussion of versioning issues; posting it here for the record:

anonymous wrote : 
  | (03:18:30 PM) Jason Greene: regarding this versioning discussion
  | (03:18:38 PM) Jason Greene: i have this feeling we are overlooking something
  | (03:18:53 PM) Jason Greene: i recall a very long discussion with gavin and steve about this last year
  | (03:19:34 PM) besYIM: LOL. that's why I said go slow
  | (03:19:53 PM) Jason Greene: which brings me to my question
  | (03:20:30 PM) Jason Greene: does the hibernate integration require all writers to the db to use hibernate
  | (03:20:49 PM) Jason Greene: in other words is it "option A" as it used to be called in CMP land
  | (03:21:15 PM) besYIM: i was going to answer and then you threw in option A and confused me :)
  | (03:21:17 PM) Jason Greene: because if it doesnt we need versions
  | (03:22:20 PM) Jason Greene: well by option A i mean "hibernate has exclusive write access to db"
  | (03:22:48 PM) besYIM: if data is cached and someone updates the db outside of hibernate, the cache will be incorrect until eviction flushes that data out
  | (03:23:20 PM) Jason Greene: ok because the scenario i came up with
  | (03:23:31 PM) besYIM: yeah, the PFER semantic of aborting if node exists does not include any ability to analyze the node
  | (03:23:47 PM) besYIM: that's true even w/ OPTIMISTIC
  | (03:23:48 PM) Jason Greene: is that 2 readers
  | (03:23:52 PM) Jason Greene: see different values
  | (03:24:08 PM) Jason Greene: since db writers are happening in a different app
  | (03:24:29 PM) Jason Greene: then its a race to which value wins
  | (03:24:39 PM) Jason Greene: and it wont necessarily be the most current
  | (03:24:53 PM) Jason Greene: if however there was a version 
  | (03:25:04 PM) Jason Greene: this would not be a problem
  | (03:26:24 PM) Jason Greene: also async replication
  | (03:26:31 PM) Jason Greene: would make versions important
  | (03:26:54 PM) Jason Greene: since two conflicting updates would arrive out of order
  | (03:27:01 PM) Jason Greene: s/would/may
  | (03:27:16 PM) besYIM: ok, let's define some terms so we're on the same page:
  | (03:27:28 PM) besYIM: insert/update --> db write, cache write
  | (03:27:36 PM) besYIM: put --> db read, cache write
  | (03:28:15 PM) besYIM: an insert/update should not go async; if it does it's a misuse, or at least means you're not concerned about consistency
  | (03:28:24 PM) Jason Greene: ok yeah
  | (03:29:14 PM) besYIM: a put can go async (if replicated).  but there its a PFER and aborts if the node already exists
  | (03:29:49 PM) besYIM: and the node would only exist w/ out-of-date data if the db was updated externally
  | (03:30:28 PM) Jason Greene: right, what about transaction commit order
  | (03:30:36 PM) Jason Greene: db commits first
  | (03:30:42 PM) Jason Greene: or cache commits first
  | (03:30:45 PM) Jason Greene: which order is it?
  | (03:30:49 PM) besYIM: db first
  | (03:31:06 PM) besYIM: wait, let me try again
  | (03:31:33 PM) besYIM: hibernate flushes to db
  | (03:31:56 PM) besYIM: beforeCompletion (prepare) phase on cache
  | (03:32:00 PM) besYIM: db commits
  | (03:32:17 PM) besYIM: afterCompletion() phase in hibernate
  | (03:32:24 PM) besYIM: afterCompletion() phase in JBC
  | (03:33:43 PM) Jason Greene: ok so back to the put
  | (03:34:41 PM) Jason Greene: put is async, so technically could happen after an insert, which is why manik was suggesting a write lock used in PFER right?
  | (03:36:01 PM) besYIM: yes, it could happen after an insert. but a PFER should abort if the node is already present, it shouldn't try to lock the node
  | (03:36:14 PM) besYIM: how MVCC handles that internally, I don't know
  | (03:38:20 PM) Jason Greene: right so its a race because server 1 does a pfer, node does not exist, and an async replication update occurs
  | (03:38:52 PM) Jason Greene: server 2 does an insert which does a sync write, this could come before the async update
  | (03:39:22 PM) besYIM: there is no such thing as an async replication update. there is an async put replication
  | (03:39:39 PM) Jason Greene: right sorry async put replication is queued
  | (03:39:49 PM) besYIM: (sorry, being anal so i understand)
  | (03:39:54 PM) Jason Greene: sync update happens first
  | (03:40:18 PM) besYIM: yeah and when the async PFER comes in, it aborts because the node already exists
  | (03:40:48 PM) Jason Greene: ah because the node it is applying to checks for existence
  | (03:41:01 PM) Jason Greene: not just the origin
  | (03:41:42 PM) besYIM: yeah.
  | (03:42:04 PM) besYIM: that's what allows async to work
  | (03:43:19 PM) Jason Greene: ok so as long as hibernate has exclusive access I cant think of a problem
  | (03:43:58 PM) besYIM: versioning could help the external-to-hibernate write scenario, but IMO only worth it if we don't introduce locking problems into PFER (during the version check)
  | (03:44:13 PM) besYIM: which, TBH would be a nice improvement
  | (03:44:22 PM) Jason Greene: well the versioning check could be a substitution for pfer
  | (03:44:30 PM) Jason Greene: essentially you are adding a precondition
  | (03:44:35 PM) Jason Greene: to every update
  | (03:44:43 PM) Jason Greene: like update blah where v = 1
  | (03:44:58 PM) Jason Greene: order is irrelevant
  | (03:45:02 PM) Jason Greene: when you do that
  | (03:45:23 PM) besYIM: but that precondition requires a read of the node, and hence a lock
  | (03:46:01 PM) besYIM: the existing PFER says "i'm willing to give up some puts that may be ok to avoid any possibility of locking issues"
  | (03:46:33 PM) Jason Greene: no locks required, because you can safely read any old version and the update would be ignored
  | (03:46:41 PM) Jason Greene: it has to be an atomic operation though
  | (03:47:46 PM) Jason Greene: a weird pseudocode expression to explain what i mean
  | (03:47:46 PM) Jason Greene: :
  | (03:48:15 PM) Jason Greene: put("/blah/1", "key", "value", "version == 3");
  | (03:48:32 PM) Jason Greene: when thats applied a WL is obtained
  | (03:48:36 PM) Jason Greene: the version is checked
  | (03:48:46 PM) Jason Greene: and only if it is valid it is applied
  | (03:50:44 PM) besYIM: if the lock acquisition timeout is 0 ms that's probably OK
  | (03:50:55 PM) Jason Greene: well this would all be async 
  | (03:51:03 PM) Jason Greene: for puts and for insert/update
  | (03:51:16 PM) besYIM: not on the origin node.
  | (03:52:26 PM) besYIM: ok there's another benefit to versions -- a slightly better semantic for async insert/update
  | (03:52:28 PM) Jason Greene: ah right, yes, a lock is acquired but it's essentially atomic since everything would be async
  | (03:53:06 PM) Jason Greene: oh and
  | (03:53:09 PM) Jason Greene: its only acquired
  | (03:53:19 PM) Jason Greene: if it doesnt exist locally and the version is different
  | (03:53:26 PM) Jason Greene: so for example
  | (03:54:04 PM) Jason Greene: int cacheVersion = getVersion("/a/b");
  | (03:54:16 PM) Jason Greene: if (cacheVersion < dbVersion) put
  | (03:54:54 PM) Jason Greene: of course the goal is not to look at the db
  | (03:55:16 PM) Jason Greene: so you wouldnt do this all the time
  | (03:55:26 PM) Jason Greene: external apps would still be a problem
  | (03:56:18 PM) besYIM: yep. this versioning discussion only really helps if the origin node doesn't have it cached AND REPL_xxx is used
  | (03:56:38 PM) besYIM: if the origin node has it cached, hibernate doesn't know about the db change
  | (03:56:56 PM) Jason Greene: right so a versioning solution really just improves updates as you say
  | (03:57:12 PM) besYIM: if INVALIDATION is used the PFER doesn't generate a cluster-wide msg
  | (03:57:20 PM) Jason Greene: it doesnt solve the non-external problem
  | (03:57:24 PM) Jason Greene: erm
  | (03:57:28 PM) Jason Greene: non-exclusive hibernate
  | (03:57:58 PM) Jason Greene: youd have to have some kind of eviction policy
  | (03:58:06 PM) Jason Greene: which is what we do today right?
  | (03:59:02 PM) besYIM: yes, that's the way to handle it.  The Hibernate API also lets you flush the cache, i.e. if you had code that somehow was aware of the DB update it could tell hibernate to flush the cache
  | (04:00:39 PM) Jason Greene: it seems that you still really want sync writes
  | (04:00:50 PM) Jason Greene: because otherwise someone might see an old value
  | (04:00:54 PM) besYIM: yep
  | (04:00:55 PM) Jason Greene: although briefly
  | (04:01:20 PM) Jason Greene: alright im convinced we dont need versions
  | (04:01:23 PM) Jason Greene: :)
  | (04:01:44 PM) besYIM: the "slightly better semantic for async insert/update" can be achieved with OPTIMISTIC, which has versions
  | (04:02:04 PM) besYIM: although i guess that's going away
  | (04:02:15 PM) Jason Greene: yeah but it still runs the risk
  | (04:02:18 PM) Jason Greene: of a stale read
  | (04:02:48 PM) besYIM: yes, just avoids the problem of an older update overwriting a newer
  | (04:03:29 PM) Jason Greene: MVCC solves that with locks
  | (04:03:31 PM) Jason Greene: however
  | (04:03:43 PM) Jason Greene: there is a deadlock potential
  | (04:03:48 PM) Jason Greene: like i mentioned in the thread
  | (04:04:10 PM) Jason Greene: it would require simultaneous update/inserts
  | (04:04:22 PM) Jason Greene: which should be unlikely
  | (04:04:30 PM) Jason Greene: since the db holds a write lock 
  | (04:04:36 PM) Jason Greene: as well
  | (04:05:12 PM) besYIM: yes. it would only be a problem w/ async since the db lock can be released. And async == no deadlock in JBC
  | (04:06:44 PM) besYIM: you mind if i cut and paste the relevant part of this into the forum thread?  this was a good discussion
  | (04:06:51 PM) Jason Greene: sure go for it
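
For anyone skimming the chat, here is a rough sketch of the PFER semantic we kept coming back to: the put aborts if the node already exists, and that check happens on the node where the replicated call is applied, not just on the origin. This is plain Java over a ConcurrentHashMap, not the JBC API; the class PferSketch and its methods are made up for illustration only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a single cache instance; NOT the JBC API.
class PferSketch {
    private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

    /** insert/update path (db write, then cache write): always overwrites. */
    void update(String fqn, String value) {
        nodes.put(fqn, value);
    }

    /**
     * put path (db read, then cache write): stores the value only if the
     * node is absent. It never inspects or waits on an existing node.
     * Returns true if the value was stored, false if the put was aborted.
     */
    boolean putForExternalRead(String fqn, String value) {
        return nodes.putIfAbsent(fqn, value) == null;
    }

    String get(String fqn) {
        return nodes.get(fqn);
    }

    public static void main(String[] args) {
        PferSketch server2 = new PferSketch();
        // Server 2 does an insert; the sync write lands first.
        server2.update("/blah/1", "new");
        // The queued async put replication from server 1 arrives late,
        // carrying data it read from the db before the insert.
        boolean applied = server2.putForExternalRead("/blah/1", "old");
        System.out.println(applied + " " + server2.get("/blah/1")); // prints "false new"
    }
}
```

The late put is simply dropped, which is why async put replication is safe as long as Hibernate has exclusive write access to the db.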

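Jason's "precondition on every update" idea can be sketched the same way. The version below is an assumption-laden illustration, not anything in JBC: it uses the "only apply if newer" form of the check (the `cacheVersion < dbVersion` example from the chat) rather than an exact `version == 3` match, and the names VersionedPutSketch, Entry and putIfNewer are hypothetical. The point is that the check and the write happen as one atomic step, so arrival order stops mattering.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration of a versioned put; NOT the JBC API.
class VersionedPutSketch {
    /** Immutable value together with the version it was written at. */
    static final class Entry {
        final String value;
        final long version;
        Entry(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final ConcurrentMap<String, Entry> nodes = new ConcurrentHashMap<>();

    /**
     * Applies the write only if the node is absent or holds an older version.
     * ConcurrentHashMap.merge performs the compare and the store atomically
     * per key. Returns true if this write was applied.
     */
    boolean putIfNewer(String fqn, String value, long version) {
        Entry candidate = new Entry(value, version);
        Entry result = nodes.merge(fqn, candidate,
                (old, neu) -> neu.version > old.version ? neu : old);
        return result == candidate;
    }

    String get(String fqn) {
        Entry e = nodes.get(fqn);
        return e == null ? null : e.value;
    }

    public static void main(String[] args) {
        // Two conflicting updates delivered in order...
        VersionedPutSketch inOrder = new VersionedPutSketch();
        inOrder.putIfNewer("/a/b", "v1", 1);
        inOrder.putIfNewer("/a/b", "v2", 2);

        // ...and the same two delivered out of order.
        VersionedPutSketch outOfOrder = new VersionedPutSketch();
        outOfOrder.putIfNewer("/a/b", "v2", 2);
        outOfOrder.putIfNewer("/a/b", "v1", 1); // ignored: older than what is cached

        System.out.println(inOrder.get("/a/b") + " " + outOfOrder.get("/a/b")); // prints "v2 v2"
    }
}
```

As noted in the chat, this only stops an older update from overwriting a newer one. It does nothing for a reader on a node that has not yet received the newer update, and nothing for writes made to the db outside Hibernate.
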
View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4164848#4164848
