Thanks Dan,
Here are some comments and questions:
"2. Each cache member receives the ownership information from the
coordinator and starts rejecting all commands started in the old cache view"
How do you know a command was started in the old cache view? Does this
mean you're shipping a cache view ID with every request?
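If the answer is yes, the check on each member could be as small as the sketch below. It assumes every command carries the integer ID of the cache view it was started in, and that a member compares it against the view it has installed. `ViewTaggedCommand`, `ViewCheck` and `ViewCheckingMember` are hypothetical names for illustration, not Infinispan API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical command that carries the ID of the cache view it was started in. */
record ViewTaggedCommand(int viewId, String key, String value) {}

/** Hypothetical result of handing a command to a cache member. */
enum ViewCheck { ACCEPTED, REJECTED_OLD_VIEW, NEWER_VIEW }

/** Hypothetical cache member that only knows the ID of its installed cache view. */
class ViewCheckingMember {
    private final AtomicInteger installedViewId;

    ViewCheckingMember(int initialViewId) {
        this.installedViewId = new AtomicInteger(initialViewId);
    }

    /** Called when the coordinator's ownership info for a new view arrives; never goes backwards. */
    void installView(int newViewId) {
        installedViewId.accumulateAndGet(newViewId, Math::max);
    }

    /** Compares the command's view ID with the installed one. */
    ViewCheck check(ViewTaggedCommand cmd) {
        int current = installedViewId.get();
        if (cmd.viewId() < current) {
            return ViewCheck.REJECTED_OLD_VIEW; // started in an old view: originator must retry
        }
        if (cmd.viewId() > current) {
            return ViewCheck.NEWER_VIEW;        // the open question in 2.1: block, queue or reject?
        }
        return ViewCheck.ACCEPTED;
    }
}
```

The cost is one int per request on the wire, which is cheap; the interesting part is what to do with `NEWER_VIEW`.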
"2.1. Commands with the new cache view id will be blocked until we have
installed the new CH and we have received lock information from the
previous owners"
Doesn't this make this design *blocking* again? Or do you queue
requests with the new view ID, return immediately, and apply them when
the new view ID is installed? If the latter is the case, what do you
return? An OK? (How do you know the request will be applied successfully?)
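One possible answer to "what do you return" is neither blocking nor an optimistic OK, but a future that completes only once the parked command has actually been applied in the new view. The sketch below assumes that approach; it is not what the proposal specifies, and `QueueingMember` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical member that parks commands from a not-yet-installed view instead of blocking. */
class QueueingMember {
    /** A parked write together with the future handed back to the caller. */
    private record Pending(int viewId, String key, String value, CompletableFuture<String> result) {}

    private final Map<String, String> data = new HashMap<>();
    private final List<Pending> parked = new ArrayList<>();
    private int installedViewId;

    QueueingMember(int installedViewId) {
        this.installedViewId = installedViewId;
    }

    /**
     * Returns immediately. The future completes with the previous value once the write is
     * applied, so the caller learns the real outcome instead of receiving an optimistic OK.
     */
    synchronized CompletableFuture<String> put(int viewId, String key, String value) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (viewId > installedViewId) {
            parked.add(new Pending(viewId, key, value, result)); // wait for the new view
        } else if (viewId < installedViewId) {
            result.completeExceptionally(new IllegalStateException("started in old view " + viewId));
        } else {
            result.complete(data.put(key, value));
        }
        return result;
    }

    /** Installs a newer view (after lock info has arrived) and drains parked commands in order. */
    synchronized void installView(int newViewId) {
        if (newViewId <= installedViewId) {
            return; // views only move forward
        }
        installedViewId = newViewId;
        List<Pending> toProcess = new ArrayList<>(parked);
        parked.clear();
        for (Pending p : toProcess) {
            if (p.viewId() > newViewId) {
                parked.add(p); // still ahead of us
            } else if (p.viewId() == newViewId) {
                p.result().complete(data.put(p.key(), p.value()));
            } else {
                p.result().completeExceptionally(
                        new IllegalStateException("started in skipped view " + p.viewId()));
            }
        }
    }
}
```

The caller's thread is not blocked, but the caller still cannot report success until the future completes, so from the client's point of view the latency is the same as blocking.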
"A merge can be coalesced with multiple state transfers, one running in
each partition. So in the general case a coalesced state transfer
contains a tree of cache view changes."
Hmm, this can make a state transfer message quite large. Are we trimming
the modification list? E.g. if we have 10 PUTs on K, 1 removal of K, and
another 4 PUTs, do we just send the *last* PUT, or do we send a
modification list of 15?
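Trimming to the last modification per key is straightforward as long as every PUT and REMOVE fully overwrites the key (no deltas or conditional operations) and the list is in application order. A minimal sketch under those assumptions; `ModType`, `Modification` and `ModificationTrimmer` are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical modification kinds carried in a state transfer message. */
enum ModType { PUT, REMOVE }

/** Hypothetical modification; value is null for REMOVE. */
record Modification(ModType type, String key, String value) {}

final class ModificationTrimmer {
    private ModificationTrimmer() {}

    /**
     * Collapses a modification list (in application order) to the last modification per key.
     * A trailing REMOVE is kept as a tombstone, since the receiver may hold an older copy.
     */
    static List<Modification> trim(List<Modification> mods) {
        Map<String, Modification> last = new LinkedHashMap<>();
        for (Modification m : mods) {
            last.remove(m.key()); // re-insert so iteration order follows the latest write
            last.put(m.key(), m);
        }
        return List.copyOf(last.values());
    }
}
```

For the example above, the 15 modifications on K collapse to a single entry: the 4th PUT after the removal.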
"Get commands can write to the data container as well when L1 is
enabled, so we can't block just write commands."
That's another downside of the L1 cache being part of the regular cache. IMO it
would be much better to separate the two, as I wrote in my emails
yesterday.
"The new owners will start receiving commands before they have received
all the state
* In order to handle those commands, the new owners will have to
get the values from the old owners
* We know that the owners in the later views have a newer version
of the entry (if they have it at all). So we need to go back on the
cache views tree and ask all the nodes on one level at the same time -
if we don't get any certain answer we go to the next level and repeat."
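My reading of the quoted level-by-level lookup, as a sketch: the cache views tree is flattened into levels (newest first), all nodes on a level are queried concurrently, and the first level that yields a certain answer wins. The `ask` function stands in for a remote call, and "certain" (a value, or a definite "absent" with a null value) is an assumption about what the design means; `Answer` and `ViewTreeLookup` are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

/** Hypothetical answer from a previous owner; certain == false means "don't know". */
record Answer(boolean certain, String value) {
    static Answer unknown() { return new Answer(false, null); }
}

final class ViewTreeLookup {
    private ViewTreeLookup() {}

    /**
     * Walks the view levels from newest to oldest. All nodes on a level are queried
     * concurrently; if any returns a certain answer it is used, otherwise the next level is tried.
     */
    static Optional<Answer> lookup(List<List<String>> levelsNewestFirst, String key,
                                   BiFunction<String, String, Answer> ask) {
        for (List<String> level : levelsNewestFirst) {
            List<CompletableFuture<Answer>> calls = level.stream()
                    .map(node -> CompletableFuture.supplyAsync(() -> ask.apply(node, key)))
                    .toList();
            for (CompletableFuture<Answer> call : calls) {
                Answer a = call.join();
                if (a.certain()) {
                    return Optional.of(a); // newer levels hold newer versions, so stop here
                }
            }
        }
        return Optional.empty(); // nobody was certain: the caller has to decide what to do
    }
}
```

Note that in the worst case this costs one round trip per level of the tree, which is where the complexity of coalesced state transfers shows up.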
How does a member C know that it *will* receive any state at all? E.g.
if key K was stored on A and B, and B has now crashed, then A would push
a copy of K to C.
So when C receives a request R from another member E, but hasn't
received a state transfer from A yet, how does it know whether to apply
or queue R? Does C wait until it gets an END-OF-ST message from the
coordinator?
(skipping the rest due to exhaustion by complexity :-))
Over the last couple of days, I've exchanged several emails with the
Cloud-TM guys, and I'm more and more convinced that their total order
solution is the simpler approach to (1) transactional updates and (2)
state transfer. They don't have a solution for (2) yet, but I believe
this can be done as another totally ordered transaction, applied at the
correct position within the update stream. Alternatively, we could use a
flush: since we don't need to wait for pending TXs to complete and release
their locks, this should be quite fast.
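To illustrate why state transfer as "another totally ordered transaction" is simple, here is a toy single-process simulation. It assumes a total order broadcast that delivers the same stream to every member; the provider snapshots its state exactly when it delivers the marker, and the joiner applies that snapshot plus every update delivered after the marker. This is not Cloud-TM's protocol (they don't have state transfer yet), and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical message delivered by total order broadcast, in the same order on every member. */
interface Ordered {}

/** A committed transactional write. */
record Update(String key, String value) implements Ordered {}

/** State transfer modeled as one more totally ordered "transaction" for a joining member. */
record StateTransferMarker(String joiner) implements Ordered {}

final class TotalOrderStateTransfer {
    private TotalOrderStateTransfer() {}

    /** A member present from the start applies every update in delivery order. */
    static Map<String, String> existingMember(List<Ordered> stream) {
        Map<String, String> data = new HashMap<>();
        for (Ordered m : stream) {
            if (m instanceof Update u) {
                data.put(u.key(), u.value());
            }
            // on a marker, the provider would snapshot `data` at exactly this point
        }
        return data;
    }

    /**
     * The joiner gets the provider's snapshot taken at the marker's position, then applies only
     * the updates delivered after it. No locks need to drain: the marker's position in the
     * stream already separates "before" from "after".
     */
    static Map<String, String> joiner(List<Ordered> stream, String joinerName) {
        Map<String, String> providerView = new HashMap<>();
        Map<String, String> joinerData = null; // null until the marker is delivered
        for (Ordered m : stream) {
            if (m instanceof StateTransferMarker st && st.joiner().equals(joinerName)) {
                joinerData = new HashMap<>(providerView); // snapshot at this exact position
            } else if (m instanceof Update u) {
                providerView.put(u.key(), u.value());
                if (joinerData != null) {
                    joinerData.put(u.key(), u.value());
                }
            }
        }
        return joinerData == null ? Map.of() : joinerData;
    }
}
```

Because every member sees the marker at the same position, the joiner ends up with the same state as everyone else without a separate blocking phase.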
So my 5 cents:
#1 We should focus on the total order approach, and get rid of the 2PC
and locking business for transactional updates
#2 Really focus on the eventual consistency approach
Thoughts?
--
Bela Ban, JGroups lead (http://www.jgroups.org)