Hi Dan,
Very good write-up. Here are some notes I made while reading.
State transfer flow
- "If a node leaves soon after joining, we are going to send data to a dead node. If
numOwners=2 and we have 2 nodes rapidly joining and leaving, not interrupting the state
transfer when they leave would mean we finish the state transfer and throw that data away
on the old owners."
- so if a new joiner crashes, an existing node that pushes state to any node will a)
stop transferring state and b) ack to the coordinator that PREPARE_VIEW failed. Would the
next thing to happen then be that the coordinator sends a new PREPARE_VIEW to all
remaining cluster members?
- the image is incomplete:
https://community.jboss.org/servlet/JiveServlet/showImage/102-17638-16-18...
"If key k was modified on node B while v2 was being installed, however, only node B
would send the value to the new owner in v3: the put command on B would have invalidated k
on A"
- this should also be considered for get: when a node asks for a key on the new owner in V3
and the key is missing, the new owner should redirect the get to the owner in V2. If the
owner in V2 has it in its local cache, it should return it. If it doesn't have it, it should
ask the owner in V1 for it. This chain of requests would go down to the last successfully
installed view.
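To make the chain a bit more concrete, here is a rough sketch of what I have in mind. It is not
Infinispan code: ViewChainLookup, OwnerInView and the plain maps standing in for each owner's
local cache are all made up for illustration, and the remote calls are replaced by direct map
lookups.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a get that falls back through older views. */
final class ViewChainLookup {

    /** The owner of a key in one view; localStore stands in for its local cache. */
    static final class OwnerInView {
        final int viewId;
        final Map<String, String> localStore = new HashMap<>();

        OwnerInView(int viewId) {
            this.viewId = viewId;
        }
    }

    /**
     * Asks the owners from the newest view to the oldest and returns the first
     * value found. The walk stops at the last successfully installed view,
     * because owners in older views are not consulted.
     */
    static String get(String key, List<OwnerInView> ownersNewestFirst, int lastInstalledViewId) {
        for (OwnerInView owner : ownersNewestFirst) {
            String value = owner.localStore.get(key);
            if (value != null) {
                return value; // this owner has the key in its local cache
            }
            if (owner.viewId <= lastInstalledViewId) {
                break; // reached the last installed view, stop redirecting
            }
        }
        return null; // the key does not exist anywhere in the chain
    }
}
```

With owners for V3, V2 and V1 and the key only present on the V1 owner, the get is answered by
V1 as long as V1 is the last installed view. If V2 had been installed successfully, the chain
would stop there and the get would return null.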
Cache entries
"and remove (or move to L1) all the entries that are no longer local."
- if entries are moved to L1, then with the smart L1 invalidation the new owner should be
made aware that some entries are stored in another cache's L1
- "That means we don't have to keep track of when a particular entry has been
modified - we can just do a putIfAbsent, assuming that whatever value we have locally on
the new owners is the right one."
- I don't understand this phrase. In which situation would we do a putIfAbsent and
what would this replace? A put?
Locks and pending transactions
- "it treats all the keys that any transaction attempted to lock as already locked
(potentially by multiple transactions at the same time)" - not *all* of the keys but
only the ones for which there is a transaction that prepared in a previous view and
contains them.
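A small sketch of the distinction I mean, with made-up names (PendingLocks and PreparedTx are
not existing Infinispan classes): only keys belonging to a transaction that prepared in a
previous view end up in the "treated as locked" set.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: which keys a node treats as already locked. */
final class PendingLocks {

    /** A prepared transaction, the view it prepared in and the keys it contains. */
    static final class PreparedTx {
        final int preparedInViewId;
        final Set<String> keys;

        PreparedTx(int preparedInViewId, Set<String> keys) {
            this.preparedInViewId = preparedInViewId;
            this.keys = keys;
        }
    }

    /**
     * Collects the keys of transactions that prepared in a view older than the
     * current one. Keys of transactions prepared in the current view are left out.
     */
    static Set<String> keysTreatedAsLocked(Collection<PreparedTx> txs, int currentViewId) {
        Set<String> locked = new HashSet<>();
        for (PreparedTx tx : txs) {
            if (tx.preparedInViewId < currentViewId) {
                locked.addAll(tx.keys); // several transactions may contribute the same key
            }
        }
        return locked;
    }
}
```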
Recovery:
Q1: no it doesn't and it shouldn't. I'll create a JIRA for this.
Q2: yes. On restart the TM of the crashed node will identify the TX as in-doubt and the TM
would be able to forward its state after that
"The in-doubt transactions' keys should not be accessible to new transactions, so
when a new node becomes the primary owner for a key it needs to retrieve any in-doubt
transactions from the other owners at the beginning of state transfer. (If it was already
a backup owner for that key it should have the in-doubt transactions already, but if it
just joined it won't have any transaction information.)"
- Or, as part of the state transfer, we could migrate the in-doubt transactions
to the nodes where they would actually be relevant from a lock-owning perspective.
Async commit phase - in progress:
"we just block anyone from starting state transfer until all the (non-OOB) commit
commands with the old view id have been delivered"
- aren't all the commit commands being sent OOB?
Another approach would be: if the commit requires a retry, the receiver of the async commit
does the retry itself (instead of the tx originator) on the new owner.
Cheers,
Mircea
On 28 May 2012, at 15:59, Dan Berindei wrote:
Hi guys
I published a new version of the non-blocking state transfer design
document here:
https://community.jboss.org/wiki/Non-blockingStateTransfer
This time I focused a bit on the "other" cache configurations (other
than dist-sync, that is). This revealed some new problems; these are
the most interesting ones:
* With async-commit phase or async-1pc, we can get stale locks on
joiners. I have an Infinispan-only approach that should work, but it
may be easier to just require FLUSH.
* Async-commit + recovery enabled: a node sends a commit command to 2
owners, one node loses the message, and then the originator dies
before retransmitting. How do we let the TM know that the transaction
is in-doubt, since to the TM it looks like it has been committed?
Please try to read the document and let me know if anything sounds too
opaque or doesn't make sense. I think it's good enough for starting
the implementation, although I've removed the actual implementation
details from the document for now.
Cheers
Dan