[infinispan-dev] IRC meeting

Pedro Ruivo pruivo at gsd.inesc-id.pt
Wed May 2 03:53:37 EDT 2012


Hi Dan,

comment inline :)

Cheers,
Pedro

On 5/2/12 8:36 AM, Dan Berindei wrote:
> Hi Paolo
>
> On Tue, May 1, 2012 at 8:13 PM, Paolo Romano<romano at inesc-id.pt>  wrote:
>> Hi Dan,
>>
>> the easiest way to me seems to treat the state transfer as a special
>> transaction that is TO-broadcast using the sequencer, as you have also
>> been suggesting in your email.
>>
>> I guess that this way you may even get rid of the ST lock, as
>> transactions that request a commit after a ST is started will be
>> TO-delivered after the "ST transaction", which will:
>>      a) start transferring state only after having waited for the
>> completion of the txs TO-delivered before the ST transaction, and
>>      b) prevent the thread in charge of managing TO-delivered
>> transactions from processing transactions that are TO-delivered after
>> the ST transaction, until the ST transaction is finished on that node.
>>
>> Let me try to clarify this by outlining a possible protocol:
>> A) the coordinator TO-broadcasts the PREPARE_VIEW
>> B) when this message is TO-delivered at a node n, the thread (say thread
>> t) that is responsible for managing incoming TO messages enqueues itself
>> on the count-down latches associated with the transactions that are
>> being applied by the TO thread pool. This ensures that every node starts
>> transferring state only after having applied all the updates of the
>> previously TO-delivered xacts.
>> C) The state transfer is activated at node n by the same thread t
>> responsible for processing incoming TO messages. This guarantees that no
>> updates will be performed while the state is being transferred.
> This part isn't very clear to me - aren't the tx latches necessary
> exactly because there may be more than one thread processing incoming
> TO messages?
The tx latches are used when write skew is enabled and Infinispan is 
executing distributed transactions (via XAResource). But only one thread 
delivers the messages, and it is only in the TotalOrderInterceptor that 
the transaction is put into a thread pool.

For example:

thread T: delivers the PrepareCommand with Tx, invoking handle() in the 
CommandAwareRPCDispatcher
T: when Tx reaches the TotalOrderInterceptor, it uses the latches to 
ensure the delivery order and then puts Tx in the thread pool
another thread T': starts processing Tx
T: returns from handle() and picks up the next transaction
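
To make the hand-off a bit more concrete, here is a rough Java sketch of 
that pattern. All the names (TotalOrderDeliverySketch, deliverPrepare, 
pendingTxLatches, ...) are invented for illustration, this is not the real 
Infinispan code, and it is coarser than the real interceptor (each worker 
waits on every previously delivered tx, not only on the ones it actually 
needs to wait for):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// hypothetical model of the delivery path above; all names are made up
public class TotalOrderDeliverySketch {

   // one latch per in-flight transaction, released once the worker has applied it
   private final Map<String, CountDownLatch> pendingTxLatches =
         new ConcurrentHashMap<String, CountDownLatch>();

   // the pool (T', T'', ...) that actually validates/applies the delivered transactions
   private final ExecutorService workers = Executors.newFixedThreadPool(4);

   // called only by the single TO delivery thread (thread T), in total order
   public void deliverPrepare(final String txId, final Runnable validateAndApply) {
      // snapshot the latches of the transactions delivered before this one
      final List<CountDownLatch> previous =
            new ArrayList<CountDownLatch>(pendingTxLatches.values());

      final CountDownLatch done = new CountDownLatch(1);
      pendingTxLatches.put(txId, done);

      // hand off to the pool, so thread T can return from handle() and deliver the next message
      workers.submit(new Runnable() {
         public void run() {
            try {
               for (CountDownLatch latch : previous) {
                  latch.await();          // keep the apply order consistent with the TO order
               }
               validateAndApply.run();    // write skew check + apply
            } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
            } finally {
               pendingTxLatches.remove(txId);
               done.countDown();          // anything waiting on this tx is released
            }
         }
      });
   }

   // used by a later delivery (e.g. the PrepareView) to wait for the in-flight transactions
   public void waitForPendingTxs() throws InterruptedException {
      for (CountDownLatch latch : pendingTxLatches.values()) {
         latch.await();
      }
   }
}

The important point is that thread T never blocks on a transaction: it only 
registers the latch, hands the work to the pool and goes back to handle() 
for the next message.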

What Paolo is suggesting is:
1) send the PrepareView through the Sequencer, in synchronous mode

thread T: delivers the PrepareView, invoking handle()
T: the PrepareView asks for the latches of the previous transactions
T: the PrepareView waits on all of them //this blocks thread T, so no 
transactions are delivered during this phase
T: the same thread executes the state transfer, pushing the state //the 
Apply State messages should be sent with the OOB flag if possible
T: when finished, it returns from handle() //JGroups will pick up the 
return value and send it back to the coordinator as the response; Paolo 
named this ACK_VIEW
T: picks up the next transaction

When JGroups receives all the responses, it unblocks the synchronous 
remote invocation and the coordinator then sends the Commit/Rollback View.
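
Just to sketch the PrepareView side in the same style (again, every name 
here, PrepareViewHandlerSketch, StatePusher, pushStateToJoiners, is 
invented; it is only the shape of what thread T would do, not a real API):

import java.util.Collection;
import java.util.concurrent.CountDownLatch;

// hypothetical sketch of what thread T does when the TO-delivered message is the PrepareView
public class PrepareViewHandlerSketch {

   // stand-in for whatever component actually pushes our chunks of state to the joiners
   public interface StatePusher {
      void pushStateToJoiners();
   }

   // runs on the TO delivery thread (thread T)
   public Object handlePrepareView(Collection<CountDownLatch> previousTxLatches,
                                   StatePusher pusher) throws InterruptedException {
      // 1. wait for every transaction TO-delivered before the PrepareView to finish;
      //    blocking thread T here means nothing TO-delivered after the view is processed yet
      for (CountDownLatch latch : previousTxLatches) {
         latch.await();
      }

      // 2. push the state from this node; the apply state messages themselves could be
      //    sent OOB, since the ordering guarantee already comes from the block above
      pusher.pushStateToJoiners();

      // 3. returning from here (i.e. from handle() in the real dispatcher) plays the role
      //    of the ACK_VIEW: JGroups sends the return value back to the coordinator as this
      //    node's response, and the coordinator's synchronous invocation unblocks once
      //    every member has replied
      return "ACK_VIEW";
   }
}

Since thread T is the only one delivering TO messages, blocking it in step 1 
is enough to keep every transaction ordered after the PrepareView from being 
processed until the state has been pushed.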

In my opinion, this should work.

>> D) An ACK_VIEW is sent to the coordinator.
>>
> This is the part that I didn't like about this (otherwise very cool)
> idea. Making the PREPARE_VIEW commands asynchronous + adding a new
> ACK_VIEW command would mean making the cache view installation process
> on the coordinator a multithreaded affair, and changing a little too
> much code in that region (considering NBST touches that area as well).
>
>> Notice that:
>> - in parallel, application-level threads may request to commit new
>> transactions and TO-broadcast their requests, which will however be
>> enqueued by JGroups, as the thread "t" will not process them until it
>> has finalized the ST.
> Same as above, I think we'd need a tx latch for the ST transaction as
> well, and all txs will need to enqueue on that latch.
>
>> - if you wanted to squeeze in an extra optimization, after sending the
>> ACK_VIEW you may already let thread t conclude the ST transaction (after
>> having released any count-down latch that it may have acquired in phase
>> B above), without waiting for a COMMIT_VIEW. The rationale is that the
>> node has finished transferring its portion of the state, and may already
>> start processing the transactions that have been totally ordered after
>> the ST request. I admit, however, that I have not carefully thought about
>> the case of view changes being cancelled, as I don't know exactly
>> if/why/how you are doing this.
>>
> When a node leaves the cluster/cache during state transfer, it means
> it may not have sent all the state that it was responsible for to the
> joiners. Since we have no way to verify if it actually did send all
> the state, we just assume that it didn't and we cancel the state
> transfer/cache view installation.
>
> If we did not have a COMMIT_VIEW command, some nodes would notice the
> node leaving and they would cancel the cache view installation, while
> some nodes would not notice the leaver and install the new cache view
> anyway. E.g. on a join, the joiner may not notice the leaver and may
> start handling transactions, even though it did not receive all the
> state (and may never receive all the state). If a second node joins,
> we need all the nodes to have the same cache view installed in order
> to determine who is responsible for sending each "chunk" of state to
> the joiner - if we didn't have the same cache view on all nodes then
> some chunks would have 2 "pushing owners", and some chunks would have
> none.
>
>
> Cheers
> Dan
>

