[infinispan-dev] IRC meeting

Dan Berindei dan.berindei at gmail.com
Wed May 2 08:54:07 EDT 2012


On Wed, May 2, 2012 at 1:34 PM, Pedro Ruivo <pruivo at gsd.inesc-id.pt> wrote:
> hi,
>
> On 5/2/12 11:29 AM, Dan Berindei wrote:
>> Hi guys
>>
>> We're getting closer...
>>
>> On Wed, May 2, 2012 at 10:53 AM, Pedro Ruivo<pruivo at gsd.inesc-id.pt>  wrote:
>>> Hi Dan,
>>>
>>> comment inline :)
>>>
>>> Cheers,
>>> Pedro
>>>
>>> On 5/2/12 8:36 AM, Dan Berindei wrote:
>>>> Hi Paolo
>>>>
>>>> On Tue, May 1, 2012 at 8:13 PM, Paolo Romano<romano at inesc-id.pt>    wrote:
>>>>> Hi Dan,
>>>>>
>>>>> to me, the easiest way seems to be to treat the state transfer as a special
>>>>> transaction that is TO-broadcast using the sequencer, as you have also
>>>>> been suggesting in your email.
>>>>>
>>>>> I guess that this way you may even get rid of the ST lock, as
>>>>> transactions that request a commit after a ST is started will be
>>>>> TO-delivered after the "ST transaction", which will:
>>>>>       a) start transferring state only after having waited for the
>>>>> completion of the txs TO-delivered before the ST transaction, and
>>>>>       b) prevent the thread in charge of managing TO-delivered
>>>>> transactions from processing transactions that are TO-delivered after
>>>>> the ST transaction, until the ST transaction is finished on that node.
>>>>>
>>>>> Let me try to clarify this by outlining a possible protocol:
>>>>> A) the coordinator TO-broadcasts the PREPARE_VIEW
>>>>> B) when this message is TO-delivered at a node n, the thread (say thread
>>>>> t) that is responsible for managing incoming TO messages enqueues itself
>>>>> on the count-down latches associated with the transactions that are
>>>>> being applied by the TO thread pool. This ensures that every node starts
>>>>> transferring state only after having applied all the updates of the
>>>>> previously TO-delivered xacts.
>>>>> C) the state transfer is activated at node n by the same thread t
>>>>> responsible for processing incoming TO messages. This guarantees that no
>>>>> updates will be performed while the state is being transferred.
>>>> This part isn't very clear to me - aren't the tx latches necessary
>>>> exactly because there may be more than one thread processing incoming
>>>> TO messages?
>>> The tx latches are used when write skew is enabled and ISPN is executing
>>> distributed transactions (via the XA resource). But only one thread is
>>> delivering the messages, and only in the TotalOrderInterceptor does it put
>>> the transaction in a thread pool.
>>>
>> Yeah, obviously you're both right... Mircea explained this to me some
>> time ago but I completely forgot about it.
>>
>>> For example:
>>>
>>> thread T: delivers the PrepareCommand with Tx, invoking handle() in the
>>> CommandAwareRPCDispatcher
>>> T: when Tx arrives at the TotalOrderInterceptor, it uses the latches to
>>> ensure the delivery order, and then puts Tx in the thread pool
>>> another thread T': starts processing Tx
>>> T: returns from handle() and picks up the next transaction
>>>
>>> What Paolo is suggesting is:
>>> 1) sending PrepareView through Sequencer, in synchronous mode
>>>
>>> thread T: delivers PrepareView, invoking handle()
>>> T: PrepareView asks for the latches of the previous transactions
>>> T: PrepareView waits on all of them // this blocks thread T, so no
>>> transactions are delivered in this phase
>>> T: the same thread executes the state transfer, pushing the state // the
>>> apply-state messages should be sent with the OOB flag if possible
>>> T: when finished, it returns from handle() // JGroups will pick up the return
>>> value and send it back to the coordinator as a response. Paolo named this ACK_VIEW
>> Ok, I thought you were using the in-VM communication between the user
>> thread and the TO thread because of some limitation of SEQUENCER, but
>> I realize now it was just an optimization to avoid receiving responses
>> from all the nodes. I think I can change JGroupsTransport to not mark
>> sync messages as OOB when totalOrder is also set to true (without
>> breaking anything else), so this should indeed work without any
>> changes to state transfer itself.
>>
> Agree :)
>>> T: picks the next transaction
>>>
>>> When JGroups receives all the responses, it will unblock the synchronous
>>> remote invocation and then the coordinator sends the Commit/Rollback View
>>>
>>> In my opinion, this should work.
>>>
>> I think this will properly block prepare commands, but not necessarily
>> commit/rollback commands for txs that have already been prepared -
>> since they could be OOB.
>>
>> Am I missing something again?
>>
> Yes, that's the point. We don't want to block the commit or rollback;
> otherwise, the previously delivered prepares (transactions) will not finish.
> And we want them to finish before we start sending the state.
>
> Right?
>

Yeah, I saw that once I looked closer at the code of
ParallelTotalOrderManager. I guess I'm still thinking in terms of
non-TO transactions; I was expecting the commit command to write
something to the data container :)
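
For the archives, here's roughly how I picture the flow now - just an
illustrative sketch, the class and method names are made up and this is
not the actual ParallelTotalOrderManager code:

    // Hypothetical sketch: the TO-delivered prepare task does the real work
    // once the commit/rollback decision arrives; the commit command itself
    // never touches the data container.
    import java.util.concurrent.CountDownLatch;

    class TotalOrderTxSketch {
        private final CountDownLatch decision = new CountDownLatch(1);
        private final CountDownLatch applied  = new CountDownLatch(1);
        private volatile boolean commit;

        // runs in the TO thread pool, in delivery order
        void onPrepare(Runnable applyModifications) throws InterruptedException {
            decision.await();                     // blocks until the commit/rollback arrives
            if (commit) applyModifications.run(); // writes hit the data container here, not in onCommit()
            applied.countDown();                  // later txs / state transfer can wait on this
        }

        // delivered outside the TO pool, so it can't be blocked behind prepares
        void onCommit(boolean commitDecision) {
            commit = commitDecision;
            decision.countDown();
        }
    }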

This means we don't really have a choice about the state transfer
lock: we can't acquire it for commits on the originator, like we do
without TO, or the remote prepares already in progress would never
finish. I guess I got lucky, because I never saw this happening in the
test suite :)
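
To spell out the cycle I'm worried about - again purely illustrative,
assuming the state transfer lock were a plain read-write lock held in
write mode by the view change, which is a simplification:

    // Hypothetical sketch of the cycle if commits on the originator took the
    // state transfer lock, as they do without TO:
    //   1. the view change grabs the write lock so no new commits can run
    //   2. a commit on the originator blocks on the read lock below
    //   3. the remote prepare for that tx keeps waiting for its commit decision
    //   4. the PREPARE_VIEW processing waits for that prepare's latch before it
    //      can start transferring state => nobody makes progress
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class StateTransferLockSketch {
        private final ReadWriteLock stLock = new ReentrantReadWriteLock();

        void commitOnOriginator(Runnable sendCommit) {
            stLock.readLock().lock();   // would block behind the view change's write lock
            try {
                sendCommit.run();       // ...so the remote prepares never get their commit
            } finally {
                stLock.readLock().unlock();
            }
        }
    }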

One question I haven't been able to figure out is about the unpaired
prepares and commits that arrive at a joiner before the PREPARE_VIEW
command. (Unpaired because the joiner didn't receive the corresponding
commit/prepare.) I think this could even happen for commit commands
that arrive after PREPARE_VIEW. What happens to those?

Off-topic, in non-TO synchronous caches we have a problem when
queueing is enabled in the OOB thread pool: "active" prepare commands
waiting for queued commit commands, leading to a deadlock. I was
thinking you may have the same problem with your thread pool, but it
looks like you avoid the issue because the commits use JGroups'
regular/OOB thread pool instead of the TO one. I wonder if we could
make our 2PC code use two separate thread pools for prepare and commit
as well...
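
Something along these lines is what I have in mind - illustrative only,
the command interface and pool sizes are placeholders, not the actual
dispatch code:

    // Illustrative sketch: route prepares and commits to different pools so a
    // saturated prepare pool can never queue the commits those prepares wait for.
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class TwoPoolDispatcher {
        // stand-in for the real command classes
        interface TxCommand { boolean isPrepare(); }

        private final ExecutorService preparePool = Executors.newFixedThreadPool(25);
        private final ExecutorService commitPool  = Executors.newFixedThreadPool(25);

        void dispatch(TxCommand command, Runnable handler) {
            if (command.isPrepare()) {
                preparePool.execute(handler);  // may block waiting for the matching commit
            } else {
                commitPool.execute(handler);   // commits/rollbacks never queue behind prepares
            }
        }
    }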

Cheers
Dan


