[infinispan-dev] Blocking issue in TO State Transfer

Pedro Ruivo pedro at infinispan.org
Tue Feb 26 11:14:51 EST 2013


So, in this case, the regular message will block until the OOB message
is delivered. However, the OOB message is blocked in the application
until the regular message is delivered. And there is no way to pick the
regular message from the window list while the OOB message is blocked, right?
(assuming no more incoming messages)

So, if everybody agrees, moving the OOB message to another thread
should make everything work fine...
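A minimal Java sketch of the deadlock and of the proposed fix, assuming a greatly simplified receive window: the thread that adds a message first passes an OOB message up, and only afterwards drains regular messages in seqno order. `OobDeadlockSketch`, `ReceiverWindow`, `Msg`, `Handler` and `run` are hypothetical names, not UNICAST2 or Infinispan classes, and the single-thread executor merely stands in for async_thread_pool_executor. The real REBALANCE_CONFIRM processing would block indefinitely; here it waits at most 500 ms, so the blocking case ends in a timeout instead of hanging.

```java
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the scenario in this thread; NOT the real UNICAST2/Infinispan code.
public class OobDeadlockSketch {
    record Msg(long seqno, boolean oob, String name) {}

    interface Handler { void deliver(Msg m) throws InterruptedException; }

    // Simplified per-sender window; assumes a single receiving thread per sender.
    static final class ReceiverWindow {
        private final TreeMap<Long, Msg> table = new TreeMap<>();
        private long nextToDeliver; // lowest seqno not yet removed in order
        private final Handler handler;

        ReceiverWindow(long firstSeqno, Handler handler) {
            this.nextToDeliver = firstSeqno;
            this.handler = handler;
        }

        void receive(Msg m) throws InterruptedException {
            table.put(m.seqno(), m);
            if (m.oob()) {
                handler.deliver(m); // OOB: passed up at once by the thread that added it
            }
            // Only after the handler returns does this thread drain the gap-free prefix.
            while (table.containsKey(nextToDeliver)) {
                Msg next = table.remove(nextToDeliver++);
                if (!next.oob()) {
                    handler.deliver(next); // OOB messages were already delivered above
                }
            }
        }
    }

    // Returns true if rebalance_confirm processing finished, false if it timed out.
    public static boolean run(boolean handOffOob) throws Exception {
        CountDownLatch responseDelivered = new CountDownLatch(1);
        CompletableFuture<Boolean> confirmDone = new CompletableFuture<>();
        ExecutorService asyncPool = Executors.newSingleThreadExecutor();

        // rebalance_confirm waits for the rebalance_start response (bounded wait here).
        Runnable processConfirm = () -> {
            try {
                confirmDone.complete(responseDelivered.await(500, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                confirmDone.complete(false);
            }
        };

        Handler handler = m -> {
            if (m.name().equals("rebalance_confirm")) {
                if (handOffOob) {
                    asyncPool.execute(processConfirm); // proposed fix: free the receiving thread
                } else {
                    processConfirm.run(); // blocks the thread that must drain the window
                }
            } else if (m.name().equals("rebalance_start_response")) {
                responseDelivered.countDown();
            }
        };

        try {
            ReceiverWindow window = new ReceiverWindow(1, handler);
            window.receive(new Msg(2, false, "rebalance_start_response")); // #x+1 first: gap at #x
            window.receive(new Msg(1, true, "rebalance_confirm"));          // #x, sent as OOB
            return confirmDone.get(2, TimeUnit.SECONDS);
        } finally {
            asyncPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("OOB processed on receiving thread, confirm completed: " + run(false));
        System.out.println("OOB handed to async pool, confirm completed: " + run(true));
    }
}
```

In this model the regular response only reaches the application after the OOB handler returns, which is why handing the OOB processing to another thread is enough to break the cycle.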

On 02/26/2013 03:50 PM, Bela Ban wrote:
>
> On 2/26/13 4:15 PM, Dan Berindei wrote:
>>
>>
>> On Tue, Feb 26, 2013 at 12:57 PM, Pedro Ruivo <pedro at infinispan.org> wrote:
>>
>>     hi,
>>
>>     I found the blocking problem with the state transfer this morning.
>>     It happens because of the reordering of a regular and an OOB message.
>>
>>     Below is a simplification of what happens with two nodes:
>>
>>     A: total order broadcasts rebalance_start
>>
>>     B: (incoming thread) delivers rebalance_start
>>     B: has no segments to request so the rebalance is done
>>     B: sends async request with rebalance_confirm (unicast #x)
>>     B: sends the rebalance_start response (unicast #x+1) (the response
>>     is a regular message)
>>
>>     A: receives rebalance_start response (unicast #x+1)
>>     A: UNICAST2 detects that the message is out of order and blocks
>>     the response in the sender window (i.e. message #x is missing)
>>     A: receives the rebalance_confirm (unicast #x)
>>     A: delivers rebalance_confirm. Infinispan blocks this command
>>     until all the rebalance_start responses are received ==> this
>>     causes a deadlock! (because the response is blocked in the UNICAST2
>>     layer)
>>
>>     Question: can the request's response message always be sent as
>>     OOB? (I think the answer should be no...)
>>
>>
>> We could, if Bela adds the send(Message) method to the Response 
>> interface...
>
> I created a JIRA yesterday: https://issues.jboss.org/browse/JGRP-1602. 
> I'm wondering though if you *really* need it, as making all responses 
> OOB is a bad idea IMO, see below...
>
>
>> and personally I think it would be better to make all responses OOB 
>> (as in JGroups 3.2.x). I don't have any data to back this up, though...
>
> Intuitively, I think indiscriminately marking all responses as OOB
> is bad, especially in light of the async invocation API, which will
> make all messages non-blocking, at least in the OOB or regular thread pools.
>
> The code in 3.3 *does* actually copy the flags of the request into the
> response, so if the request is OOB, the response will be OOB too.
> For async RPCs (regular messages), you're not getting any response 
> anyway, so no worries here...
>
>
>>
>>     My suggestion: when I deliver a rebalance_confirm command (which
>>     is sent async), can I move it to a thread in
>>     async_thread_pool_executor?
>>
>>
>> I have a WIP fix for https://issues.jboss.org/browse/ISPN-2825, which
>> should stop blocking the REBALANCE_CONFIRM commands on the 
>> coordinator: https://github.com/danberindei/infinispan/tree/t_2825_m
>>
>> I haven't issued a PR yet because I'm still getting a failure in 
>> ClusterTopologyManagerTest, I think because of a JGroups issue (RSVP 
>> not receiving an ACK from itself). I'll let you know when I find out...
>
>
>
> Yes, please do that. I saw in London that you could reproduce it in 
> your test, so it should be simple to find the root cause.
>
>
>
>>
>>     Weird thing: last night I tried more than 5 times in a row with
>>     UNICAST3 and it never blocked. Could this mean there is a problem
>>     with UNICAST3, or was I just lucky?
>>
>>
>> Even though the REBALANCE_CONFIRM command is sent async, the message 
>> is still OOB. I think UNICAST/2/3 should not block any regular 
>> message waiting for the processing of an OOB message, as long as that 
>> message was received, so maybe the problem is in UNICAST2?
>
> If the OOB thread added the OOB message, then it will simply pass it 
> up. However, the regular thread needs to wait for gaps in the receiver 
> table to fill, as it doesn't know what type of message will be 
> received (could be regular).
>
> As soon as the OOB message has been added to the table, the regular 
> message will get delivered.
>


