<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Feb 27, 2013 at 11:13 AM, Bela Ban <span dir="ltr"><<a href="mailto:bban@redhat.com" target="_blank">bban@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">OK, here's what happens:<br>
<br>
- A's receiver table for B is at #6, which means that the next message from B<br>
must be #7<br>
- A receives B#8 (regular message from B)<br>
- A adds B#8 to B's receiver table, but doesn't deliver it (not OOB, and<br>
not #7)<br>
- A receives OOB message B#7 from B<br>
- The OOB thread delivers B#7 immediately<br>
- Infinispan blocks on B#7<br>
- Unless another message from B is received, B#8 will *not* get<br>
delivered: as you can see in the code below, the OOB thread would check<br>
*after* delivering B#7 if there are more messages to be delivered, but<br>
because it is blocked by Infinispan, it cannot deliver B#8.<br>
<br>
This is one of the rare cases where an OOB thread gets to deliver<br>
regular messages.<br>
<br>
The root cause is that Infinispan blocks on an OOB message; but OOB<br>
messages should never block! This is another reason why an Infinispan<br>
application thread pool makes a lot of sense!<br>
<br></blockquote><div><br></div><div>I wonder who first added sync mode and locking in JBossCache ;)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
// An OOB message is passed up immediately. Later, when remove() is<br>
called, we discard it. This affects ordering !<br>
// <a href="http://jira.jboss.com/jira/browse/JGRP-377" target="_blank">http://jira.jboss.com/jira/browse/JGRP-377</a><br>
if(msg.isFlagSet(Message.OOB) && added) {<br>
try {<br>
up_prot.up(evt);<br>
}<br>
catch(Throwable t) {<br>
log.error("couldn't deliver OOB message " + msg, t);<br>
}<br>
}<br>
<br>
// The OOB thread never gets here, as it is blocked in<br>
// up_prot.up() by Infinispan.<br>
<br>
final AtomicBoolean processing=win.getProcessing();<br>
if(!processing.compareAndSet(false, true))<br>
return true;<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
On 2/26/13 7:35 PM, Pedro Ruivo wrote:<br>
> On 02/26/2013 04:31 PM, Bela Ban wrote:<br>
>> On 2/26/13 5:14 PM, Pedro Ruivo wrote:<br>
>>> So, in this case, the regular message will block until the OOB<br>
>>> message is delivered.<br>
>><br>
>> No, the regular message should get delivered as soon as the OOB message<br>
>> has been *received* (not *delivered*). Unless there are previous regular<br>
>> messages from the same sender which are delivered in the same thread,<br>
>> and one of them is blocked in application code...<br>
> Attached is part of the log. I only know that the response is<br>
> disappearing between UNICAST2 and the ISPN unmarshaller.<br>
><br>
> could you please take a look?<br>
><br>
> the response is being sent and received and I don't understand why<br>
> ISPN is not receiving it<br>
><br>
> Thanks<br>
> Pedro<br>
>><br>
>><br>
>>> However, the OOB message is being blocked in the application<br>
>>> until the regular message is delivered. And there is no way to pick the<br>
>>> regular message from the window list while the OOB is blocked, right?<br>
>>> (assuming no more incoming messages)<br>
>> This actually should happen, as they're delivered by different threads !<br>
>><br>
>><br>
>>> so, if everybody agrees, if I move the OOB message to another thread,<br>
>>> everything should work fine...<br>
>>><br>
>>> On 02/26/2013 03:50 PM, Bela Ban wrote:<br>
>>>> On 2/26/13 4:15 PM, Dan Berindei wrote:<br>
>>>>> On Tue, Feb 26, 2013 at 12:57 PM, Pedro Ruivo <<a href="mailto:pedro@infinispan.org">pedro@infinispan.org</a><br>
>>>>> <mailto:<a href="mailto:pedro@infinispan.org">pedro@infinispan.org</a>>> wrote:<br>
>>>>><br>
>>>>> hi,<br>
>>>>><br>
>>>>> I found the blocking problem with the state transfer this<br>
>>>>> morning.<br>
>>>>> It happens because of the reordering of a regular and OOB<br>
>>>>> message.<br>
>>>>><br>
>>>>> Below is a simplification of what is happening for two nodes:<br>
>>>>><br>
>>>>> A: total order broadcasts rebalance_start<br>
>>>>><br>
>>>>> B: (incoming thread) delivers rebalance_start<br>
>>>>> B: has no segments to request so the rebalance is done<br>
>>>>> B: sends async request with rebalance_confirm (unicast #x)<br>
>>>>> B: sends the rebalance_start response (unicast #x+1) (the<br>
>>>>> response<br>
>>>>> is a regular message)<br>
>>>>><br>
>>>>> A: receives rebalance_start response (unicast #x+1)<br>
>>>>> A: in UNICAST2, it detects the message is out-of-order and<br>
>>>>> blocks<br>
>>>>> the response in the sender window (i.e. the message #x is<br>
>>>>> missing)<br>
>>>>> A: receives the rebalance_confirm (unicast #x)<br>
>>>>> A: delivers rebalance_confirm. Infinispan blocks this command<br>
>>>>> until all the rebalance_start responses are received ==> this<br>
>>>>> causes a deadlock! (because the response is blocked in<br>
>>>>> the unicast layer)<br>
>>>>><br>
>>>>> Question: can the request's response message be sent always as<br>
>>>>> OOB? (I think the answer should be no...)<br>
>>>>><br>
>>>>><br>
>>>>> We could, if Bela adds the send(Message) method to the Response<br>
>>>>> interface...<br>
>>>> I created a JIRA yesterday: <a href="https://issues.jboss.org/browse/JGRP-1602" target="_blank">https://issues.jboss.org/browse/JGRP-1602</a>.<br>
>>>> I'm wondering though if you *really* need it, as making all responses<br>
>>>> OOB is a bad idea IMO, see below...<br>
>>>><br>
>>>><br>
>>>>> and personally I think it would be better to make all responses OOB<br>
>>>>> (as in JGroups 3.2.x). I don't have any data to back this up,<br>
>>>>> though...<br>
>>>> Intuitively, I think indiscriminately marking all responses as OOB<br>
>>>> is bad, especially in the light of the async invocation API which will<br>
>>>> make all messages non-blocking, at least in the OOB or reg thread<br>
>>>> pools.<br>
>>>><br>
>>>> The code in 3.3 *does* actually copy the flags of the request into the<br>
>>>> response, so if the request is async (OOB), so will the response be.<br>
>>>> For async RPCs (regular messages), you're not getting any response<br>
>>>> anyway, so no worries here...<br>
>>>><br>
>>>><br>
>>>>> My suggestion: when I deliver a rebalance_confirm command<br>
>>>>> (which is sent async), can I move it to a thread in<br>
>>>>> async_thread_pool_executor?<br>
>>>>><br>
>>>>><br>
>>>>> I have a WIP fix for <a href="https://issues.jboss.org/browse/ISPN-2825" target="_blank">https://issues.jboss.org/browse/ISPN-2825</a>, which<br>
>>>>> should stop blocking the REBALANCE_CONFIRM commands on the<br>
>>>>> coordinator: <a href="https://github.com/danberindei/infinispan/tree/t_2825_m" target="_blank">https://github.com/danberindei/infinispan/tree/t_2825_m</a><br>
>>>>><br>
>>>>> I haven't issued a PR yet because I'm still getting a failure in<br>
>>>>> ClusterTopologyManagerTest, I think because of a JGroups issue (RSVP<br>
>>>>> not receiving an ACK from itself). I'll let you know when I find<br>
>>>>> out...<br>
>>>><br>
>>>> Yes, please do that. I saw in London that you could reproduce it in<br>
>>>> your test, so it should be simple to find the root cause.<br>
>>>><br>
>>>><br>
>>>><br>
>>>>> Weird thing: last night I tried more than 5 times in a row with<br>
>>>>> UNICAST3 and it never blocked. Could this mean a problem with<br>
>>>>> UNICAST3, or was I just lucky?<br>
>>>>><br>
>>>>><br>
>>>>> Even though the REBALANCE_CONFIRM command is sent async, the message<br>
>>>>> is still OOB. I think UNICAST/2/3 should not block any regular<br>
>>>>> message waiting for the processing of an OOB message, as long as that<br>
>>>>> message was received, so maybe the problem is in UNICAST2?<br>
>>>> If the OOB thread added the OOB message, then it will simply pass it<br>
>>>> up. However, the regular thread needs to wait for gaps in the receiver<br>
>>>> table to fill, as it doesn't know what type of message will be<br>
>>>> received (could be regular).<br>
>>>><br>
>>>> As soon as the OOB message has been added to the table, the regular<br>
>>>> message will get delivered.<br>
>>>><br>
>>> _______________________________________________<br>
>>> infinispan-dev mailing list<br>
>>> <a href="mailto:infinispan-dev@lists.jboss.org">infinispan-dev@lists.jboss.org</a><br>
>>> <a href="https://lists.jboss.org/mailman/listinfo/infinispan-dev" target="_blank">https://lists.jboss.org/mailman/listinfo/infinispan-dev</a><br>
><br>
><br>
><br>
<br>
</div></div><div class="im HOEnZb">--<br>
Bela Ban, JGroups lead (<a href="http://www.jgroups.org" target="_blank">http://www.jgroups.org</a>)<br>
<br>
</div><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br></div></div>