[infinispan-dev] Blocking issue in TO State Transfer

Dan Berindei dan.berindei at gmail.com
Wed Feb 27 06:43:50 EST 2013


On Wed, Feb 27, 2013 at 11:13 AM, Bela Ban <bban at redhat.com> wrote:

> OK, here's what happens:
>
> - A's receiver table for B is at #6, which means that the next message from B
> must be #7
> - A receives B#8 (regular message from B)
> - A adds B#8 to B's receiver table, but doesn't deliver it (not OOB, and
> not #7)
> - A receives OOB message B#7 from B
> - The OOB thread delivers B#7 immediately
> - Infinispan blocks on B#7
> - Unless another message from B is received, B#8 will *not* get
> delivered: as you can see in the code below, the OOB thread would check
> *after* delivering B#7 if there are more messages to be delivered, but
> because it is blocked by Infinispan, it cannot deliver B#8.
>
> This is one of the rare cases where an OOB thread gets to deliver
> regular messages.
>
> The root cause is that Infinispan blocks on an OOB message; but OOB
> messages should never block! This is another reason why an Infinispan
> application thread pool makes a lot of sense !
>
>
I wonder who first added sync mode and locking in JBossCache ;)

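To make the sequence above concrete, here is a toy, single-threaded model of
it. This is only a sketch, not the JGroups code: ToyReceiverTable and
ReceiverTableDemo are names made up for illustration, and the "up" callback
merely stands in for up_prot.up(evt). It assumes, as described above, that an
OOB message is passed up as soon as it is added and that the in-order drain
only runs after that call returns.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Toy per-sender receiver table; a simplified model, not the JGroups code. */
final class ToyReceiverTable {
    /** seqno -> payload of messages received but not yet removed in order. */
    private final TreeMap<Long, String> table = new TreeMap<>();
    /** seqnos of OOB messages that were already passed up on arrival. */
    private final Set<Long> passedUp = new HashSet<>();
    private final Consumer<String> up; // stands in for up_prot.up(evt)
    private long highestDelivered;

    ToyReceiverTable(long highestDelivered, Consumer<String> up) {
        this.highestDelivered = highestDelivered;
        this.up = up;
    }

    void receive(long seqno, String payload, boolean oob) {
        if (seqno <= highestDelivered || table.containsKey(seqno)) {
            return; // duplicate
        }
        table.put(seqno, payload);
        if (oob) {
            passedUp.add(seqno);
            up.accept(payload); // if the handler blocks here, drain() is never reached
        }
        drain();
    }

    /** Removes messages in seqno order, passing up only those not yet delivered. */
    private void drain() {
        String payload;
        while ((payload = table.remove(highestDelivered + 1)) != null) {
            highestDelivered++;
            if (!passedUp.remove(highestDelivered)) {
                up.accept(payload);
            }
        }
    }
}

final class ReceiverTableDemo {
    final List<String> delivered = new ArrayList<>();
    /** Snapshot of what had been delivered while the B#7 handler was running. */
    final List<String> seenByOobHandler = new ArrayList<>();

    void run() {
        ToyReceiverTable fromB = new ToyReceiverTable(6, payload -> {
            if (payload.equals("B#7")) {
                seenByOobHandler.addAll(delivered);
            }
            delivered.add(payload);
        });
        fromB.receive(8, "B#8", false); // regular, buffered: #7 is still missing
        fromB.receive(7, "B#7", true);  // OOB, passed up immediately
    }

    public static void main(String[] args) {
        ReceiverTableDemo demo = new ReceiverTableDemo();
        demo.run();
        System.out.println("visible inside the B#7 handler: " + demo.seenByOobHandler);
        System.out.println("delivered once it returned:     " + demo.delivered);
    }
}
```

In this model the B#7 handler never sees B#8, because B#8 is only drained
after the handler returns. A handler that waits for B#8 would therefore wait
forever, which matches the hang described above.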

>
>          // An OOB message is passed up immediately. Later, when remove() is
>          // called, we discard it. This affects ordering !
>          // http://jira.jboss.com/jira/browse/JGRP-377
>          if(msg.isFlagSet(Message.OOB) && added) {
>              try {
>                  up_prot.up(evt);
>              }
>              catch(Throwable t) {
>                  log.error("couldn't deliver OOB message " + msg, t);
>              }
>          }
>
>          // The OOB thread never gets here, as it is blocked in
>          // up_prot.up() by Infinispan.
>
>          final AtomicBoolean processing=win.getProcessing();
>          if(!processing.compareAndSet(false, true))
>              return true;
>
>
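The idea discussed in this thread, moving the blocking command off the
delivery thread, can be sketched roughly as below. It is a minimal
illustration under stated assumptions, not Infinispan or JGroups code:
HandOffDemo, confirmSeesResponse and appPool are hypothetical names, a latch
stands in for "the rebalance_start response has been delivered", and a 500 ms
timeout stands in for blocking forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class HandOffDemo {
    /**
     * Plays the role of the thread that delivers the OOB command and, once
     * free again, the buffered regular response. Returns true if the command
     * got to see the response before giving up.
     */
    static boolean confirmSeesResponse(boolean handOff) {
        CountDownLatch responseDelivered = new CountDownLatch(1);
        // The command waits for the response; the timeout models "forever".
        Callable<Boolean> confirmCommand =
                () -> responseDelivered.await(500, TimeUnit.MILLISECONDS);
        ExecutorService appPool = Executors.newSingleThreadExecutor(); // hypothetical application pool
        try {
            Future<Boolean> outcome;
            if (handOff) {
                // Delivery thread returns at once; the pool thread does the waiting.
                outcome = appPool.submit(confirmCommand);
            } else {
                // Delivery thread itself blocks inside the command.
                outcome = CompletableFuture.completedFuture(confirmCommand.call());
            }
            // Only reached when the delivery thread is free: deliver the response.
            responseDelivered.countDown();
            return outcome.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            appPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking in delivery thread: " + confirmSeesResponse(false));
        System.out.println("handed off to a pool:        " + confirmSeesResponse(true));
    }
}
```

Without the hand-off the command times out before the response can be
delivered; with it, the delivering thread is free to deliver the response and
the command completes.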
>
> On 2/26/13 7:35 PM, Pedro Ruivo wrote:
> > On 02/26/2013 04:31 PM, Bela Ban wrote:
> >> On 2/26/13 5:14 PM, Pedro Ruivo wrote:
> >>> So, in this case, the regular message will block until the OOB
> >>> message is delivered.
> >>
> >> No, the regular message should get delivered as soon as the OOB message
> >> has been *received* (not *delivered*). Unless there are previous regular
> >> messages from the same sender which are delivered in the same thread,
> >> and one of them is blocked in application code...
> > In attachment is part of the log. I only know that the response is
> > disappearing between UNICAST2 and the ISPN unmarshaller.
> >
> > could you please take a look?
> >
> > the response is being sent and received and I don't understand why
> > ISPN is not receiving it
> >
> > Thanks
> > Pedro
> >>
> >>
> >>> however, the OOB message is being blocked in the application
> >>> until the regular message is delivered. And there is no way to pick the
> >>> regular message from the window list while the OOB is blocked, right?
> >>> (assuming no more incoming messages)
> >> This actually should happen, as they're delivered by different threads !
> >>
> >>
> >>> so, if everybody agrees, if I move the OOB message to another thread,
> >>> everything should work fine...
> >>>
> >>> On 02/26/2013 03:50 PM, Bela Ban wrote:
> >>>> On 2/26/13 4:15 PM, Dan Berindei wrote:
> >>>>> On Tue, Feb 26, 2013 at 12:57 PM, Pedro Ruivo <pedro at infinispan.org>
> >>>>> wrote:
> >>>>>
> >>>>>       hi,
> >>>>>
> >>>>>       I found the blocking problem with the state transfer this
> >>>>>       morning. It happens because of the reordering of a regular
> >>>>>       and an OOB message.
> >>>>>
> >>>>>       Below is a simplification of what is happening with two nodes:
> >>>>>
> >>>>>       A: total order broadcasts rebalance_start
> >>>>>
> >>>>>       B: (incoming thread) delivers rebalance_start
> >>>>>       B: has no segments to request so the rebalance is done
> >>>>>       B: sends async request with rebalance_confirm (unicast #x)
> >>>>>       B: sends the rebalance_start response (unicast #x+1) (the
> >>>>>       response is a regular message)
> >>>>>
> >>>>>       A: receives rebalance_start response (unicast #x+1)
> >>>>>       A: in UNICAST2, it detects the message is out-of-order and
> >>>>>       blocks the response in the sender window (i.e. the message #x
> >>>>>       is missing)
> >>>>>       A: receives the rebalance_confirm (unicast #x)
> >>>>>       A: delivers rebalance_confirm. Infinispan blocks this command
> >>>>>       until all the rebalance_start responses are received ==> this
> >>>>>       causes a deadlock! (because the response is blocked in the
> >>>>>       unicast layer)
> >>>>>
> >>>>>       Question: can the request's response message be sent always as
> >>>>>       OOB? (I think the answer should be no...)
> >>>>>
> >>>>>
> >>>>> We could, if Bela adds the send(Message) method to the Response
> >>>>> interface...
> >>>> I created a JIRA yesterday: https://issues.jboss.org/browse/JGRP-1602
> >>>> I'm wondering though if you *really* need it, as making all responses
> >>>> OOB is a bad idea IMO, see below...
> >>>>
> >>>>
> >>>>> and personally I think it would be better to make all responses OOB
> >>>>> (as in JGroups 3.2.x). I don't have any data to back this up,
> >>>>> though...
> >>>> Intuitively, I think indiscriminately marking all responses as OOB
> >>>> is bad, especially in the light of the async invocation API which will
> >>>> make all messages non-blocking, at least in the OOB or reg thread
> >>>> pools.
> >>>>
> >>>> The code in 3.3 *does* actually copy the flags of the request into the
> >>>> response, so if the request is sync (OOB), so will the response be.
> >>>> For async RPCs (regular messages), you're not getting any response
> >>>> anyway, so no worries here...
> >>>>
> >>>>
> >>>>>       My suggestion: when I deliver a rebalance_confirm command
> >>>>>       (which is sent async), can I move it to a thread in
> >>>>>       async_thread_pool_executor?
> >>>>>
> >>>>>
> >>>>> I have a WIP fix for https://issues.jboss.org/browse/ISPN-2825, which
> >>>>> should stop blocking the REBALANCE_CONFIRM commands on the
> >>>>> coordinator: https://github.com/danberindei/infinispan/tree/t_2825_m
> >>>>>
> >>>>> I haven't issued a PR yet because I'm still getting a failure in
> >>>>> ClusterTopologyManagerTest, I think because of a JGroups issue (RSVP
> >>>>> not receiving an ACK from itself). I'll let you know when I find
> >>>>> out...
> >>>>
> >>>> Yes, please do that. I saw in London that you could reproduce it in
> >>>> your test, so it should be simple to find the root cause.
> >>>>
> >>>>
> >>>>
> >>>>>       Weird thing: last night I tried more than 5 times in a row with
> >>>>>       UNICAST3 and it never blocked. Could this mean a problem with
> >>>>>       UNICAST3, or was I just lucky?
> >>>>>
> >>>>>
> >>>>> Even though the REBALANCE_CONFIRM command is sent async, the message
> >>>>> is still OOB. I think UNICAST/2/3 should not block any regular
> >>>>> message waiting for the processing of an OOB message, as long as that
> >>>>> message was received, so maybe the problem is in UNICAST2?
> >>>> If the OOB thread added the OOB message, then it will simply pass it
> >>>> up. However, the regular thread needs to wait for gaps in the receiver
> >>>> table to fill, as it doesn't know what type of message will be
> >>>> received (could be regular).
> >>>>
> >>>> As soon as the OOB message has been added to the table, the regular
> >>>> message will get delivered
> >>>>
> >>> _______________________________________________
> >>> infinispan-dev mailing list
> >>> infinispan-dev at lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>