On 2/26/13 1:31 PM, Mircea Markus wrote:
Adding infinispan-dev, as other people might be interested in this as
well.
On 26 Feb 2013, at 10:57, Pedro Ruivo wrote:
> hi,
>
> I found the blocking problem with the state transfer this morning. It
> happens because of the reordering of a regular and OOB message.
>
> Below is a simplification of what is happening for two nodes:
>
> A: total order broadcasts rebalance_start
>
> B: (incoming thread) delivers rebalance_start
> B: has no segments to request so the rebalance is done
> B: sends async request with rebalance_confirm (unicast #x)
> B: sends the rebalance_start response (unicast #x+1) (the response is
> a regular message)
>
> A: receives rebalance_start response (unicast #x+1)
> A: in UNICAST2, it detects the message is out-of-order and blocks the
> response in the sender window (i.e. the message #x is missing)
> A: receives the rebalance_confirm (unicast #x)
> A: delivers rebalance_confirm. Infinispan blocks this command until
> all the rebalance_start responses are received ==> this originates a
> deadlock! (because the response is blocked in unicast layer)
I'm wondering why this happens: the "rebalance_start response"
(regular unicast #x+1) should not wait for the rebalance_confirm (OOB
message), as there's no ordering between them; it should simply be
passed up the stack by JGroups.
No, regular messages have to wait until *all* messages with seqnos lower
than their own are delivered. OOB messages OTOH can be delivered as soon
as we've successfully added them to the receiver window.
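
To make the rule concrete, here is a rough sketch of a single receiver
window shared by regular and OOB messages. It is only an illustration of
the delivery rule described above, not the actual UNICAST2 code: the
names ReceiverWindowSketch, Msg, receive and delivered are made up for
this example, and retransmission, threading and digests are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one receiver window shared by regular and OOB messages.
final class ReceiverWindowSketch {
    // Illustrative message type; not a JGroups class.
    static final class Msg {
        final long seqno;
        final boolean oob;
        final String payload;
        Msg(long seqno, boolean oob, String payload) {
            this.seqno = seqno;
            this.oob = oob;
            this.payload = payload;
        }
    }

    private final TreeMap<Long, Msg> pending = new TreeMap<>();
    private long nextToDeliver;
    final List<String> delivered = new ArrayList<>();

    ReceiverWindowSketch(long firstSeqno) {
        this.nextToDeliver = firstSeqno;
    }

    void receive(Msg m) {
        // Drop duplicates and messages that were already removed from the window.
        if (m.seqno < nextToDeliver || pending.containsKey(m.seqno)) {
            return;
        }
        pending.put(m.seqno, m);
        // OOB: delivered as soon as it has been added to the window.
        if (m.oob) {
            delivered.add(m.payload);
        }
        // Regular: delivered only once every lower seqno has been received.
        while (pending.containsKey(nextToDeliver)) {
            Msg next = pending.remove(nextToDeliver++);
            if (!next.oob) { // OOB messages were already delivered above
                delivered.add(next.payload);
            }
        }
    }

    public static void main(String[] args) {
        ReceiverWindowSketch w = new ReceiverWindowSketch(10); // #x = 10
        w.receive(new Msg(11, false, "rebalance_start response"));
        System.out.println(w.delivered); // regular #x+1 is held back
        w.receive(new Msg(10, true, "rebalance_confirm"));
        System.out.println(w.delivered); // #x arrives, then #x+1 is released
    }
}
```

In this model the regular message #x+1 stays in the window until #x has
been received, which matches the sequence Pedro described. The sketch
delivers everything synchronously, so it cannot show the blocking itself.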
The reason is that we use the same receiver windows for regular and OOB
messages. If we didn't, we'd have to send 2 digests instead of 1, and
this would significantly complicate the design. It would also affect
other protocols, e.g. STABLE...
--
Bela Ban, JGroups lead (http://www.jgroups.org)