Adding infinispan-dev, as other people might be interested in this as well.
On 26 Feb 2013, at 10:57, Pedro Ruivo wrote:
hi,
I found the blocking problem with the state transfer this morning. It happens because a
regular message and an OOB message are reordered.
Below is a simplified trace of what happens with two nodes:
A: total order broadcasts rebalance_start
B: (incoming thread) delivers rebalance_start
B: has no segments to request so the rebalance is done
B: sends async request with rebalance_confirm (unicast #x)
B: sends the rebalance_start response (unicast #x+1) (the response is a regular message)
A: receives rebalance_start response (unicast #x+1)
A: UNICAST2 detects that the message is out of order and holds the response in the
sender window (i.e. message #x is missing)
A: receives the rebalance_confirm (unicast #x)
A: delivers rebalance_confirm. Infinispan blocks this command until all the
rebalance_start responses are received ==> this causes a deadlock! (because the
response is blocked in the unicast layer)
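To make the ordering in the trace above concrete, here is a minimal sketch of a per-sender window that delivers strictly in sequence-number order. It is only a simplified model of the behaviour described, not the actual UNICAST2 code: the class OrderedWindow, its receive method and the sequence numbers 10 and 11 (standing in for #x and #x+1) are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of an in-order delivery window for one sender.
// Not the JGroups implementation; names are invented for this sketch.
final class OrderedWindow {
    private long nextToDeliver;                      // lowest seqno not yet delivered
    private final TreeMap<Long, String> buffered = new TreeMap<>();

    OrderedWindow(long firstSeqno) {
        this.nextToDeliver = firstSeqno;
    }

    // Buffers the message and returns everything that can now be delivered,
    // in sequence order. A gap (missing seqno) holds back all later messages.
    List<String> receive(long seqno, String payload) {
        buffered.put(seqno, payload);
        List<String> deliverable = new ArrayList<>();
        while (buffered.containsKey(nextToDeliver)) {
            deliverable.add(buffered.remove(nextToDeliver));
            nextToDeliver++;
        }
        return deliverable;
    }

    public static void main(String[] args) {
        OrderedWindow window = new OrderedWindow(10);
        // #x+1 arrives first: nothing is delivered, the response is held back.
        System.out.println(window.receive(11, "rebalance_start response"));
        // #x arrives: rebalance_confirm is delivered first, the response after it.
        System.out.println(window.receive(10, "rebalance_confirm"));
    }
}
```

In this model the response can only go up once rebalance_confirm has been delivered, so if delivering rebalance_confirm blocks in the same thread, the response behind it never gets its turn.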
Wondering why this happens: the "rebalance_start response" (regular unicast
#x+1) should not wait for the rebalance_confirm (OOB message), as there's no ordering
between them; it should be passed up the stack by JGroups.
Question: can the request's response message always be sent as OOB? (I think the
answer should be no...)
My suggestion: when I deliver a rebalance_confirm command (which is sent async), can I
move it to a thread in async_thread_pool_executor?
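A rough sketch of what this suggestion amounts to, using plain java.util.concurrent rather than Infinispan or JGroups classes. Everything here is hypothetical: the class ConfirmHandoffSketch, the single "delivery thread" standing in for the thread that passes messages up in order, and the 300 ms bounded wait, which exists only so the sketch terminates (the real command would wait without a timeout).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: does the rebalance_confirm handler ever see the
// rebalance_start response, depending on which thread runs the handler?
final class ConfirmHandoffSketch {

    static boolean confirmSawResponse(boolean handOff) throws Exception {
        CountDownLatch responseDelivered = new CountDownLatch(1);
        ExecutorService deliveryThread = Executors.newSingleThreadExecutor();
        ExecutorService asyncPool = Executors.newSingleThreadExecutor();
        try {
            // rebalance_confirm handler: waits for the response (bounded here).
            Callable<Boolean> confirmHandler =
                () -> responseDelivered.await(300, TimeUnit.MILLISECONDS);

            Future<Boolean> result = deliveryThread.submit(() -> {
                // Message #x: rebalance_confirm.
                Future<Boolean> confirm = handOff
                    ? asyncPool.submit(confirmHandler)                          // handed to another thread
                    : CompletableFuture.completedFuture(confirmHandler.call()); // runs inline and blocks
                // Message #x+1: the response, reached only after #x returns.
                responseDelivered.countDown();
                return confirm.get();
            });
            return result.get();
        } finally {
            deliveryThread.shutdownNow();
            asyncPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("inline handler saw response:     " + confirmSawResponse(false));
        System.out.println("handed-off handler saw response: " + confirmSawResponse(true));
    }
}
```

With the inline handler the wait times out, because the delivery thread cannot reach the response while it is stuck in the handler. With the hand-off, the delivery thread moves on to the response straight away and the handler completes.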
Weird thing: last night I tried more than 5 times in a row with UNICAST3 and it never
blocked. Could this mean a problem with UNICAST3, or was I just lucky?
Any other suggestion?
Cheers,
Pedro
Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)